Can AI find adidas.com?

Report: adidas.com · checked 2026-05-21 20:31 UTC · methodology v0.1 (preview) · canaifind.com/r/y2N8iDqB

Partial: Some fundamentals in place; high-leverage gaps identified.

This is a static-scan check (robots.txt + llms.txt + schema.org + headers). Live engine probes across ChatGPT, Claude, Gemini, and Perplexity arrive in a future build — currently in queue. Real visibility lives in category and comparison queries, which we measure with a 100-prompt stratified set on the Audit tier.

AI crawler robots.txt audit

§1 of 4

OpenAI

GPTBot	Training crawler for future OpenAI models.	? Unknown (fetch blocked)
OAI-SearchBot	ChatGPT Search index. Disallowing makes you invisible to ChatGPT Search.	? Unknown (fetch blocked)
ChatGPT-User	User-initiated retrieval. Ignores robots.txt by design.	— Ignores robots.txt

Anthropic

ClaudeBot	Training crawler for Anthropic models.	? Unknown (fetch blocked)
Claude-User	Retrieves pages when a Claude user asks about them. Respects robots.txt (unlike OpenAI's ChatGPT-User).	? Unknown (fetch blocked)
Claude-SearchBot	Search index for Claude. Disallowing reduces Claude search quality.	? Unknown (fetch blocked)
claude-code	Claude Code CLI / IDE retrieval. Documentation-targeted.	? Unknown (fetch blocked)

Perplexity

PerplexityBot	Perplexity indexing. Disallowing removes you from Perplexity retrieval.	? Unknown (fetch blocked)
Perplexity-User	User-initiated retrieval. Ignores robots.txt by design.	— Ignores robots.txt

Google

Google-Extended	Training opt-out for Gemini / Bard. Disallowing opts you out of Google AI training.	? Unknown (fetch blocked)
GoogleOther	Catch-all for non-Search Google crawlers.	? Unknown (fetch blocked)
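
The "Unknown (fetch blocked)" rows above would resolve to allowed or disallowed once robots.txt is readable. As a minimal sketch of that check, assuming the robots.txt body has already been retrieved, Python's standard `urllib.robotparser` can evaluate each crawler token from the table. The `SAMPLE` rules and the `audit_robots` helper are hypothetical and are not adidas.com's actual rules or this tool's implementation.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens from the table above that are described as honoring robots.txt.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "PerplexityBot", "Google-Extended", "GoogleOther",
]


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Return {crawler token: True if allowed to fetch url} for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_CRAWLERS}


# Hypothetical rules: opt out of OpenAI training, allow every other crawler.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    for bot, allowed in audit_robots(SAMPLE).items():
        print(f"{bot:18} {'allowed' if allowed else 'disallowed'}")
```

Note that a parser like this only helps when the file can be fetched at all; a 401/403 on robots.txt leaves every row undetermined, which is what the table reports.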

Structured data & discovery files

§2 of 4

Artifact	Status	Note
llms.txt	✗ Missing	Anthropic Claude respects this; Google has confirmed it does not; OpenAI is unconfirmed.
llms-full.txt	✗ Missing	Optional full-content companion file.

Artifact	Status	Note
schema.org Organization	✗ Missing	Entity anchor for the sameAs graph.
schema.org FAQPage	✗ Missing	2.7× citation rate vs without (Relixir 2025) — highest-leverage single fix.
schema.org Article	✗ Missing	For editorial pages.
schema.org HowTo	✗ Missing	For tutorials.
schema.org SoftwareApplication	✗ Missing	For product pages.
Person (author entity)	✗ Missing	E-E-A-T signal on bylines.
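
For illustration, the Organization and FAQPage rows above correspond to JSON-LD blocks embedded in the page. The sketch below builds both with the standard `json` module; the helper names, the "Example Co" values, and the question text are placeholders, and the property selection is a minimal assumption rather than a complete markup recommendation.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list) -> dict:
    """Organization entity; sameAs links the brand to its other profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),
    }


def faq_jsonld(pairs: list) -> dict:
    """FAQPage built from (question, answer) string pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def as_script_tag(data: dict) -> str:
    """Serialize for embedding in HTML; escape '</' so the script tag cannot close early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    org = organization_jsonld(
        "Example Co", "https://example.com",
        ["https://www.linkedin.com/company/example-co"],
    )
    faq = faq_jsonld([("Do you ship internationally?", "Yes, to most countries.")])
    print(as_script_tag(org))
    print(as_script_tag(faq))
```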

HTTP headers

§3 of 4

Header	Value
X-Robots-Tag	— not set
Cache-Control	max-age=0, no-cache, no-store
Link: canonical	— not set
Content-Type	text/html

Agent-content probe	Status	Note
Markdown negotiation	✗ Returns HTML	No text/markdown response when Accept: text/markdown is sent.
Agent-discovery Link rels	✗ None	No api-catalog / service-desc / describedby / agent-card rels.
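
The two agent-content probes above can be approximated as follows. This is a sketch under stated assumptions, not the scanner's actual code: `build_markdown_probe` only constructs the request (nothing is sent), `returns_markdown` classifies a `Content-Type` value, and `agent_rels` uses a simplified regular expression rather than a full `Link` header parser. All three function names are hypothetical.

```python
import re
from typing import Optional, Set
from urllib.request import Request

# Relation types the report looks for in the Link response header.
AGENT_RELS = {"api-catalog", "service-desc", "describedby", "agent-card"}


def build_markdown_probe(url: str) -> Request:
    """Build (but do not send) a GET request that asks for Markdown."""
    return Request(url, headers={"Accept": "text/markdown"})


def returns_markdown(content_type: Optional[str]) -> bool:
    """True if the response media type is text/markdown, ignoring parameters."""
    if not content_type:
        return False
    return content_type.split(";", 1)[0].strip().lower() == "text/markdown"


def agent_rels(link_header: Optional[str]) -> Set[str]:
    """Return the agent-discovery rel values present in a Link header value."""
    if not link_header:
        return set()
    found = set()
    pattern = r'\brel\s*=\s*(?:"([^"]*)"|([^\s;,]+))'
    for quoted, bare in re.findall(pattern, link_header, flags=re.IGNORECASE):
        # A quoted rel may hold several space-separated relation types.
        found.update((quoted or bare).lower().split())
    return found & AGENT_RELS
```

With the values in this report, `returns_markdown("text/html")` is false and `agent_rels(None)` is empty, matching the two ✗ rows.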

Top findings

§4 of 4

1. robots.txt is behind bot protection. (Severity: Med)
adidas.com/robots.txt returned 401/403 from Akamai bot mitigation. AI retrieval crawlers that do not execute JavaScript may be blocked from this file too, in which case they cannot read the crawler rules. That robots.txt is unreadable is itself a signal worth knowing.

2. Could not fetch the homepage. (Severity: Med)
We requested https://adidas.com/ and received HTTP 403 from Akamai bot mitigation. AI retrieval crawlers that don't execute JavaScript (OAI-SearchBot, Claude-SearchBot, PerplexityBot, GoogleOther) likely face the same block. That the homepage is invisible to non-browser clients is itself a finding worth knowing.

3. Cache-Control may prevent retrieval-layer caching. (Severity: Med)
Aggressive directives such as no-store, no-cache, or private tell retrieval crawlers not to store the response or reuse it without revalidation. For public pages you want cited, prefer `Cache-Control: public, max-age=300` or similar.

4. No Link: rel="canonical" HTTP header. (Severity: Tip)
Most CMSs handle canonicalization via `<link rel="canonical">` in HTML. The HTTP header version can also be processed by retrieval crawlers that don't fully render HTML. Optional.

5. Could not probe Markdown negotiation. (Severity: Tip)
The homepage returned 401/403 to our `Accept: text/markdown` probe, which typically indicates bot protection. We cannot determine whether the site supports Markdown for Agents.
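
The Cache-Control and canonical-header findings above can be checked mechanically. The sketch below is an illustration under assumptions: the helper names are hypothetical, the parser splits on commas and so does not handle quoted directive arguments that themselves contain commas, and treating `no-store` or `private` as "blocks shared caching" is a simplification of the HTTP caching rules.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def blocks_shared_caching(value: str) -> bool:
    """True if the header forbids intermediaries from storing the response."""
    directives = parse_cache_control(value)
    return "no-store" in directives or "private" in directives


def canonical_link_header(url: str) -> str:
    """Format the value of an HTTP Link header that declares the canonical URL."""
    return f'<{url}>; rel="canonical"'


if __name__ == "__main__":
    observed = "max-age=0, no-cache, no-store"   # value reported in the headers table
    suggested = "public, max-age=300"            # value suggested in finding 3
    print(blocks_shared_caching(observed), blocks_shared_caching(suggested))
    print("Link:", canonical_link_header("https://example.com/"))
```

The observed value is flagged because it contains `no-store`, while the suggested `public, max-age=300` is not.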
