Partial: some fundamentals in place; high-leverage gaps identified.
This is a static-scan check (robots.txt + llms.txt + schema.org + headers). Live engine probes across ChatGPT, Claude, Gemini, and Perplexity are queued for a future build. Real visibility lives in category and comparison queries, which we measure with a 100-prompt stratified set on the Audit tier.
AI crawler robots.txt audit
§1 of 4
No robots.txt found. Your site does not serve a robots.txt file. Every AI crawler is therefore allowed by default. This is not a problem per se, but you have no way to opt out of training corpora (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended) if you want to.
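If you do want to opt out, the usual approach is a robots.txt that disallows the four training tokens named above while leaving search and retrieval crawlers alone. A minimal sketch in Python, assuming a site-wide opt-out is what you want; `build_robots_txt` is a hypothetical helper written for this example, and the result is checked with the standard-library `urllib.robotparser`. robots.txt is advisory, so it has no effect on the user-initiated fetchers this report marks as ignoring it.

```python
from urllib.robotparser import RobotFileParser

# Training-crawler tokens named in this report; edit the list to suit your policy.
TRAINING_AGENTS = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Hypothetical helper: one group blocking the given agents site-wide,
    followed by a default group that allows every other crawler."""
    lines = [f"User-agent: {agent}" for agent in blocked_agents]
    lines.append("Disallow: /")  # applies to every User-agent line in the group
    lines.append("")             # a blank line ends the group
    lines.append("User-agent: *")
    lines.append("Disallow:")    # an empty Disallow allows everything
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(TRAINING_AGENTS)
print(robots_txt)

# Sanity-check the file with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in TRAINING_AGENTS + ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]:
    verdict = "allowed" if parser.can_fetch(agent, "/") else "blocked"
    print(f"{agent}: {verdict}")
```

Serve the output at `/robots.txt` on the site root. Listing several `User-agent` lines above one set of rules is a valid group under RFC 9309; to keep Gemini training while blocking the others, drop `Google-Extended` from the list.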
OpenAI
GPTBot
Training crawler for future OpenAI models.
✓ Allowed
OAI-SearchBot
ChatGPT Search index. Disallowing makes you invisible to ChatGPT Search.
✓ Allowed
ChatGPT-User
User-initiated retrieval. Ignores robots.txt by design.
— Ignores robots.txt
Anthropic
ClaudeBot
Training crawler for Anthropic models.
✓ Allowed
Claude-User
Retrieves pages when a Claude user asks about them. Respects robots.txt (unlike OpenAI's ChatGPT-User).
✓ Allowed
Claude-SearchBot
Search index for Claude. Disallowing may reduce your site's visibility and accuracy in Claude's search results.
✓ Allowed
claude-code
Claude Code CLI / IDE retrieval. Documentation-targeted.
✓ Allowed
Perplexity
PerplexityBot
Perplexity indexing. Disallowing removes you from Perplexity retrieval.
✓ Allowed
Perplexity-User
User-initiated retrieval. Ignores robots.txt by design.
— Ignores robots.txt
Google
Google-Extended
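Training opt-out token for Gemini (formerly Bard), not a separate crawler. Disallowing opts you out of Gemini training and grounding without affecting Google Search inclusion or ranking.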
✓ Allowed
GoogleOther
Catch-all for non-Search Google crawlers.
✓ Allowed
Meta
Meta-ExternalAgent
Meta AI crawler. Disallowing opts you out of Meta AI training/retrieval.
✓ Allowed
Apple
Applebot-Extended
Apple Intelligence training opt-out (separate from Applebot Search).
✓ Allowed
ByteDance
Bytespider
ByteDance / TikTok AI crawler.
✓ Allowed
Common Crawl
CCBot
Common Crawl. Its open web corpus is a widely used source of LLM training data.
✓ Allowed
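The per-crawler verdicts above come down to parsing robots.txt once and asking, for each user-agent token, whether the site root may be fetched. A minimal sketch with Python's `urllib.robotparser`; `audit_crawlers` and `USER_INITIATED` are illustrative names, not this tool's actual code, and the standard-library matcher uses a simplified case-insensitive substring match on the token rather than every rule in RFC 9309.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Assumption: the two agents this report marks as ignoring robots.txt.
USER_INITIATED = {"ChatGPT-User", "Perplexity-User"}


def audit_crawlers(robots_txt: Optional[str], agents: list[str]) -> dict[str, str]:
    """Return an allowed / blocked / ignores-robots.txt verdict per agent.

    robots_txt is the downloaded file body, or None when the site serves none.
    Only the root path "/" is checked, as in a homepage-level audit.
    """
    parser = RobotFileParser()
    # With no file, parse nothing: RobotFileParser then allows every agent.
    parser.parse(robots_txt.splitlines() if robots_txt is not None else [])
    verdicts = {}
    for agent in agents:
        if agent in USER_INITIATED:
            verdicts[agent] = "ignores robots.txt"
        elif parser.can_fetch(agent, "/"):
            verdicts[agent] = "allowed"
        else:
            verdicts[agent] = "blocked"
    return verdicts


example = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"
print(audit_crawlers(example, ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]))
print(audit_crawlers(None, ["ClaudeBot", "CCBot"]))
```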
Structured data & discovery files
§2 of 4
| Artifact | Status | Note |
|---|---|---|
| llms.txt | ✗ Missing | Anthropic Claude respects this; Google has confirmed it does not; OpenAI is unconfirmed. |
| llms-full.txt | ✗ Missing | Optional full-content companion file. |
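An llms.txt file is plain Markdown served at `/llms.txt`. Per the llmstxt.org proposal it opens with an H1 naming the site and a blockquote summary, followed by H2 sections of annotated links, with an `## Optional` section for material a model can skip. A minimal generator sketch; `build_llms_txt` is a hypothetical helper, and the site name, summary and URLs are placeholders.

```python
def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Hypothetical helper: render an llms.txt body.

    sections maps an H2 heading to (title, url, note) link entries;
    an empty note renders the link without a description.
    """
    out = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out.append(f"## {heading}")
        out.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            out.append(f"{entry}: {note}" if note else entry)
        out.append("")
    return "\n".join(out)


llms_txt = build_llms_txt(
    "Example Co",
    "Example Co makes a placeholder product used only in this sketch.",
    {
        "Docs": [
            ("Quickstart", "https://example.com/docs/quickstart.md", "Install and first run"),
            ("Pricing", "https://example.com/pricing.md", "Plans and limits"),
        ],
        "Optional": [
            ("Changelog", "https://example.com/changelog.md", ""),
        ],
    },
)
print(llms_txt)
```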
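Because the homepage could not be fetched (§3), the schema.org statuses below mean no markup was seen, not that its absence was verified.

| Artifact | Status | Note |
|---|---|---|
| schema.org Organization | ✗ Missing | Entity anchor for the sameAs graph. |
| schema.org FAQPage | ✗ Missing | Relixir (2025) reports a 2.7× citation rate vs pages without it; highest-leverage single fix. |
| schema.org Article | ✗ Missing | For editorial pages. |
| schema.org HowTo | ✗ Missing | For tutorials. |
| schema.org SoftwareApplication | ✗ Missing | For product pages. |
| Person (author entity) | ✗ Missing | E-E-A-T signal on bylines. |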
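Organization and FAQPage markup are usually embedded as JSON-LD inside a `<script type="application/ld+json">` tag. A minimal Python sketch that renders both; the helper names are hypothetical and the company name, URLs and FAQ text are placeholders. The same pattern should extend to Article, HowTo, SoftwareApplication and Person by changing `@type` and the properties.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> dict:
    """schema.org Organization; sameAs ties the entity to its other profiles."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """schema.org FAQPage built from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def script_tag(data: dict) -> str:
    """Wrap JSON-LD in a script tag; escape "</" so the JSON cannot close the tag early."""
    body = json.dumps(data, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'


org = organization_jsonld(
    "Example Co",
    "https://example.com/",
    ["https://www.linkedin.com/company/example", "https://github.com/example"],
)
faq = faq_jsonld([("Is there a free tier?", "Yes, with usage limits.")])
print(script_tag(org))
print(script_tag(faq))
```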
HTTP headers
§3 of 4
Could not fetch the homepage (HTTP 0, meaning no HTTP response was received). Skipping HTTP header checks.
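"HTTP 0" is not a real status code (HTTP statuses start at 100); it is a common convention, also used by browsers' XMLHttpRequest, for "no response arrived at all". A minimal sketch of that convention with Python's `urllib`; `fetch_status` is a hypothetical helper, not this tool's code, and the User-Agent string is a placeholder. It returns the real status and headers whenever the server answers, including 4xx/5xx, and `(0, {})` when nothing came back, which is why there are no headers left to check.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> tuple[int, dict[str, str]]:
    """Hypothetical helper: (status, headers) for a GET, or (0, {}) if no response arrived."""
    # Ignore proxy settings from the environment so failures reflect the target site.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        # Placeholder User-Agent for this sketch.
        request = urllib.request.Request(url, headers={"User-Agent": "example-audit/0.1"})
        with opener.open(request, timeout=timeout) as response:
            # Repeated header names collapse to a single key in this dict.
            return response.status, dict(response.headers)
    except urllib.error.HTTPError as err:
        # The server did answer, with a 4xx/5xx status: headers are still available.
        return err.code, dict(err.headers)
    except (OSError, ValueError):
        # No HTTP response: DNS failure, refused connection, timeout, TLS error
        # (all OSError subclasses) or a malformed URL (ValueError).
        return 0, {}


print(fetch_status("http://127.0.0.1:9/", timeout=2))
```

`HTTPError` is caught first because it is itself a subclass of `OSError`; reversing the order would report real 403 or 503 answers as 0.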
Top findings
§4 of 4
1. [Med] No llms.txt found.
Anthropic Claude Desktop and Claude.ai respect llms.txt; IDE tooling (Cursor, Claude Code, GitHub Copilot, Cline, Aider) routinely fetches it. Google has confirmed it does NOT support llms.txt; OpenAI is unconfirmed. Ship one if you want Anthropic + dev-tool visibility, but do NOT expect it to move ChatGPT or Gemini.
2. [Med] Could not fetch the homepage.
We tried https://reddit.com/ and got HTTP 0, i.e. no HTTP response at all. We cannot audit schema.org markup without the homepage HTML. If the site is behind auth, geo-blocked, or drops connections from automated clients, this is expected.
3. [Tip] No robots.txt found.
Your site does not serve a robots.txt file. Every AI crawler is therefore allowed by default. This is not a problem per se, but you have no way to opt out of training corpora (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended) if you want to.
Share this report
This report has a permanent URL: canaifind.com/r/RjNMHL1t. Screenshot, drop in Slack, quote-tweet, or send to whoever's going to ask. That's how this tool finds the next person who needs it.