This is a static-scan check (robots.txt, llms.txt, schema.org, and HTTP headers). Live probes of ChatGPT, Claude, Gemini, and Perplexity are planned for a future build. Real visibility lives in category and comparison queries, which the Audit tier measures with a stratified set of 100 prompts.
AI crawler robots.txt audit
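§1 of 4

No robots.txt found: the scan did not retrieve a robots.txt file from anthropic.com. Without one, every AI crawler that honors robots.txt is allowed by default. That is not a problem in itself, but it leaves no robots.txt-based way to opt out of training crawlers (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended). The homepage request in §3 also received no HTTP response, so this result may reflect the same connection failure rather than a genuinely missing file.

| Crawler | Purpose | Status |
|---|---|---|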
| GPTBot | Training crawler for future OpenAI models. | ✓ Allowed |
| OAI-SearchBot | ChatGPT Search index. Disallowing keeps your pages out of ChatGPT Search answers. | ✓ Allowed |
| ChatGPT-User | User-initiated retrieval. Ignores robots.txt by design. | — Ignores robots.txt |
| ClaudeBot | Training crawler for Anthropic models. | ✓ Allowed |
| Claude-User | Retrieves pages when a Claude user asks about them. Respects robots.txt (unlike OpenAI's ChatGPT-User). | ✓ Allowed |
| Claude-SearchBot | Search index for Claude. Disallowing reduces Claude search quality. | ✓ Allowed |
| claude-code | Claude Code CLI / IDE retrieval. Documentation-targeted. | ✓ Allowed |
| PerplexityBot | Perplexity indexing. Disallowing removes you from Perplexity retrieval. | ✓ Allowed |
| Perplexity-User | User-initiated retrieval. Ignores robots.txt by design. | — Ignores robots.txt |
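| Google-Extended | Robots.txt token (not a separate crawler) that controls use of your content for Gemini (formerly Bard) training and grounding. Disallowing opts you out of Google AI training without affecting Google Search. | ✓ Allowed |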
| GoogleOther | Catch-all for non-Search Google crawlers. | ✓ Allowed |
| Meta-ExternalAgent | Meta AI crawler. Disallowing opts you out of Meta AI training/retrieval. | ✓ Allowed |
| Applebot-Extended | Robots.txt token for opting out of Apple Intelligence model training. Separate from Applebot, which crawls for Siri and Spotlight search. | ✓ Allowed |
| Bytespider | ByteDance / TikTok AI crawler. | ✓ Allowed |
| CCBot | Common Crawl's crawler. Its open web archive is one of the most widely used sources of LLM training data. | ✓ Allowed |
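
The table points to a split policy: opt out of the training tokens while leaving search and user-retrieval agents allowed. Below is a minimal sketch of such a policy, assuming a hypothetical site at example.com (it is not anthropic.com's actual robots.txt), checked offline with Python's standard-library `urllib.robotparser`. Which agents to block is an assumption to adapt to your own goals.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block the training opt-out tokens named in the audit,
# allow everything else (search indexers, user-initiated retrieval).
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_verdicts(robots_txt: str, agents: list[str],
                     url: str = "https://example.com/") -> dict[str, bool]:
    """Return {agent: allowed?} for each agent under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the parser as loaded,
    # so can_fetch() works without any network access.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "Claude-SearchBot",
              "Google-Extended", "PerplexityBot"]
    for agent, allowed in crawler_verdicts(ROBOTS_TXT, agents).items():
        print(f"{agent:18} {'allowed' if allowed else 'blocked'}")
    # No robots.txt at all (empty rules): every agent is allowed.
    print(crawler_verdicts("", ["GPTBot"]))  # {'GPTBot': True}
```

`urllib.robotparser` matches user agents by case-insensitive substring and applies the first matching group, which is looser than RFC 9309, so treat this as a sanity check rather than a model of how each vendor parses the file. Google-Extended and Applebot-Extended never fetch pages themselves; a `False` for them only confirms the opt-out rule is in place. The empty-string case shows why §1 reads "allowed" everywhere: with no rules, every agent passes.
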
Structured data & discovery files
§2 of 4

| Artifact | Status | Note |
|---|---|---|
| llms.txt | ✗ Missing | A markdown index of the site's most important pages, served at /llms.txt. Anthropic's Claude Desktop and Claude.ai fetch it, and IDE tooling (Cursor, Claude Code, GitHub Copilot, Cline, Aider) routinely retrieves it. Google has explicitly confirmed it does NOT support it (Gary Illyes, July 2025); OpenAI support is unconfirmed. |
| llms-full.txt | ✗ Missing | Optional full-content companion to llms.txt, useful for agents with large context windows that prefer a single fetch over crawling. It does not replace llms.txt; both can coexist. |
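
The llms.txt proposal (llmstxt.org) defines a plain markdown file: an H1 with the site or project name, a blockquote summary, then H2 sections whose bodies are lists of markdown links with optional notes. A minimal Python sketch that renders that shape; the company name, URLs, and section names are hypothetical placeholders, not a recommended page set.

```python
from typing import Optional

# (name, url, optional note) for one link in an llms.txt section.
Link = tuple[str, str, Optional[str]]


def build_llms_txt(title: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, then H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            # Each entry is a markdown link, optionally followed by ": note".
            lines.append(f"- [{name}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical site; replace with your own pages.
    print(build_llms_txt(
        "Example Co",
        "Example Co makes a hypothetical widget API.",
        {
            "Docs": [
                ("Quickstart", "https://example.com/docs/quickstart.md", "Install and first request"),
                ("API reference", "https://example.com/docs/api.md", None),
            ],
            "Optional": [
                ("Changelog", "https://example.com/changelog.md", None),
            ],
        },
    ))
```

Serve the output as plain text at /llms.txt. The proposal reserves a section titled "Optional" for links an agent can skip when context is tight.
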

| Artifact | Status | Note |
|---|---|---|
| schema.org Organization | Not checked (no homepage HTML) | The brand-identity anchor LLMs use to disambiguate the site. Without it, profile links on LinkedIn, Wikidata, Crunchbase, GitHub, etc. are not bound to the homepage's entity in the AI's knowledge graph. The sameAs array is the load-bearing field. |
| schema.org FAQPage | Not checked (no homepage HTML) | Pages with FAQPage JSON-LD showed a 2.7× citation rate versus pages without it (41% vs 15% in the Relixir 2025 study). The JSON-LD must mirror the Q&A content visible on the page; Google penalises mismatches. Single highest-leverage fix in the audit. |
| schema.org Article | Not checked (no homepage HTML) | For journalistic and editorial pages. Declares author, datePublished, dateModified, and articleSection to AI engines, which preferentially cite recent, dated, authored content in answer-engine results. |
| schema.org HowTo | Not checked (no homepage HTML) | For step-by-step procedural content such as tutorials. AI engines preferentially cite HowTo markup when answering procedural queries ("how do I X"), since it maps directly to retrieval intent. |
| schema.org SoftwareApplication | Not checked (no homepage HTML) | For product and app pages. Maps to vendor-evaluation queries ("best X for Y") and is effectively required for B2B SaaS visibility in AI citations; 89% of B2B buyers now use AI for vendor research (Averi 2026). |
| Person (author entity) | Not checked (no homepage HTML) | Author entity on bylines, linked to the Article entity. An E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) signal: AI engines are generally understood to weight content by named, credentialed authors above anonymous content. |
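
Every schema.org row above comes down to emitting JSON-LD inside a `<script type="application/ld+json">` tag. A minimal Python sketch for the two rows the report stresses most, Organization and FAQPage, assuming a hypothetical company at example.com; the `sameAs` profile URLs and the Q&A text are placeholders, and the helper names are illustrative rather than part of any library.

```python
import json
from typing import Iterable, Optional


def organization_jsonld(name: str, url: str, same_as: Iterable[str],
                        logo: Optional[str] = None) -> dict:
    """schema.org Organization; sameAs binds external profiles to this entity."""
    data = {"@context": "https://schema.org", "@type": "Organization",
            "name": name, "url": url, "sameAs": list(same_as)}
    if logo:
        data["logo"] = logo
    return data


def faqpage_jsonld(qa_pairs: Iterable[tuple[str, str]]) -> dict:
    """schema.org FAQPage; each pair must match a Q&A visible on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q,
             "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in qa_pairs
        ],
    }


def script_tag(data: dict) -> str:
    """Serialize to a JSON-LD <script> block, escaping "</" so it cannot end the tag."""
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    org = organization_jsonld(
        "Example Co", "https://example.com/",
        ["https://www.linkedin.com/company/example-co", "https://github.com/example-co"],
    )
    print(script_tag(org))
    print(script_tag(faqpage_jsonld([("What does Example Co sell?", "A hypothetical widget API.")])))
```

Escaping `</` as `<\/` stops a stray `</script>` inside an answer from closing the tag early, and JSON parsers read `\/` back as `/`. Per the FAQPage note, each question and answer should also appear as visible text on the page.
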
HTTP headers
§3 of 4

Could not fetch the homepage: the request received no HTTP response (reported as HTTP 0), so the HTTP header checks were skipped.
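
HTTP 0 is not a status a server can send: clients report it when a request fails before any response arrives. A minimal Python sketch of a probe that keeps the two cases apart using only the standard library; the `probe` name and the User-Agent string are hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Fetch url; return (status, detail). Status 0 means no HTTP response arrived."""
    # Hypothetical User-Agent; some sites reject Python's default one.
    req = urllib.request.Request(url, headers={"User-Agent": "example-audit/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.reason
    except urllib.error.HTTPError as err:
        # The server answered with an error status (e.g. 401, 403, 404):
        # a real code exists, and the response headers are in err.headers.
        return err.code, str(err.reason)
    except (OSError, http.client.HTTPException) as err:
        # DNS failure, refused or reset connection, TLS error, timeout,
        # malformed response: no status line was read ("HTTP 0").
        return 0, f"{type(err).__name__}: {err}"
```

`HTTPError` is caught before the broader `OSError` because it is a subclass of `URLError`. A result such as `(403, ...)` would mean the site answered and its headers could still be inspected, whereas `(0, ...)` leaves nothing to audit.
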
Top findings
§4 of 4

The three actionable findings, ordered by severity. They are also available as a single prompt to paste into Claude Code, Cursor, or any AI dev tool; the agent walks through each fix in sequence, groups changes by file, and reports what it touched.
1. Medium: No llms.txt found.
   Anthropic's Claude Desktop and Claude.ai respect llms.txt, and IDE tooling (Cursor, Claude Code, GitHub Copilot, Cline, Aider) routinely fetches it. Google has confirmed it does NOT support llms.txt; OpenAI support is unconfirmed. Ship one if you want visibility in Anthropic's products and developer tools, but do NOT expect it to move ChatGPT or Gemini.
2. Medium: Could not fetch the homepage.
   A request to https://anthropic.com/ received no HTTP response (reported as HTTP 0), and schema.org markup cannot be audited without the homepage HTML. This is expected if anthropic.com is geo-blocked or drops connections from automated clients.
3. Tip: No robots.txt found.
   The scan did not retrieve a robots.txt file from anthropic.com, so every AI crawler that honors robots.txt is allowed by default. This is not a problem in itself, but it leaves no robots.txt-based way to opt out of training crawlers (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended).
This report has a permanent URL: canaifind.com/r/N3tSkPec.