Report: hackernews.com · checked 2026-05-21 13:17 UTC · methodology v0.1 (preview) · canaifind.com/r/sVfjkywz
Partial: Some fundamentals in place; high-leverage gaps identified.

This is a static scan of robots.txt, llms.txt, schema.org markup, and HTTP headers. Live engine probes across ChatGPT, Claude, Gemini, and Perplexity are queued for a future build. Real visibility shows up in category and comparison queries, which the Audit tier measures with a 100-prompt stratified set.

AI crawler robots.txt audit

§1 of 4

No robots.txt found

Your site does not serve a robots.txt file, so every AI crawler is allowed by default. That is not a problem in itself, but without a robots.txt you cannot opt out of training corpora (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended) should you want to.

Crawler | Purpose | Status

OpenAI
GPTBot | Training crawler for future OpenAI models. | ✓ Allowed
OAI-SearchBot | ChatGPT Search index. Disallowing makes you invisible to ChatGPT Search. | ✓ Allowed
ChatGPT-User | User-initiated retrieval. Ignores robots.txt by design. | Ignores robots.txt

Anthropic
ClaudeBot | Training crawler for Anthropic models. | ✓ Allowed
Claude-User | Retrieves pages when a Claude user asks about them. Respects robots.txt (unlike OpenAI's ChatGPT-User). | ✓ Allowed
Claude-SearchBot | Search index for Claude. Disallowing reduces Claude search quality. | ✓ Allowed
claude-code | Claude Code CLI / IDE retrieval. Documentation-targeted. | ✓ Allowed

Perplexity
PerplexityBot | Perplexity indexing. Disallowing removes you from Perplexity retrieval. | ✓ Allowed
Perplexity-User | User-initiated retrieval. Ignores robots.txt by design. | Ignores robots.txt

Google
Google-Extended | Training opt-out token for Gemini (formerly Bard). Disallowing opts you out of Google AI training. | ✓ Allowed
GoogleOther | Catch-all for non-Search Google crawlers. | ✓ Allowed

Meta
Meta-ExternalAgent | Meta AI crawler. Disallowing opts you out of Meta AI training/retrieval. | ✓ Allowed

Apple
Applebot-Extended | Apple Intelligence training opt-out (separate from Applebot Search). | ✓ Allowed

ByteDance
Bytespider | ByteDance / TikTok AI crawler. | ✓ Allowed

Common Crawl
CCBot | Common Crawl. Heavily used as a training-corpus source by many major models. | ✓ Allowed
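The opt-out described in this section can be sketched with Python's standard `urllib.robotparser`. This is only an illustration, not the scanner's own logic: the helper names `build_robots_txt` and `is_allowed` and the `example.com` URL are hypothetical, and the blocked list is just the four training crawlers the report names. It shows that an empty rule set allows everything, while a per-agent `Disallow: /` blocks training crawlers and leaves search crawlers untouched.

```python
from urllib.robotparser import RobotFileParser

# The four training-corpus user agents named in the report.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended"]


def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows each listed agent and allows the rest."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in blocked_agents]
    groups.append("User-agent: *\nAllow: /\n")
    # Joining with a newline leaves a blank line between groups.
    return "\n".join(groups)


def is_allowed(robots_txt, agent, url="https://example.com/"):
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


policy = build_robots_txt(TRAINING_BOTS)

# With no rules at all, every crawler is allowed.
print(is_allowed("", "GPTBot"))

# With the opt-out policy, training crawlers are blocked and search crawlers are not.
for bot in ["GPTBot", "ClaudeBot", "OAI-SearchBot", "Claude-SearchBot"]:
    print(bot, is_allowed(policy, bot))
```

Note that a robots.txt rule only affects crawlers that honor it; the report lists ChatGPT-User and Perplexity-User as ignoring robots.txt.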

Structured data & discovery files

§2 of 4
Artifact | Status | Note
llms.txt | ✗ Missing | Anthropic Claude respects this; Google has confirmed it does not; OpenAI is unconfirmed.
llms-full.txt | ✗ Missing | Optional full-content companion file.
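For readers who have not seen one, a minimal llms.txt can be generated as below. The layout (an H1 title, a blockquote summary, then H2 sections of Markdown links) follows the public llms.txt proposal as commonly described; treat it as an assumption rather than a guarantee of what any engine parses. The function name `build_llms_txt`, the site name, and every URL are placeholders.

```python
def build_llms_txt(title, summary, sections):
    """Build llms.txt text.

    `sections` maps a section heading to a list of (name, url, note) tuples;
    `note` may be an empty string.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content; replace with real pages before publishing at /llms.txt.
text = build_llms_txt(
    "Example Site",
    "A short description of what the site offers.",
    {
        "Docs": [("Docs", "https://example.com/docs.md", "Getting started")],
        "Optional": [("Changelog", "https://example.com/changelog.md", "")],
    },
)
print(text)
```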
Artifact | Status | Note
schema.org Organization | ✗ Missing | Entity anchor for the sameAs graph.
schema.org FAQPage | ✗ Missing | 2.7× citation rate versus pages without it (Relixir 2025); the highest-leverage single fix.
schema.org Article | ✗ Missing | For editorial pages.
schema.org HowTo | ✗ Missing | For tutorials.
schema.org SoftwareApplication | ✗ Missing | For product pages.
Person (author entity) | ✗ Missing | E-E-A-T signal on bylines.
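A check like the one in this table can be approximated by collecting every `@type` declared in a page's JSON-LD blocks. The sketch below uses only the standard library and is not the scanner's actual implementation; `JsonLdCollector`, `missing_types`, and the sample HTML are hypothetical, and it only sees JSON-LD, not Microdata or RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect every @type found in <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.types = set()

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self._collect(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD blocks are skipped

    def _collect(self, node):
        # Walk nested lists and objects so types such as Question are found too.
        if isinstance(node, list):
            for item in node:
                self._collect(item)
        elif isinstance(node, dict):
            declared = node.get("@type", [])
            if isinstance(declared, str):
                declared = [declared]
            self.types.update(t for t in declared if isinstance(t, str))
            for value in node.values():
                self._collect(value)


def missing_types(html, wanted=("Organization", "FAQPage", "Article")):
    """Return the wanted schema.org types that the page does not declare."""
    collector = JsonLdCollector()
    collector.feed(html)
    return [t for t in wanted if t not in collector.types]


sample = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type": "Question", "name": "What is this?",
   "acceptedAnswer": {"@type": "Answer", "text": "A sample."}}]}
</script></head></html>"""

print(missing_types(sample))
```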

HTTP headers

§3 of 4

Could not fetch the homepage (HTTP 0). Skipping HTTP header checks.
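The report does not define "HTTP 0". A common convention, assumed here, is to use 0 as a placeholder when a request fails before any HTTP response arrives (DNS failure, refused connection, TLS error, timeout). The sketch below separates that case from a real HTTP error status; `fetch_status` and its injectable `opener` parameter are hypothetical names for illustration.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen, timeout=5.0):
    """Return (status, headers) for `url`.

    A status of 0 means no HTTP response was received at all, which is
    the assumed meaning of "HTTP 0" in this sketch.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return err.code, dict(err.headers.items())
    except (urllib.error.URLError, OSError):
        # No response: DNS failure, refused connection, TLS error, timeout.
        return 0, {}
```

Calling `fetch_status("https://example.com/")` returns the real status and response headers when the site answers. Header checks only make sense when the status is nonzero, which matches the report skipping them here.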

Top findings

§4 of 4
  1. No llms.txt found.

    Anthropic Claude Desktop and Claude.ai respect llms.txt; IDE tooling (Cursor, Claude Code, GitHub Copilot, Cline, Aider) routinely fetches it. Google has confirmed it does NOT support llms.txt; OpenAI support is unconfirmed. Ship one if you want visibility in Anthropic products and developer tools, but do NOT expect it to move ChatGPT or Gemini.

    Med
  2. Could not fetch the homepage.

    We requested https://hackernews.com/ and got HTTP 0. Without the homepage HTML we cannot audit schema.org markup. This is expected if the site is behind authentication or geo-blocked.

    Med
  3. No robots.txt found.

    Your site does not serve a robots.txt file, so every AI crawler is allowed by default. That is not a problem in itself, but without a robots.txt you cannot opt out of training corpora (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended) should you want to.

    Tip

This report has a permanent URL: canaifind.com/r/sVfjkywz.