Why your website isn't showing up on ChatGPT. And how to fix it.
AI engines answer millions of buying questions a day. If ChatGPT, Perplexity, and Google's AI Overviews never mention you, the cause is almost always one of five fixable problems.
- ChatGPT answers come from two places: training data and live search retrieval. You can influence both this quarter.
- The five usual causes: blocked AI crawlers, no extractable answers, weak entity identity, stale pages, and zero third-party footprint.
- A robots.txt block on OAI-SearchBot removes you from live answers, and a block on GPTBot removes you from future training data, no matter how good the content is.
- Pages updated within 30 days earn roughly 3x more AI citations than stale ones.
- Most of the technical fixes take one afternoon; this post gives the exact checklist.
How does ChatGPT decide which websites to mention?
ChatGPT pulls website mentions from two separate systems: its training data (what the model learned about your brand before its knowledge cutoff) and live search retrieval (pages its search crawler fetches in real time when a user asks a question). Showing up requires being accessible and citable in both.
Training data is slow-moving: it rewards brands that appear consistently across the open web, in directories, reviews, and editorial coverage. Live retrieval is fast-moving: when someone asks "best Shopify agency for a beauty brand," ChatGPT's search mode fetches candidate pages through OAI-SearchBot and quotes the ones it can parse and trust.
The same split applies to the other engines. Perplexity retrieves through PerplexityBot, Anthropic's Claude uses ClaudeBot for training and Claude-SearchBot for live search, and Google's AI Overviews use the regular Googlebot index. Every cause below maps to one of those two systems.
Is your robots.txt blocking AI crawlers?
The most common cause of total AI invisibility is a robots.txt rule or firewall that blocks AI crawlers. Cloudflare began default-blocking AI bots on new domains in mid-2025, so many site owners are blocking GPTBot and OAI-SearchBot without knowing it. Blocked retrieval crawlers mean you cannot be cited, period.
There are two crawler families and they do different jobs. Training crawlers feed the next model version; search crawlers feed live answers today. The table below is the minimum allowlist for each engine:
| User agent | Engine | What it feeds |
|---|---|---|
| GPTBot | OpenAI | Model training (future knowledge) |
| OAI-SearchBot | ChatGPT search | Live answer citations |
| ChatGPT-User | ChatGPT browse | User-triggered page fetches |
| PerplexityBot | Perplexity | Live answer citations |
| ClaudeBot / Claude-SearchBot | Anthropic Claude | Training + live search |
| Googlebot | Google AI Overviews | The regular Search index powers AIO |
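To check a robots.txt file against the user agents in the table, a minimal sketch using Python's standard-library parser could look like the following. The `blocked_bots` helper, the `SAMPLE` rules, and the example.com URL are illustrative assumptions, not part of any engine's tooling, and the parser only approximates how each crawler interprets rules.

```python
from urllib.robotparser import RobotFileParser

# User agents from the table above.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
           "ClaudeBot", "Claude-SearchBot", "Googlebot"]


def blocked_bots(robots_txt: str, url: str = "https://example.com/") -> list:
    """Return the AI user agents that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]


# Hypothetical rules: the training crawler is blocked, everything else allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(SAMPLE))
```

Paste your own robots.txt text in place of `SAMPLE`; an empty list means none of the listed agents is blocked at the robots.txt level, which says nothing about blocks at the CDN.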
Test it in two minutes: fetch your homepage with each bot's user-agent string and confirm a 200 response. If you see 403s or a challenge page, check your CDN's bot settings before touching robots.txt - the block usually lives at the edge. Our free robots.txt generator outputs a correct AI-crawler allowlist.
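The two-minute test can be sketched in Python with only the standard library. The short tokens below are simplified stand-ins for the full user-agent strings each vendor publishes, and `fetch_status` and `verdict` are hypothetical helper names. The script only makes network requests when you pass a URL on the command line.

```python
import sys
import urllib.error
import urllib.request

# Shortened user-agent tokens from the table above. Real crawlers send longer
# strings; copy the exact values from each vendor's documentation.
BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
              "ClaudeBot", "Claude-SearchBot", "Googlebot"]


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch `url` with the given User-Agent header and return the HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is what we report.
        return err.code


def verdict(status: int) -> str:
    """Translate a status code into the reading suggested in the text."""
    if status == 200:
        return "ok"
    if status == 403:
        return "blocked: check CDN bot settings before robots.txt"
    return f"investigate: unexpected status {status}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_bots.py https://example.com/
    for token in BOT_TOKENS:
        print(f"{token:18} {verdict(fetch_status(sys.argv[1], token))}")
```

Treat a 200 here as a good sign rather than proof: some CDNs also verify the requesting IP address, which a spoofed user agent from your own machine cannot reproduce.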
Can an AI engine extract an answer from your pages?
AI engines quote passages, not pages. A page earns citations when it contains self-contained 40-90 word passages that answer a question without needing surrounding context. Analysis of roughly 8,000 ChatGPT citations by Search Engine Land found that about 72% of cited posts open sections with exactly this answer-first format.
The pattern that works: phrase your H2 as the question buyers actually ask, then answer it completely in the first paragraph, then expand. Walls of unstructured text, answer-at-the-bottom blog intros, and content locked inside images or JavaScript widgets all fail extraction.
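As a rough way to audit a page against this pattern, the sketch below flags sections whose heading is not a question or whose lead paragraph falls outside the 40-90 word range mentioned above. The `audit_section` helper and its thresholds are assumptions for illustration; word counts are a proxy, not a measure of whether the passage actually answers the question.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def audit_section(heading: str, first_paragraph: str,
                  min_words: int = 40, max_words: int = 90) -> list:
    """Return a list of problems with one section; an empty list means it passes."""
    problems = []
    if not heading.strip().endswith("?"):
        problems.append("heading is not phrased as a question")
    count = word_count(first_paragraph)
    if count < min_words:
        problems.append(f"lead paragraph too short ({count} words)")
    elif count > max_words:
        problems.append(f"lead paragraph too long ({count} words)")
    return problems


# Hypothetical section with a label-style heading and a thin lead paragraph.
print(audit_section("Setup", "Too short."))
```

Run it over each H2 and its first paragraph on your top pages; anything it flags is a candidate for an answer-first rewrite.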
Structure compounds the effect: real HTML tables get cited roughly four times more than styled-div lookalikes, and clear H2-H3-list hierarchies lift citation odds by about 40%, according to Ahrefs and Search Engine Land studies of AI citation behaviour. Visible Q&A blocks help too - the answers are pre-packaged passages.
Does the AI actually know who you are?
AI engines mention brands they can resolve to a clear entity: one name, one description, consistent facts everywhere. If your homepage never states plainly what you are, where you operate, and who you serve, the model has nothing reliable to repeat - so it repeats a competitor instead.
Three fixes, in order of impact. First, write an entity definition: a one-paragraph "X is a..." statement on your homepage and About page that names your category, locations, founding year, and proof points. Second, ship Organization schema with a sameAs array pointing at your LinkedIn, review profiles, and directories, then check it with the schema validator. Third, make your name, address, and phone identical everywhere they appear - footer, contact page, directories.
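For the second fix, a minimal sketch of Organization JSON-LD generated from Python might look like this. The property names (`name`, `url`, `description`, `foundingDate`, `sameAs`) are standard schema.org vocabulary; every value, including "Example Agency" and the profile URLs, is a hypothetical placeholder.

```python
import json


def organization_jsonld(name, url, description, founding_year, same_as):
    """Build Organization JSON-LD as a string ready for a script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
        "foundingDate": str(founding_year),
        "sameAs": list(same_as),
    }
    return json.dumps(data, indent=2)


# Hypothetical example values; replace every one with your own facts.
markup = organization_jsonld(
    name="Example Agency",
    url="https://example.com",
    description="Example Agency is a Shopify agency serving beauty brands.",
    founding_year=2018,
    same_as=[
        "https://www.linkedin.com/company/example-agency",
        "https://www.trustpilot.com/review/example.com",
    ],
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Keep the `description` identical to the entity definition on your homepage so the markup and the visible text state the same facts.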
Generators help if schema is new to you: our schema markup generator builds and validates Organization JSON-LD in the browser.
Are your pages fresh enough to cite?
Freshness is one of the strongest AI-citation signals measured. An Ahrefs study of roughly 17 million ChatGPT citations found 76% of top-cited pages had been updated within the previous 30 days, and undated pages were cited several times less than otherwise-identical dated ones.
Two implications. Put a visible "Published / Last updated" line on every substantive page, backed by datePublished and dateModified in your schema. Then actually refresh: a quarterly pass over money pages - one updated stat, one new paragraph, one fixed link per page - keeps the dates honest. Never bump dates without a real change; engines store content fingerprints and fake freshness reads as manipulation.
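Both implications can be sketched in a few lines of Python. The helper names are hypothetical, and the 90-day window is an assumed stand-in for "quarterly"; `datePublished` and `dateModified` are the schema.org properties named above.

```python
from datetime import date


def article_dates(published: str, modified: str) -> dict:
    """Return schema.org date properties, rejecting a modified date that
    precedes publication."""
    if date.fromisoformat(modified) < date.fromisoformat(published):
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {"datePublished": published, "dateModified": modified}


def needs_refresh(modified: str, today: date, max_age_days: int = 90) -> bool:
    """True when the last real update is older than the refresh window."""
    return (today - date.fromisoformat(modified)).days > max_age_days


# Hypothetical dates for one page.
print(article_dates("2025-01-10", "2025-05-20"))
print(needs_refresh("2025-01-10", today=date(2025, 6, 1)))
```

Feed `needs_refresh` the date of the last genuine content change, not the last deploy, so the visible date and the schema stay honest.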
Do other sites vouch for you?
AI engines trust third-party sources more than your own website. Benchmark studies find that brands are roughly six times more likely to be cited via independent sources - reviews, directories, Reddit threads, YouTube - than via their own domain. If your footprint ends at your homepage, you lose to brands with receipts.
The highest-impact moves: complete review profiles (Trustpilot, Clutch, G2) with steady fresh reviews; genuine participation in the Reddit and forum threads where your buyers ask questions; and YouTube content with transcripts, which one 75,000-brand analysis found to be the single strongest correlate of AI Overview visibility. None of this is gameable in a weekend - which is exactly why it is hard for competitors to copy.
The five-check self-audit.
Run these five checks in order; together they cover the five causes. Most teams find their blocker in one of two places: a crawler block they never set, or pages with no extractable answers.
| Check | How | Effort | Impact if failing |
|---|---|---|---|
| Crawler access | curl with each bot user agent; check CDN bot settings | 15 min | Total - fixes invisibility outright |
| Ask the engines | Ask ChatGPT and Perplexity "what is {your brand}?" | 5 min | Reveals entity gaps and wrong facts |
| Answer extraction | Read your top page: does any 60-word passage stand alone? | 10 min | High - rewrite leads answer-first |
| Schema + dates | Run key URLs through validator.schema.org | 10 min | Medium - entity + freshness signals |
| Third-party search | Search your brand on Reddit, Trustpilot, YouTube | 10 min | Long-term - the citation moat |
Optionally add an llms.txt file: a plain-text site summary at your root that several AI pipelines now read. It is low effort and still an emerging convention rather than a formal standard - ours lives at digitalheroesco.com/llms.txt as a working example.
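If you want to generate the file rather than hand-write it, here is a minimal sketch. The layout (a title line, a short blockquote summary, then sections of links) follows the community llms.txt proposal as commonly described; treat the exact format as an assumption and compare it with a working example before publishing. The site name, URLs, and `build_llms_txt` helper are hypothetical.

```python
def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Assemble llms.txt text from a summary and {section: [(label, url, note)]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for title, links in sections.items():
        lines.append(f"## {title}")
        lines.append("")
        for label, url, note in links:
            lines.append(f"- [{label}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content; replace with your own pages.
text = build_llms_txt(
    "Example Agency",
    "Example Agency is a Shopify agency serving beauty brands.",
    {"Services": [("SEO", "https://example.com/seo", "Search and AI visibility work")]},
)
print(text)
```

Save the output as llms.txt at the site root and keep the summary consistent with your homepage entity definition.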
What a full AI-visibility fix looks like.
A complete fix runs in three passes: unblock and verify every AI crawler (week one), restructure money pages into answer-first format with entity schema and visible dates (weeks two to four), then build the third-party footprint that compounds (ongoing). We run this exact program on digitalheroesco.com - every pattern in this post is live on this site.
If you want it done for you, this work sits inside our SEO and AI-search service and pairs with growth strategy for the measurement side. For self-serve teams, start with the free tools - the website audit, robots generator, and schema generator cover the first two passes.
Six answers.
How long until ChatGPT starts mentioning my website?
Live-retrieval mentions can appear within two to six weeks of fixing crawler access and page structure, because search-mode answers re-fetch the web continuously. Training-data mentions move slower - they update when the next model version ships, which is why the third-party footprint matters: it is what future training runs absorb.
Should I block AI crawlers to protect my content instead?
For most service businesses, no. Blocking retrieval crawlers removes you from AI answers your competitors will happily occupy, and blocking training crawlers excludes you from the next model generation's world knowledge. The trade is different for publishers monetizing content directly; for lead-generation sites, visibility is the product.
Does FAQ schema help me show up in AI answers?
The visible Q&A content helps; the FAQPage markup mostly does not. Google restricted FAQ rich results to government and health sites back in August 2023, and AI engines read the rendered question-answer text either way. Write real questions with complete answers in plain HTML and skip the schema type.
Is AI visibility different from normal SEO?
It overlaps heavily - crawlability, structure, and authority matter to both. The differences: AI engines weight passage-level extractability and entity clarity more than rankings, freshness decays faster, and third-party mentions count disproportionately. Google's AI Overviews literally run on the normal Search index, so classic SEO remains the foundation.
Can I just buy my way into AI answers?
Not today. None of the major engines sell organic-answer placement, and the citation patterns measured so far reward structure, freshness, and independent validation. Ads exist around some AI surfaces, but the answer text itself is earned - which is good news if you do the work and bad news if you wanted a shortcut.
How do I measure whether any of this is working?
Run a quarterly prompt panel: ask ChatGPT, Perplexity, Claude, and Gemini the five questions your buyers ask, and log whether you are named and described correctly. Watch referral traffic from chatgpt.com (formerly chat.openai.com) and perplexity.ai in analytics, and track which pages those visitors land on. Expect movement in quarters, not days.
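If your analytics tool can export referrer and landing-page pairs, a small sketch like the one below can tally AI-engine referrals. The host list covers chat.openai.com and perplexity.ai, plus chatgpt.com (ChatGPT's current domain); the export format and helper names are assumptions, so adapt them to whatever your analytics actually provides.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import urlparse

# Referrer hosts mapped to engine names; extend as new engines appear.
AI_REFERRER_HOSTS = {
    "chat.openai.com": "ChatGPT",
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
}


def ai_engine_for(referrer: str) -> Optional[str]:
    """Return the engine name for a referrer URL, or None if it is not an AI engine."""
    host = (urlparse(referrer).hostname or "").lower()
    for known, engine in AI_REFERRER_HOSTS.items():
        if host == known or host.endswith("." + known):
            return engine
    return None


def tally_landings(visits: Iterable[Tuple[str, str]]) -> Counter:
    """Count visits per (engine, landing path) from (referrer, path) pairs."""
    counts = Counter()
    for referrer, landing_path in visits:
        engine = ai_engine_for(referrer)
        if engine:
            counts[(engine, landing_path)] += 1
    return counts


# Hypothetical export rows.
rows = [
    ("https://chatgpt.com/", "/pricing"),
    ("https://www.perplexity.ai/search?q=x", "/pricing"),
    ("https://www.google.com/", "/blog"),
]
print(tally_landings(rows))
```

Log the tallies alongside each quarterly prompt panel so you can see whether being named in answers lines up with referral visits.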
Want the audit done for you?
Thirty minutes, your site on screen, the five checks live - and a written fix list within 48 hours.
Book my 30-minute audit call