
How to make your SaaS site crawlable for AI

Category: Technical Foundation · Reading time: 8 min


One of the most common reasons a B2B SaaS doesn't appear in AI answers is also the simplest: AI crawlers can't read the site. Sometimes that's a wildcard block in robots.txt. Sometimes it's a JavaScript-only render. Sometimes it's a broken sitemap. This guide covers how to fix all three.

What does "crawlable for AI" actually mean?

For a page to be cited by an AI tool, three things need to be true. The bot has to be allowed in via robots.txt. It has to be able to find the page (usually via a sitemap or internal link). And the page has to return real HTML the bot can parse without executing JavaScript.

Which AI bots should I let in?

For most B2B SaaS, you want all of the major commercial AI bots allowed. Blocking them is the equivalent of taking your phone off the hook. Here's the current shortlist:

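Bot | Owner | Purpose
GPTBot | OpenAI | Crawls pages that may be used to train OpenAI's models
OAI-SearchBot | OpenAI | Builds the search index behind ChatGPT search results
ChatGPT-User | OpenAI | Fetches live URLs when a ChatGPT user's request triggers a page visit
ClaudeBot | Anthropic | Anthropic's general web crawler
anthropic-ai | Anthropic | Legacy Anthropic user agent
PerplexityBot | Perplexity | Indexes web pages for Perplexity answers
Perplexity-User | Perplexity | Live fetch during a Perplexity answer
Google-Extended | Google | robots.txt control token, not a separate crawler. Governs whether Google-crawled content may be used for Gemini. Allowed unless you disallow it, and Google documents it as having no effect on Google Search
Applebot-Extended | Apple | robots.txt control token, not a separate crawler. Governs whether Applebot-crawled content may be used for Apple's AI models. Allowed unless you disallow it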

How do I configure robots.txt for AI?

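Place a robots.txt file at the root of your domain. Use an explicit Allow group for each AI bot you want to permit, and end with a wildcard User-agent: * group for everything else. A crawler that finds a group naming it follows that group and ignores the wildcard, so explicit groups protect the bots you care about if someone later tightens the wildcard rules. Reference your sitemap. Re-check after every site rebuild: it's surprisingly common for hosting providers or CMS upgrades to silently overwrite robots.txt.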
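One way to catch a silent overwrite is to diff the live file against the copy you keep in version control. The sketch below assumes you have already loaded both files as strings; the function name robots_drift is made up for this example, and fetching the live file is left out.

```python
import difflib


def robots_drift(expected: str, live: str) -> list[str]:
    """Return unified-diff lines between the committed and the live robots.txt.

    An empty list means the two files match once line endings and
    trailing whitespace are normalised.
    """
    def normalise(text: str) -> list[str]:
        return [line.rstrip() for line in text.replace("\r\n", "\n").strip().split("\n")]

    return list(
        difflib.unified_diff(
            normalise(expected),
            normalise(live),
            fromfile="expected",
            tofile="live",
            lineterm="",
        )
    )
```

Running this after each deploy and failing the build on a non-empty result is one possible way to wire it in.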

Example robots.txt
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml
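To confirm that a robots.txt like the one above lets each bot in, you can run it through Python's standard-library parser. This is a minimal sketch: the bot list is copied from the shortlist above, blocked_bots is a name invented for this example, and the standard-library parser's matching rules may differ in detail from what each vendor's crawler implements.

```python
from urllib.robotparser import RobotFileParser

# Bot names copied from the shortlist above; extend as needed.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Perplexity-User", "Google-Extended", "Applebot-Extended",
]


def blocked_bots(robots_txt: str, url: str = "https://yourdomain.com/") -> list[str]:
    """Return the AI bots that this robots.txt text would keep away from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]
```

An empty list means none of the listed bots are blocked for that URL. Pass a deeper URL, such as your pricing page, to check path-specific rules.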

Should I block any AI bots?

Most B2B SaaS shouldn't. The only common reason to block specific bots is licensed or paywalled content you don't want used for training. If that doesn't describe your site, the cost-benefit of blocking is firmly against you - you trade visibility for nothing.

Do I need a sitemap for AI?

Yes. A clean, accurate sitemap.xml is still the best way to tell crawlers what exists on your site, and AI crawlers that read sitemaps can use it the same way Google does. Auto-generate it from your routes so it stays in sync, and reference it from robots.txt with a Sitemap: directive.

Common sitemap mistakes that hurt AI visibility:

  • Including noindex URLs or URLs that canonicalise to a different page. Only list pages you want cited.
  • Letting it go stale. Auto-generate, don't hand-maintain.
  • Splitting into multiple sitemaps without a sitemap index file.
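The mistakes above can be checked mechanically. The sketch below parses sitemap XML you have already downloaded, reports whether it is a plain sitemap or a sitemap index, and compares the listed URLs with the routes you expect. The function names and the expected_urls argument are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_text: str) -> tuple[str, list[str]]:
    """Return ("urlset" or "sitemapindex", list of <loc> values)."""
    root = ET.fromstring(xml_text)
    kind = root.tag.removeprefix(SITEMAP_NS)
    if kind not in ("urlset", "sitemapindex"):
        raise ValueError(f"not a sitemap: root element is {root.tag!r}")
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


def sitemap_drift(xml_text: str, expected_urls: list[str]) -> tuple[list[str], list[str]]:
    """Return (missing, extra): expected URLs not listed, and listed URLs not expected.

    For a sitemap index the <loc> values are child sitemap URLs, so run
    this on each child sitemap rather than on the index itself.
    """
    _, locs = sitemap_locs(xml_text)
    listed, expected = set(locs), set(expected_urls)
    return sorted(expected - listed), sorted(listed - expected)
```

A non-empty missing list suggests the sitemap has gone stale. A non-empty extra list is where to look for noindex or redirected pages that should not be listed.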

Does my site need to be server-rendered?

For any page you want AI to cite: yes, or close to it. Most AI bots don't execute JavaScript the way a modern browser does. If your homepage, pricing page or guides only render content after a client-side JS bundle runs, the bot sees an empty shell.

Quick test: open your page, view source (Ctrl + U on Windows and Linux; Cmd + Option + U in Chrome and Safari on macOS, Cmd + U in Firefox), and search for a sentence from your hero section. If you can't find it in the raw HTML, AI crawlers that don't run JavaScript can't either. Fix this with SSR, static generation or pre-rendering.
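The same test can be scripted. The sketch below takes raw HTML you have already fetched without running JavaScript and checks whether a sentence appears in the text outside script and style tags. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def text_in_raw_html(html: str, sentence: str) -> bool:
    """True if sentence appears in the HTML's text, ignoring case and whitespace runs."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    page_text = " ".join("".join(parser.chunks).split()).lower()
    return " ".join(sentence.split()).lower() in page_text
```

This is deliberately stricter than a plain search in view source: copy that only exists inside a script tag, such as a JSON payload waiting for hydration, counts as missing.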

How do I check which AI bots have actually visited?

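Server access logs are the source of truth. Filter for user agents containing "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot" or "Perplexity-User". Google-Extended and Applebot-Extended are robots.txt control tokens rather than crawler user agents, so don't expect to see them in logs. If you see no hits from any AI crawler over a 30-day window, something is probably blocking them: usually robots.txt, a CDN rule, or a WAF.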

Cloudflare, in particular, ships a "block AI bots" setting whose default has changed over time and can differ between older and newer accounts. Don't assume it's off. Check.
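A small script can do the log filtering. The sketch below assumes access logs in the common "combined" format, where the user agent is the last quoted field on each line. The token list and the function name count_ai_hits are assumptions made for this example, and other log formats will need a different pattern.

```python
import re

# Substrings to look for in the user-agent field.
AI_AGENT_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "PerplexityBot", "Perplexity-User",
]

# In the combined log format the user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_hits(log_lines) -> dict[str, int]:
    """Count requests per AI bot token, keeping zero counts visible."""
    hits = {token: 0 for token in AI_AGENT_TOKENS}
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # line is not in the assumed format
        user_agent = match.group(1).lower()
        for token in AI_AGENT_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
    return hits
```

Pass it an open file object or any iterable of lines. User-agent strings can be spoofed, so treat the counts as a signal that a bot is getting through, not as proof of who sent each request.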

A 10-minute crawlability checklist

  1. Visit yourdomain.com/robots.txt. Confirm GPTBot, ClaudeBot, PerplexityBot and Google-Extended are allowed.
  2. Visit yourdomain.com/sitemap.xml. Confirm it returns a valid XML document with your important URLs.
  3. View source on your homepage. Confirm your hero copy appears in the raw HTML.
  4. Check Cloudflare or your WAF for any AI bot blocking rules.
  5. Look at server logs for AI user agents in the last 30 days. Zero hits = something's blocking.

Next: schema markup that actually helps AI cite you.