The single most common reason a B2B SaaS doesn't appear in AI answers is the simplest one: AI crawlers can't read the site. Sometimes that's a wildcard block in robots.txt. Sometimes it's a JavaScript-only render. Sometimes it's a broken sitemap. This guide fixes all three.
What does "crawlable for AI" actually mean?
For a page to be cited by an AI tool, three things need to be true. The bot has to be allowed in via robots.txt. It has to be able to find the page (usually via a sitemap or internal link). And the page has to return real HTML the bot can parse without executing JavaScript.
Which AI bots should I let in?
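For most B2B SaaS, you want all of the major commercial AI bots allowed. Blocking them is the equivalent of taking your phone off the hook. Here's the current shortlist: GPTBot, OAI-SearchBot and ChatGPT-User (OpenAI); ClaudeBot and anthropic-ai (Anthropic); PerplexityBot and Perplexity-User (Perplexity); Google-Extended (Google); and Applebot-Extended (Apple).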
How do I configure robots.txt for AI?
Place a robots.txt file at the root of your domain. Use explicit Allow blocks for each AI bot you want to permit, and end with a wildcard. Reference your sitemap. Re-check after every site rebuild - it's surprisingly common for hosting providers or CMS upgrades to silently overwrite robots.txt.
User-agent: GPTBot
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Perplexity-User
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: *
Allow: /
Sitemap: https://yourdomain.com/sitemap.xml
Should I block any AI bots?
Most B2B SaaS shouldn't. The only common reason to block specific bots is licensed or paywalled content you don't want used for training. If that doesn't describe your site, the cost-benefit of blocking is firmly against you - you trade visibility for nothing.
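Whichever bots you decide to allow or block, it can help to confirm that the rules say what you think they say. The sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt string against a list of user agents. The `allowed_bots` helper, the `SAMPLE` rules and the `/pricing` URL are illustrative assumptions, not part of any crawler's tooling, and Python's parser applies the first matching rule, so real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the robots.txt example in this guide.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot",
    "PerplexityBot", "Google-Extended",
]


def allowed_bots(robots_txt: str, url: str, bots=AI_BOTS) -> dict:
    """Return {bot: True/False} for whether each bot may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


# Hypothetical rules: GPTBot is blocked, everyone else is allowed.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    results = allowed_bots(SAMPLE, "https://yourdomain.com/pricing")
    for bot, ok in results.items():
        print(f"{bot:16} {'allowed' if ok else 'BLOCKED'}")
```

To check a live site instead of a string, `RobotFileParser("https://yourdomain.com/robots.txt")` followed by `.read()` fetches and parses the deployed file.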
Do I need a sitemap for AI?
Yes. A clean, accurate sitemap.xml is still the best way to tell crawlers what exists on your site. AI bots use it just like Google does. Auto-generate it from your routes so it stays in sync, and reference it from robots.txt with a Sitemap: directive.
Common sitemap mistakes that hurt AI visibility:
- Including noindex pages or URLs whose canonical tag points elsewhere. Only list pages you want cited.
- Letting it go stale. Auto-generate, don't hand-maintain.
- Splitting into multiple sitemaps without a sitemap index file.
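As a rough illustration of auto-generating the sitemap from your routes, here is a minimal Python sketch using only the standard library. The `build_sitemap` function and the route list are hypothetical. In a real project the routes would come from your framework or CMS, and the caller is assumed to have already filtered out noindex and non-canonical pages.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, routes: list) -> str:
    """Build a sitemap.xml string from a base URL and a list of paths."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so the output is stable between builds.
    for route in sorted(set(routes)):
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        loc.text = base_url.rstrip("/") + "/" + route.lstrip("/")
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical routes for illustration only.
    print(build_sitemap("https://yourdomain.com", ["/", "/pricing", "/guides"]))
```

Running this as part of the build step, rather than by hand, is what keeps the file from going stale.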
Does my site need to be server-rendered?
For any page you want AI to cite: yes, or close to it. Most AI bots don't execute JavaScript the way a modern browser does. If your homepage, pricing page or guides only render content after a client-side JS bundle runs, the bot sees an empty shell.
Quick test: open your page, view source (Ctrl/Cmd + U), and search for a sentence from your hero section. If you can't find it in the raw HTML, AI crawlers can't either. Fix this with SSR, static generation or pre-rendering.
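The view-source test can also be scripted. The sketch below is one possible approach, assuming plain HTTP fetching is a fair stand-in for a crawler that doesn't run JavaScript: it downloads the raw HTML, drops `<script>` and `<style>` contents, and checks whether a sentence appears in what's left. The function names and the `crawl-check/0.1` user agent are made up for this example, and the text matching is deliberately rough (inline tags next to punctuation can cause false negatives).

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the page text with whitespace collapsed to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def sentence_in_raw_html(html: str, sentence: str) -> bool:
    """True if `sentence` appears in the HTML without running any JS."""
    needle = " ".join(sentence.split()).lower()
    return needle in visible_text(html).lower()


def fetch_raw_html(url: str, user_agent: str = "crawl-check/0.1") -> str:
    """Fetch a page the way a simple non-JS crawler would."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A typical call would be `sentence_in_raw_html(fetch_raw_html("https://yourdomain.com/"), "your hero sentence")`. A `False` result suggests the content only appears after client-side rendering.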
How do I check which AI bots have actually visited?
Server access logs are the source of truth. Filter for user agents containing "GPTBot", "ClaudeBot" or "PerplexityBot". If you see no hits over a 30-day window, something is blocking them - usually robots.txt, a CDN rule, or a WAF.
Cloudflare, in particular, ships a "block AI bots" toggle whose default depends on when and how the zone was set up, so it may be on without anyone having chosen it. Check.
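The log filtering described above can be done with a few lines of Python. This is a minimal sketch that assumes one request per line with the user agent somewhere in the line, as in common access log formats. The `count_ai_bot_hits` helper is hypothetical, and the sample lines are simplified, made-up entries rather than real traffic.

```python
from collections import Counter

# Substrings to look for, as named in this guide.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_ai_bot_hits(log_lines, tokens=AI_BOT_TOKENS) -> Counter:
    """Count log lines mentioning each bot token (case-insensitive)."""
    hits = Counter({token: 0 for token in tokens})
    for line in log_lines:
        lowered = line.lower()
        for token in tokens:
            if token.lower() in lowered:
                hits[token] += 1
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - "GET /pricing HTTP/1.1" 200 "-" "example GPTBot agent"',
    '203.0.113.9 - - "GET / HTTP/1.1" 200 "-" "example ClaudeBot agent"',
    '203.0.113.5 - - "GET /guides HTTP/1.1" 200 "-" "example GPTBot agent"',
    '198.51.100.2 - - "GET / HTTP/1.1" 200 "-" "an ordinary browser"',
]

if __name__ == "__main__":
    for bot, count in count_ai_bot_hits(SAMPLE_LOG).items():
        print(f"{bot}: {count}")
```

For a real log, pass an open file handle (for example `open("access.log", encoding="utf-8", errors="replace")`) in place of `SAMPLE_LOG`. Any bot stuck at zero over 30 days is the one to investigate.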
A 10-minute crawlability checklist
- Visit yourdomain.com/robots.txt. Confirm GPTBot, ClaudeBot, PerplexityBot and Google-Extended are allowed.
- Visit yourdomain.com/sitemap.xml. Confirm it returns a valid XML document with your important URLs.
- View source on your homepage. Confirm your hero copy appears in the raw HTML.
- Check Cloudflare or your WAF for any AI bot blocking rules.
- Look at server logs for AI user agents in the last 30 days. Zero hits = something's blocking.
