
How to run a technical GEO audit on your SaaS site

Category: Technical Foundation · Reading time: 9 min

Before you rewrite a single page or chase a single backlink, audit the plumbing. A technical GEO audit tells you whether AI crawlers can reach your site, parse your content and pull out the facts they need to cite you. Most B2B SaaS sites fail at least one of those three - and no amount of content work fixes a site AI can't read.


What is a technical GEO audit?

A technical GEO audit is a structured inspection of the parts of your site AI tools touch: how crawlers fetch it, how the HTML is rendered, how content is structured, and what machine-readable signals (schema, metadata, llms.txt) are attached. It is not a content audit and it is not a brand audit - it answers a single question: can an AI crawler arrive, read, and extract your facts cleanly?

How do I run a GEO audit, step by step?

  1. Crawl your site. Run Screaming Frog or Sitebulb across your full URL set. Export the report.
  2. Check raw HTML. For your homepage, top 3 product pages and pricing page, open view-source: in the browser or run curl -A "GPTBot" <url>. Confirm the content is in the HTML source, not injected later by JS.
  3. Validate schema. Paste each key URL into Google's Rich Results Test and the Schema.org validator. Note what's missing or invalid.
  4. Check robots.txt and sitemap. Verify GPTBot, ClaudeBot, PerplexityBot and Google-Extended aren't blocked, and that your sitemap is current.
  5. Run an AI legibility test. Paste each key URL into ChatGPT, Claude and Perplexity. Ask: "what does this page say, and what's missing?" Capture the gaps.
  6. Check Core Web Vitals. Run PageSpeed Insights on the same key URLs.
  7. Score and prioritise. Sort every finding into "fix this week / fix this month / nice to have" using the framework below.

Budget 3-4 hours for a first audit on a typical SaaS site. The first one is the slowest - subsequent ones run in under an hour once you have a template.
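Step 4 is easy to script. The sketch below uses Python's standard-library robots.txt parser to list which of the four tokens named above would be refused for a given URL. The sample robots.txt and the example.com URL are made up for illustration. Google-Extended is treated here as a robots.txt token like the others, and the parser's matching rules are a simplification of how each vendor actually interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in the audit steps above.
AI_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_ai_bots(robots_txt, url):
    """Return the AI tokens that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_USER_AGENTS if not parser.can_fetch(ua, url)]


# Hypothetical robots.txt: GPTBot is blocked, everyone else is allowed.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_ai_bots(SAMPLE_ROBOTS, "https://example.com/pricing"))
```

To run it against a live site, download /robots.txt yourself and pass the text in. Keeping the fetch separate makes the check easy to repeat on saved copies.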

What should I actually check?

Eight areas, in roughly the order they matter for AI citations.

| Area | What to check | The signal |
|---|---|---|
| Crawlability | robots.txt, sitemap.xml, AI bot allowlist | Can GPTBot, ClaudeBot, PerplexityBot and Google-Extended actually fetch your pages? |
| Rendering | Server-rendered HTML vs client-only JS | Does the raw HTML response contain your content, or only a loading shell? |
| Indexability | Canonical tags, noindex, duplicate URLs | Is each important page reachable at one canonical URL with the right directives? |
| Structured data | Organization, Product, FAQPage, Article schema | Can AI extract entity facts (name, founders, category, pricing) without guessing? |
| Page structure | Single H1, semantic headings, lists, tables | Is the content chunkable into clean answer-shaped passages? |
| Metadata | Titles, descriptions, OG, canonical URLs | Do social and AI previews describe the page accurately? |
| Performance | Core Web Vitals, TTFB, JS payload | Do crawlers get a response fast enough to finish rendering? |
| llms.txt | Curated map of citable pages | Have you told LLMs which URLs to prefer? |

If any single row in this table is failing on your homepage or core product pages, fix it before doing anything else - including content work. A broken foundation invalidates every on-site and off-site move you make later.
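The "Page structure" row can be checked mechanically. This is a minimal sketch, assuming you already have the page's HTML as a string: it collects h1 to h6 tags with the standard-library parser and reports a missing or repeated H1 and any skipped heading levels. The class and function names are illustrative, and "no skipped levels" is an assumed reading of "semantic headings", not a rule from any crawler.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every h1-h6 element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


def heading_issues(html):
    """Return human-readable problems with the page's heading outline."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = [level for level, _ in collector.headings]
    issues = []
    h1_count = levels.count(1)
    if h1_count != 1:
        issues.append(f"expected exactly one h1, found {h1_count}")
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:  # e.g. an h1 followed directly by an h3
            issues.append(f"heading level jumps from h{prev} to h{cur}")
    return issues


if __name__ == "__main__":
    print(heading_issues("<h1>Acme</h1><h1>Pricing</h1><h3>Plans</h3>"))
```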

Which tools should I use?

You can run a full first audit with free tools. The paid options become useful when you want ongoing monitoring or you're auditing 1000+ URLs.

| Tool | Cost | What it's best for |
|---|---|---|
| Google Search Console | Free | Indexation, crawl errors, sitemap status, Core Web Vitals. |
| Screaming Frog (free up to 500 URLs) | Free / paid | Full crawl: status codes, headings, canonicals, schema, internal links. |
| Sitebulb Lite | Free | Visual crawl maps and prioritised hint reports for small sites. |
| Schema.org validator + Rich Results Test | Free | Verify Organization, Product, FAQPage, Article markup parses cleanly. |
| PageSpeed Insights / Lighthouse | Free | Core Web Vitals, render-blocking JS, mobile performance. |
| view-source: and curl -A 'GPTBot' | Free | Confirm the raw HTML AI crawlers see actually contains your content. |
| ChatGPT / Claude (with browsing) | Free / paid | Paste a URL and ask: "what does this page say, and what's missing for someone choosing a [category] tool?" |
| Perplexity | Free | Ask it to summarise your page and list its sources - reveals what AI can actually extract. |
| Ahrefs / Semrush Site Audit | Paid | Ongoing technical SEO monitoring; most checks overlap with GEO needs. |

Can AI help me run the audit itself?

Yes - and this is the part of GEO auditing that's changed most in the last twelve months. ChatGPT, Claude and Perplexity are now the fastest way to test how AI actually reads your site. Five prompts worth running on any key URL:

  • "Summarise the content of this page in three sentences." - tests basic legibility.
  • "What category of product is this, and who is it for?" - tests entity and positioning clarity.
  • "List every feature mentioned on this page." - exposes whether features are extractable or buried in marketing copy.
  • "How much does this product cost?" - exposes pricing pages hidden behind tabs or JS.
  • "Who founded this company and where are they based?" - tests Organization schema and About-page signal.

If the answers are wrong, vague or include "I couldn't find that on the page", you have a specific, fixable problem - usually a rendering, structure or schema issue. Paste the failing output into ChatGPT or Claude and ask: "what change to the page would have let you answer correctly?" The response is often a usable rewrite brief.

You can also paste your raw HTML or rendered DOM into Claude and ask it to flag missing schema, weak headings or ambiguous entity references. It's the closest thing to an on-demand technical reviewer.
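If you run the prompts on more than a handful of pages, it helps to generate them as a sheet and triage the answers the same way each time. The sketch below is only a bookkeeping aid and does not call any AI tool. The five prompts are the ones listed above; the "gap marker" phrases other than "couldn't find" are assumptions about how a model might phrase a miss, so adjust them to what you actually see.

```python
# The five legibility prompts from the list above.
LEGIBILITY_PROMPTS = [
    "Summarise the content of this page in three sentences.",
    "What category of product is this, and who is it for?",
    "List every feature mentioned on this page.",
    "How much does this product cost?",
    "Who founded this company and where are they based?",
]

# Assumed phrases suggesting the model could not extract the fact.
GAP_MARKERS = ("couldn't find", "could not find", "not mentioned", "unable to find")


def build_prompt_sheet(urls):
    """Return (url, prompt_to_paste) pairs: every prompt for every URL."""
    return [
        (url, f"{prompt} Page: {url}")
        for url in urls
        for prompt in LEGIBILITY_PROMPTS
    ]


def looks_like_gap(answer):
    """Flag an answer you pasted back in if it admits missing information."""
    text = answer.lower().replace("\u2019", "'")  # normalise curly apostrophes
    return any(marker in text for marker in GAP_MARKERS)


if __name__ == "__main__":
    for url, prompt in build_prompt_sheet(["https://example.com/pricing"]):
        print(prompt)
```

An answer can be confidently wrong without tripping any marker, so the flag narrows down what to read first. It does not replace reading the answers.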

How do I prioritise the fixes?

Sort every finding into one of three tiers. Ship tier one before you touch tiers two or three.

[ 01 ] Fix this week
  • AI bots blocked in robots.txt (GPTBot, ClaudeBot, PerplexityBot, Google-Extended).
  • Key pages rendered client-side only, returning an empty HTML shell.
  • Missing or broken Organization schema on the homepage.
  • No sitemap, or a sitemap returning 404s and old URLs.
[ 02 ] Fix this month
  • Thin or duplicate H1s, no semantic structure on key landing pages.
  • FAQ and pricing content trapped inside accordions or tabs that hide it from the HTML.
  • Missing FAQPage, Product or Article schema on pages that warrant it.
  • Slow TTFB or heavy JS that blocks the first render.
[ 03 ] Nice to have
  • Publishing an llms.txt file pointing AI tools at your best pages.
  • Adding speakable schema and breadcrumbs.
  • Consolidating near-duplicate URLs with canonicals.
  • Tidying internal linking so canonical pages get the most internal mentions.
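One way to keep the tiers consistent between audits is to encode them once and sort findings automatically. This is a sketch under assumptions: the issue keys are invented shorthand for the bullets above, and sending unrecognised issues to "Nice to have" is an arbitrary default you may prefer to change.

```python
from dataclasses import dataclass

TIER_LABELS = {1: "Fix this week", 2: "Fix this month", 3: "Nice to have"}

# Hypothetical issue keys, one per bullet in the three tiers above.
ISSUE_TIERS = {
    "ai_bots_blocked": 1,
    "client_side_only_rendering": 1,
    "organization_schema_missing": 1,
    "sitemap_missing_or_broken": 1,
    "weak_heading_structure": 2,
    "content_hidden_in_tabs": 2,
    "page_schema_missing": 2,
    "slow_ttfb_or_heavy_js": 2,
    "llms_txt_missing": 3,
    "speakable_or_breadcrumbs_missing": 3,
    "near_duplicate_urls": 3,
    "internal_linking_untidy": 3,
}


@dataclass
class Finding:
    url: str
    issue: str


def prioritise(findings):
    """Group findings under the three tier labels, preserving input order."""
    tiers = {label: [] for label in TIER_LABELS.values()}
    for finding in findings:
        # Assumption: anything not in the map is treated as tier three.
        tier = ISSUE_TIERS.get(finding.issue, 3)
        tiers[TIER_LABELS[tier]].append(finding)
    return tiers


if __name__ == "__main__":
    report = prioritise([
        Finding("https://example.com/", "llms_txt_missing"),
        Finding("https://example.com/pricing", "client_side_only_rendering"),
    ])
    for label, items in report.items():
        print(label, [f"{f.issue} @ {f.url}" for f in items])
```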

What's the single most important fix?

Almost always: making sure the raw HTML response contains your content. A surprising number of B2B SaaS sites still render homepage and pricing copy entirely client-side, which means AI crawlers that don't execute JavaScript - and most don't - see an empty shell. Fix that one issue and you've already lifted the ceiling on every other GEO move.
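A rough way to test this without opening view-source on every page is sketched below. It fetches the raw response with a "GPTBot" User-Agent string (the same idea as curl -A "GPTBot"), drops script, style and template contents, and reports which phrases you expect on the page are absent from the remaining text. The function names, the phrase list and the example.com URL are illustrative, and a real crawler's request headers will differ from this simplified one.

```python
import urllib.request
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside script, style and template elements."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        cleaned = " ".join(data.split())  # collapse whitespace and newlines
        if cleaned and not self._skip_depth:
            self.parts.append(cleaned)


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def missing_phrases(html, phrases):
    """Return the expected phrases that do not appear in the raw HTML text."""
    text = visible_text(html).lower()
    return [p for p in phrases if p.lower() not in text]


def fetch_raw_html(url, user_agent="GPTBot"):
    """Fetch the unrendered response; no JavaScript is executed."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    html = fetch_raw_html("https://example.com/")
    print(missing_phrases(html, ["Example Domain"]))
```

If a phrase that is plainly visible in the browser shows up as missing here, the page is probably being assembled client-side.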

Second-most-important: a valid Organization schema block on your homepage. It's the single biggest entity-recognition signal you can ship in an afternoon, and it underpins every branded prompt buyers run in AI tools.
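To confirm the block is present in the raw HTML, a small script can pull out every JSON-LD script tag and look for an Organization entry. The sketch below makes some assumptions: the list of fields it reports on (name, url, logo, sameAs) is a suggested checklist rather than a validator's requirement, and it only inspects top-level items and @graph arrays. It does not replace the Schema.org validator or the Rich Results Test.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def find_organization(html):
    """Return the first JSON-LD item typed Organization, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # invalid JSON is itself a finding; skipped in this sketch
        items = []
        for item in (data if isinstance(data, list) else [data]):
            if isinstance(item, dict):
                items.append(item)
                graph = item.get("@graph")
                if isinstance(graph, list):
                    items.extend(g for g in graph if isinstance(g, dict))
        for item in items:
            types = item.get("@type")
            if "Organization" in (types if isinstance(types, list) else [types]):
                return item
    return None


# Assumed checklist of fields worth having; not an official requirement.
RECOMMENDED_FIELDS = ("name", "url", "logo", "sameAs")


def organization_gaps(html):
    org = find_organization(html)
    if org is None:
        return ["no Organization JSON-LD block found"]
    return [f"missing {field}" for field in RECOMMENDED_FIELDS if not org.get(field)]
```

Feed it the raw homepage HTML, not the rendered DOM, so that schema injected by JavaScript does not hide a gap.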

How often should I re-audit?

Run a full audit once a quarter, and a lighter pass (raw HTML check, schema validation, AI legibility prompts on your top 5 pages) every month. Re-run the audit immediately after any major site redesign or framework migration - those are the moments when rendering and schema break.

For the layers that sit on top of the audit, see "Schema markup that actually helps AI cite you" and "How to make your SaaS site crawlable for AI".