How I Made My Website AI Discoverable

[Meme: a person confidently refusing traditional SEO with "No thanks, I use AI" — a funny take on AI discoverability over old-school search optimization.]

If you build a website today and only think about Google, you’re already behind. AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews are answering questions about people and businesses every day. If your website isn’t set up for them, you don’t exist in those answers.

I spent real time making my website AI discoverable. Not just SEO optimized. AI discoverable. There’s a difference. Here’s everything I did and why.

Why AI Discoverability Is Different from SEO

SEO is about ranking in search results. AI discoverability is about being understood by language models. When someone asks ChatGPT “Who is Shahab Papoon?” or Perplexity “What does ConnectMyTech do?”, I want the answer to come from my website, not from a hallucination.

AI models don’t read your website the way Google does. They need structured, parseable, and consistent information. They need context. They need to understand who you are, what you do, and how it all connects.

Here’s every layer I built to make that happen.

1. llms.txt: The AI Welcome Mat

This is the single most important thing I did.

I created two files at the root of my website: llms.txt and llms-full.txt. These are plain text files designed specifically for language models to read.

llms.txt is the short version. About 55 lines. It has my one-liner bio, my expertise, key pages, FAQ highlights, affiliations, and contact information. Think of it as the elevator pitch for AI.

llms-full.txt is the deep version. About 370 lines. It has my complete biography, my philosophy, my full venture details, professional timeline, skills inventory, research details, case studies, blog summaries, and a 30+ question FAQ. Think of it as the full context dump.

When an AI crawls my site, these files give it everything it needs to answer questions about me accurately. No guessing. No hallucinating. Just facts from the source.

If you do one thing from this article, create an llms.txt file for your website. Put it at yourdomain.com/llms.txt. Make it clear, factual, and comprehensive.
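For reference, here is a minimal skeleton following the llms.txt proposal (llmstxt.org): an H1 title, a blockquote summary, then H2 sections of links and facts. The URLs and email are placeholders — swap in your own.

```text
# Shahab Papoon

> AI & Automation Integrator in Victoria, BC, Canada. Founder of
> Keyweemotion, co-founder of ConnectMyTech.

## Key Pages

- [About](https://yourdomain.com/about): Background and expertise
- [Projects](https://yourdomain.com/projects): Ventures and case studies

## FAQ

- Who is Shahab Papoon? An AI & Automation Integrator based in Victoria, BC.

## Contact

- Email: you@yourdomain.com
```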

2. robots.txt: Letting AI Crawlers In

Most websites block AI crawlers by default. I did the opposite. My robots.txt explicitly welcomes the major AI crawlers:

  • GPTBot and OAI-SearchBot (OpenAI)
  • ClaudeBot (Anthropic)
  • PerplexityBot (Perplexity)
  • Google-Extended (Google AI Overviews and SGE)
  • meta-externalagent (Meta AI)
  • Amazonbot (Amazon)
  • Bytespider (ByteDance)
  • CCBot (Common Crawl)
  • Applebot (Apple)

Plus a general User-agent: * catch-all that allows everything. The sitemap is referenced at the bottom so every crawler knows where to find all pages.
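A condensed sketch of what that robots.txt looks like — the domain and sitemap filename are placeholders, and I've abbreviated the crawler list:

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# ...repeat for the other crawlers you want to welcome...

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap-index.xml
```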

If you’re blocking these crawlers, you’re invisible to AI systems. Check your robots.txt right now.

3. JSON-LD Structured Data

This is where things get technical but incredibly powerful. JSON-LD is structured data embedded in your HTML that tells AI systems exactly what your content means. Not what it looks like. What it means.

I have four schemas running across my site:

Person Schema (Every Page)

Every single page on my website includes a Person schema. It tells AI systems:

  • My name, title, and description
  • My 13 areas of expertise
  • The organizations I work with (ConnectMyTech, Keyweemotion) and past affiliations (RE/MAX Camosun)
  • My education (Royal Roads University, with full postal address)
  • My location (Victoria, BC, Canada)
  • The 2 companies I founded (with founding dates)
  • My 5 social profiles (LinkedIn, Instagram, X, GitHub, Medium)
  • My email address

When ChatGPT answers “Who is Shahab Papoon?”, this schema gives it structured, machine-readable facts to pull from.
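A trimmed sketch of what a Person schema looks like in the page head — the URLs, email, and `sameAs` entries here are placeholders, not my real ones:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Shahab Papoon",
  "jobTitle": "AI & Automation Integrator",
  "url": "https://yourdomain.com/",
  "email": "you@yourdomain.com",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Victoria",
    "addressRegion": "BC",
    "addressCountry": "CA"
  },
  "alumniOf": { "@type": "CollegeOrUniversity", "name": "Royal Roads University" },
  "knowsAbout": ["AI integration", "Automation"],
  "sameAs": [
    "https://www.linkedin.com/in/your-handle",
    "https://github.com/your-handle"
  ]
}
</script>
```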

WebSite Schema (Homepage)

Declares the website name, URL, description, and search capability. Helps AI systems understand this is an official website, not a random mention.

BlogPosting Schema (Every Blog Post)

Every blog post gets its own schema with:

  • Headline, description, and keywords
  • Publication date and modification date in ISO format
  • Author and publisher information
  • Canonical URL

This helps AI systems cite my articles correctly and understand when they were written.
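In sketch form, with placeholder dates, description, and URL:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "How I Made My Website AI Discoverable",
  "description": "One-sentence summary of the post.",
  "keywords": "AI discoverability, llms.txt, JSON-LD",
  "datePublished": "2026-02-01T00:00:00Z",
  "dateModified": "2026-02-22T00:00:00Z",
  "author": { "@type": "Person", "name": "Shahab Papoon" },
  "publisher": { "@type": "Person", "name": "Shahab Papoon" },
  "mainEntityOfPage": "https://yourdomain.com/blog/post-slug/"
}
</script>
```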

FAQPage Schema (Key Pages)

My homepage and venture pages include FAQ schemas with question-and-answer pairs. These are the same questions visible on the page but structured in a way AI systems can parse directly.

When Perplexity answers “What does ConnectMyTech do?”, it can pull the answer directly from my FAQ schema.

4. Semantic HTML

AI crawlers parse HTML structure. If your HTML is just a pile of divs, they have to guess what’s important. Semantic HTML removes the guessing.

Here’s what I use:

  • One <h1> per page. Always. The page title. No exceptions.
  • <h2> for major sections. Experience, Education, Skills, Blog, Projects.
  • <h3> for items within sections. Individual jobs, individual skills, individual posts.
  • <article> elements for blog posts, project cards, and section items.
  • <nav> for site navigation.
  • <main> for primary page content.
  • <section> for thematic groupings.
  • <figure> and <figcaption> for hero images.
  • <time> elements with datetime attributes for dates.
  • <header> and <footer> for page structure.

This heading hierarchy and element structure tells AI systems what content is most important, how sections relate to each other, and what type of content each block contains.
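Put together, the skeleton looks something like this (the job title and date are illustrative):

```html
<body>
  <header>
    <nav><!-- site navigation links --></nav>
  </header>
  <main>
    <h1>Shahab Papoon</h1> <!-- the one and only h1 -->
    <section>
      <h2>Experience</h2>
      <article>
        <h3>AI & Automation Integrator</h3>
        <time datetime="2024-01">2024 – present</time>
      </article>
    </section>
  </main>
  <footer><!-- contact and social links --></footer>
</body>
```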

5. Meta Tags, Open Graph, and Twitter Cards

Every page on my site includes:

  • Title and description meta tags for search engines and AI
  • Canonical URL to prevent duplicate content issues
  • Open Graph tags (title, type, image, URL, description, site name) for Facebook, LinkedIn, and AI systems that use OG data
  • Twitter Cards (summary_large_image type, title, description, image, creator handle)

For blog posts, I also include article:published_time and article:modified_time in the Open Graph data. This helps AI systems know how current the information is.
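The full set for a blog post head looks roughly like this — every value here is a placeholder:

```html
<title>Post Title — Site Name</title>
<meta name="description" content="One-sentence summary of the post.">
<link rel="canonical" href="https://yourdomain.com/blog/post-slug/">

<meta property="og:title" content="Post Title">
<meta property="og:type" content="article">
<meta property="og:image" content="https://yourdomain.com/og/post-slug.png">
<meta property="og:url" content="https://yourdomain.com/blog/post-slug/">
<meta property="og:description" content="One-sentence summary of the post.">
<meta property="og:site_name" content="Site Name">
<meta property="article:published_time" content="2026-02-01T00:00:00Z">
<meta property="article:modified_time" content="2026-02-22T00:00:00Z">

<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Post Title">
<meta name="twitter:description" content="One-sentence summary of the post.">
<meta name="twitter:creator" content="@yourhandle">
```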

I use the astro-seo package to manage all of this automatically from page frontmatter. No manual meta tag management.

6. Static HTML: Zero JavaScript by Default

This is a framework-level decision that most people overlook.

I built my website with Astro 5. Astro renders everything as static HTML with zero JavaScript by default. No client-side rendering. No hydration delays. No JavaScript bundles that AI crawlers can’t execute.

When an AI crawler hits my page, it gets the full HTML immediately. No waiting for React to render. No loading spinners. No JavaScript-dependent content that might not get indexed.

This is one of the main reasons I chose Astro over Next.js. Static HTML is the most AI-friendly format there is.

7. Sitemap and RSS Feed

Sitemap: Auto-generated by @astrojs/sitemap on every build. Referenced in robots.txt. Includes every page, blog post, and project page. AI crawlers use sitemaps to discover content they might miss from link crawling.

RSS Feed: Available at /rss.xml and linked in the HTML head of every page. Includes all published blog posts sorted by date. Some AI systems monitor RSS feeds for fresh content.

Both are generated automatically. No manual maintenance.

8. Entity Consistency

This is subtle but critical. AI models build entity understanding by connecting information across sources. If your name, title, or affiliations are inconsistent, the model gets confused and might not connect the dots.

My canonical identity is the same everywhere:

  • Name: Shahab Papoon (not “Shah” or “S. Papoon” or anything else)
  • Title: AI & Automation Integrator (consistent across website, LinkedIn, GitHub, Medium)
  • Affiliations: Keyweemotion (founder), ConnectMyTech (co-founder), Royal Roads University, RE/MAX Camosun (past)

This consistency extends to my JSON-LD schemas, my llms.txt files, my social profiles, and every page on the site. When Claude or ChatGPT sees “Shahab Papoon” mentioned on LinkedIn and on my website, the consistent title and affiliations help the model understand it’s the same person.

9. Alt Text on Every Image

Every image on my site has descriptive, context-aware alt text. Not “image1.jpg”. Not “photo”. Actual descriptions:

  • “Shahab Papoon presenting AI and automation strategies at a workshop”
  • “Colin Yurcisin, CEO of Leveraged Mining”
  • “Keyweemotion logo”

AI systems that process images use alt text to understand visual content. Even text-only AI systems use alt text to understand what images are showing on a page.

10. Content Collections with Type-Safe Schema

My blog posts and project pages use Astro Content Collections. Every post has a validated frontmatter schema with title, description, publication date, tags, and draft status.

This means every piece of content follows the same structure. No missing descriptions. No posts without dates. No inconsistent metadata. The schema enforces consistency, and consistency is what makes content machine-readable.
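A sketch of what that schema looks like in an Astro 5 content config — the collection name, paths, and field names are my assumptions, not necessarily the exact setup:

```typescript
// src/content.config.ts — illustrative sketch
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({ pattern: "**/*.md", base: "./src/content/blog" }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false), // posts default to published
  }),
});

export const collections = { blog };
```

With a schema like this, draft filtering is a one-liner wherever content is listed, e.g. `getCollection("blog", ({ data }) => !data.draft)`.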

Draft posts are filtered out at every level: blog index, project index, RSS feed, and sitemap. AI crawlers never see unfinished work.

11. Dedicated Knowledge Pages

Beyond the standard pages, I built dedicated knowledge pages optimized for AI queries:

  • /who-is-shahab-papoon — answers the exact question AI systems get asked
  • /keyweemotion-founder-shahab-papoon — answers founder attribution queries
  • /ai-tech-integration-specialist-victoria — answers location and role queries

Each page has its own FAQ schema, comprehensive content, and links to related pages. These pages exist specifically because people ask these questions to AI systems, and I want the answers to come from my site.

12. FAQ Sections with Matching Schema

This is a technique that works on two levels. On the page, I have interactive FAQ sections using HTML <details> elements. Users can click to expand answers. Behind the scenes, I have matching FAQPage JSON-LD schema with the exact same questions and answers.

The visible FAQ helps human visitors. The schema helps AI systems. Same content, two audiences. The homepage alone has 8 FAQ entries covering who I am, what I do, how to work with me, my skills, my research, and my companies.
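The pairing looks like this in practice — the answer text here is a placeholder, but the key point is that the visible answer and the schema answer are word-for-word identical:

```html
<!-- Visible FAQ for human visitors -->
<details>
  <summary>What does ConnectMyTech do?</summary>
  <p>Short plain-text answer goes here.</p>
</details>

<!-- Matching machine-readable version for AI systems -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What does ConnectMyTech do?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Short plain-text answer goes here."
    }
  }]
}
</script>
```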

What’s Next

AI discoverability is not a one-time setup. As new AI systems launch and existing ones change how they crawl and index, this stack needs to evolve. I’m watching for new standards, new crawlers, and new structured data formats.

The foundation is solid. The content is structured. The crawlers are welcome. Now it’s about creating more valuable content and letting the systems do their job.

If you’re building a personal brand, a business website, or a portfolio, AI discoverability isn’t optional anymore. The question people ask about you might not go to Google. It might go to ChatGPT. And when it does, you want the answer to be right.


Frequently Asked Questions

What is AI discoverability? AI discoverability is the practice of making your website understandable and accessible to AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews. It goes beyond traditional SEO by focusing on structured data, semantic HTML, and machine-readable content.

What is llms.txt? llms.txt is a plain text file placed at the root of your website (yourdomain.com/llms.txt) that provides a concise, structured summary of your identity, expertise, and content for language models to read. Think of it as robots.txt but for AI understanding, not crawling permissions.

What is the difference between llms.txt and llms-full.txt? llms.txt is the short version, typically 50-60 lines with key facts, links, and a brief FAQ. llms-full.txt is the comprehensive version with complete biography, skills, case studies, and extensive FAQ. Together they give AI systems both a quick summary and deep context.

Do I need to allow AI crawlers in robots.txt? Yes. Many AI crawlers respect robots.txt. If you block GPTBot, ClaudeBot, or PerplexityBot, those AI systems won’t have your website content to reference when answering questions about you. Explicitly allowing them is the first step.

Which AI crawlers should I allow? At minimum: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Google AI). I also allow OAI-SearchBot, meta-externalagent, Amazonbot, Bytespider, CCBot, and Applebot for broader coverage.

What is JSON-LD and why does it matter for AI? JSON-LD (JavaScript Object Notation for Linked Data) is structured data you embed in your HTML. It tells AI systems exactly what your content means in a machine-readable format. A Person schema tells AI “this is a person with this name, title, and expertise” rather than leaving it to guess from paragraph text.

Which JSON-LD schemas are most important? For personal brands and businesses: Person schema (who you are), WebSite schema (your official site), BlogPosting schema (article metadata), and FAQPage schema (common questions). These four cover the majority of AI queries.

Why does Astro help with AI discoverability? Astro generates static HTML with zero JavaScript by default. AI crawlers get the full page content immediately without needing to execute JavaScript. Frameworks that rely on client-side rendering (like some React or Next.js setups) can make content invisible to crawlers that don’t run JavaScript.

What is entity consistency and why does it matter? Entity consistency means using the exact same name, title, and affiliations across all your online properties. When AI models see “Shahab Papoon, AI & Automation Integrator” on both the website and LinkedIn, they confidently connect the two. Inconsistent information creates confusion and weaker entity understanding.

How do semantic HTML elements help AI crawlers? Semantic elements like <article>, <nav>, <main>, <section>, and proper heading hierarchy (<h1> through <h3>) tell AI systems what type of content each section contains and how sections relate to each other. This structure is much more useful than generic <div> elements.

Should I create dedicated pages for AI queries? Yes. If people are asking AI systems specific questions like “Who is [your name]?” or “What does [your company] do?”, creating pages that match those queries gives AI systems authoritative content to reference. My /who-is-shahab-papoon page exists specifically for this reason.

How important is alt text for AI discoverability? Very. Alt text helps AI systems understand image content even when they can’t process images directly. Descriptive alt text like “Shahab Papoon presenting at a workshop” provides context that generic text like “photo” does not.

What is the role of a sitemap in AI discoverability? A sitemap ensures AI crawlers can find all your pages, including ones that might not be reachable through navigation links. It’s referenced in robots.txt and auto-generated on every build so new content is always discoverable.

How does an RSS feed help with AI discoverability? Some AI systems monitor RSS feeds for fresh content. Having an RSS feed at /rss.xml linked in your HTML head means AI systems can discover new blog posts and content updates without recrawling your entire site.

Can I do all of this without being a developer? Some of it. Creating llms.txt and updating robots.txt are simple text file changes. But implementing JSON-LD schemas, semantic HTML, and Content Collections requires technical knowledge or a developer who understands these patterns. The good news is that once it’s set up, most of it runs automatically.


Want to Know Where Your Website Stands?

I turned this entire audit process into a tool. The AI Visibility Audit at connectmy.tech scans your website and social profiles across six areas — AI-readiness files, structured data, content signals, NAP consistency, social bio consistency, and platform completeness — and gives you a score out of 100 with specific recommendations.

You get 5 free audits per day. After that, each additional audit is $9.99. As of February 22, 2026, the payment system is on test net, so you can use any card number to bypass it and see your results.

The tool checks for everything I covered in this post: llms.txt, robots.txt, sitemap.xml, JSON-LD schemas, sameAs links, FAQ markup, author attribution, and cross-platform consistency. You get a breakdown by category and a prioritized action plan.

Try the AI Visibility Audit →

I wrote a full breakdown of every module and what the tool checks in this post. If you want help implementing the fixes, hit me up on LinkedIn or through my contact page.