How I Made My Website AI Discoverable

[Meme: a person confidently refusing traditional SEO with "No thanks, I use AI" — a funny take on AI discoverability over old-school search optimization.]

If you build a website today and only think about Google, you’re already behind. AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews are answering questions about people and businesses every day. If your website isn’t set up for them, you don’t exist in those answers.

I spent real time making my website AI discoverable. Not just SEO optimized. AI discoverable. There’s a difference. Here’s everything I did and why.

Why AI Discoverability Is Different from SEO

SEO is about ranking in search results. AI discoverability is about being understood by language models. When someone asks ChatGPT “Who is Shahab Papoon?” or Perplexity “What does ConnectMyTech do?”, I want the answer to come from my website, not from a hallucination.

AI models don’t read your website the way Google does. They need structured, parseable, and consistent information. They need context. They need to understand who you are, what you do, and how it all connects.

Here’s every layer I built to make that happen.

1. llms.txt: The AI Welcome Mat

This is the single most important thing I did.

I created two files at the root of my website: llms.txt and llms-full.txt. These are plain text files designed specifically for language models to read.

llms.txt is the short version. About 55 lines. It has my one-liner bio, my expertise, key pages, FAQ highlights, affiliations, and contact information. Think of it as the elevator pitch for AI.

llms-full.txt is the deep version. About 370 lines. It has my complete biography, my philosophy, my full venture details, professional timeline, skills inventory, research details, case studies, blog summaries, and a 30+ question FAQ. Think of it as the full context dump.

When an AI crawls my site, these files give it everything it needs to answer questions about me accurately. No guessing. No hallucinating. Just facts from the source.

If you do one thing from this article, create an llms.txt file for your website. Put it at yourdomain.com/llms.txt. Make it clear, factual, and comprehensive.
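For reference, here is a minimal skeleton following the llms.txt proposal (llmstxt.org): an H1 title, a blockquote summary, then H2 sections of links and facts. The URLs and email are placeholders — swap in your own.

```text
# Shahab Papoon

> AI & Automation Integrator in Victoria, BC, Canada. Founder of
> Keyweemotion, co-founder of ConnectMyTech.

## Key Pages

- [About](https://yourdomain.com/about): Background and expertise
- [Projects](https://yourdomain.com/projects): Ventures and case studies

## FAQ

- Who is Shahab Papoon? An AI & Automation Integrator based in Victoria, BC.

## Contact

- Email: you@yourdomain.com
```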

2. robots.txt: Letting AI Crawlers In

Most websites block AI crawlers by default. I did the opposite. My robots.txt explicitly welcomes the major AI crawlers:

  • GPTBot and OAI-SearchBot (OpenAI)
  • ClaudeBot (Anthropic)
  • PerplexityBot (Perplexity)
  • Google-Extended (Google AI Overviews and SGE)
  • meta-externalagent (Meta AI)
  • Amazonbot (Amazon)
  • Bytespider (ByteDance)
  • CCBot (Common Crawl)
  • Applebot (Apple)

Plus a general User-agent: * catch-all that allows everything. The sitemap is referenced at the bottom so every crawler knows where to find all pages.
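A condensed sketch of what that robots.txt looks like — the domain and sitemap filename are placeholders, and I've abbreviated the crawler list:

```text
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# ...repeat for the other crawlers you want to welcome...

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap-index.xml
```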

If you’re blocking these crawlers, you’re invisible to AI systems. Check your robots.txt right now.

3. JSON-LD Structured Data

This is where things get technical but incredibly powerful. JSON-LD is structured data embedded in your HTML that tells AI systems exactly what your content means. Not what it looks like. What it means.

I have four schemas running across my site:

Person Schema (Every Page)

Every single page on my website includes a Person schema. It tells AI systems:

  • My name, title, and description
  • My 13 areas of expertise
  • The organizations I work with (ConnectMyTech, Keyweemotion) and past affiliations (RE/MAX Camosun)
  • My education (Royal Roads University, with full postal address)
  • My location (Victoria, BC, Canada)
  • The 2 companies I founded (with founding dates)
  • My 5 social profiles (LinkedIn, Instagram, X, GitHub, Medium)
  • My email address

When ChatGPT answers “Who is Shahab Papoon?”, this schema gives it structured, machine-readable facts to pull from.
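A trimmed sketch of what a Person schema looks like in the page head — the URLs, email, and `sameAs` entries here are placeholders, not my real ones:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Shahab Papoon",
  "jobTitle": "AI & Automation Integrator",
  "url": "https://yourdomain.com/",
  "email": "you@yourdomain.com",
  "address": {
    "@type": "PostalAddress",
    "addressLocality": "Victoria",
    "addressRegion": "BC",
    "addressCountry": "CA"
  },
  "alumniOf": { "@type": "CollegeOrUniversity", "name": "Royal Roads University" },
  "knowsAbout": ["AI integration", "Automation"],
  "sameAs": [
    "https://www.linkedin.com/in/your-handle",
    "https://github.com/your-handle"
  ]
}
</script>
```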

WebSite Schema (Homepage)

Declares the website name, URL, description, and search capability. Helps AI systems understand this is an official website, not a random mention.

BlogPosting Schema (Every Blog Post)

Every blog post gets its own schema with:

  • Headline, description, and keywords
  • Publication date and modification date in ISO format
  • Author and publisher information
  • Canonical URL

This helps AI systems cite my articles correctly and understand when they were written.
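In sketch form, with placeholder dates, description, and URL:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "How I Made My Website AI Discoverable",
  "description": "One-sentence summary of the post.",
  "keywords": "AI discoverability, llms.txt, JSON-LD",
  "datePublished": "2026-02-01T00:00:00Z",
  "dateModified": "2026-02-22T00:00:00Z",
  "author": { "@type": "Person", "name": "Shahab Papoon" },
  "publisher": { "@type": "Person", "name": "Shahab Papoon" },
  "mainEntityOfPage": "https://yourdomain.com/blog/post-slug/"
}
</script>
```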

FAQPage Schema (Key Pages)

My homepage and venture pages include FAQ schemas with question-and-answer pairs. These are the same questions visible on the page but structured in a way AI systems can parse directly.

When Perplexity answers “What does ConnectMyTech do?”, it can pull the answer directly from my FAQ schema.

4. Semantic HTML

AI crawlers parse HTML structure. If your HTML is just a pile of divs, they have to guess what’s important. Semantic HTML removes the guessing.

Here’s what I use:

  • One <h1> per page. Always. The page title. No exceptions.
  • <h2> for major sections. Experience, Education, Skills, Blog, Projects.
  • <h3> for items within sections. Individual jobs, individual skills, individual posts.
  • <article> elements for blog posts, project cards, and section items.
  • <nav> for site navigation.
  • <main> for primary page content.
  • <section> for thematic groupings.
  • <figure> and <figcaption> for hero images.
  • <time> elements with datetime attributes for dates.
  • <header> and <footer> for page structure.

This heading hierarchy and element structure tells AI systems what content is most important, how sections relate to each other, and what type of content each block contains.
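Put together, the skeleton looks something like this (the job title and date are illustrative):

```html
<body>
  <header>
    <nav><!-- site navigation links --></nav>
  </header>
  <main>
    <h1>Shahab Papoon</h1> <!-- the one and only h1 -->
    <section>
      <h2>Experience</h2>
      <article>
        <h3>AI & Automation Integrator</h3>
        <time datetime="2024-01">2024 – present</time>
      </article>
    </section>
  </main>
  <footer><!-- contact and social links --></footer>
</body>
```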

5. Meta Tags, Open Graph, and Twitter Cards

Every page on my site includes:

  • Title and description meta tags for search engines and AI
  • Canonical URL to prevent duplicate content issues
  • Open Graph tags (title, type, image, URL, description, site name) for Facebook, LinkedIn, and AI systems that use OG data
  • Twitter Cards (summary_large_image type, title, description, image, creator handle)

For blog posts, I also include article:published_time and article:modified_time in the Open Graph data. This helps AI systems know how current the information is.
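The full set for a blog post head looks roughly like this — every value here is a placeholder:

```html
<title>Post Title — Site Name</title>
<meta name="description" content="One-sentence summary of the post.">
<link rel="canonical" href="https://yourdomain.com/blog/post-slug/">

<meta property="og:title" content="Post Title">
<meta property="og:type" content="article">
<meta property="og:image" content="https://yourdomain.com/og/post-slug.png">
<meta property="og:url" content="https://yourdomain.com/blog/post-slug/">
<meta property="og:description" content="One-sentence summary of the post.">
<meta property="og:site_name" content="Site Name">
<meta property="article:published_time" content="2026-02-01T00:00:00Z">
<meta property="article:modified_time" content="2026-02-22T00:00:00Z">

<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Post Title">
<meta name="twitter:description" content="One-sentence summary of the post.">
<meta name="twitter:creator" content="@yourhandle">
```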

I use the astro-seo package to manage all of this automatically from page frontmatter. No manual meta tag management.

6. Static HTML: Zero JavaScript by Default

This is a framework-level decision that most people overlook.

I built my website with Astro 5. Astro renders everything as static HTML with zero JavaScript by default. No client-side rendering. No hydration delays. No JavaScript bundles that AI crawlers can’t execute.

When an AI crawler hits my page, it gets the full HTML immediately. No waiting for React to render. No loading spinners. No JavaScript-dependent content that might not get indexed.

This is one of the main reasons I chose Astro over Next.js. Static HTML is the most AI-friendly format there is.

7. Sitemap and RSS Feed

Sitemap: Auto-generated by @astrojs/sitemap on every build. Referenced in robots.txt. Includes every page, blog post, and project page. AI crawlers use sitemaps to discover content they might miss from link crawling.

RSS Feed: Available at /rss.xml and linked in the HTML head of every page. Includes all published blog posts sorted by date. Some AI systems monitor RSS feeds for fresh content.

Both are generated automatically. No manual maintenance.

8. Entity Consistency

This is subtle but critical. AI models build entity understanding by connecting information across sources. If your name, title, or affiliations are inconsistent, the model gets confused and might not connect the dots.

My canonical identity is the same everywhere:

  • Name: Shahab Papoon (not “Shah” or “S. Papoon” or anything else)
  • Title: AI & Automation Integrator (consistent across website, LinkedIn, GitHub, Medium)
  • Affiliations: Keyweemotion (founder), ConnectMyTech (co-founder), Royal Roads University, RE/MAX Camosun (past)

This consistency extends to my JSON-LD schemas, my llms.txt files, my social profiles, and every page on the site. When Claude or ChatGPT sees “Shahab Papoon” mentioned on LinkedIn and on my website, the consistent title and affiliations help the model understand it’s the same person.

9. Alt Text on Every Image

Every image on my site has descriptive, context-aware alt text. Not “image1.jpg”. Not “photo”. Actual descriptions:

  • “Shahab Papoon presenting AI and automation strategies at a workshop”
  • “Colin Yurcisin, CEO of Leveraged Mining”
  • “Keyweemotion logo”

AI systems that process images use alt text to understand visual content. Even text-only AI systems use alt text to understand what images are showing on a page.

10. Content Collections with Type-Safe Schema

My blog posts and project pages use Astro Content Collections. Every post has a validated frontmatter schema with title, description, publication date, tags, and draft status.

This means every piece of content follows the same structure. No missing descriptions. No posts without dates. No inconsistent metadata. The schema enforces consistency, and consistency is what makes content machine-readable.
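A sketch of what that schema looks like in an Astro 5 content config — the collection name, paths, and field names are my assumptions, not necessarily the exact setup:

```typescript
// src/content.config.ts — illustrative sketch
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({ pattern: "**/*.md", base: "./src/content/blog" }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false), // posts default to published
  }),
});

export const collections = { blog };
```

With a schema like this, draft filtering is a one-liner wherever content is listed, e.g. `getCollection("blog", ({ data }) => !data.draft)`.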

Draft posts are filtered out at every level: blog index, project index, RSS feed, and sitemap. AI crawlers never see unfinished work.

11. Dedicated Knowledge Pages

Beyond the standard pages, I built dedicated knowledge pages optimized for AI queries:

  • /who-is-shahab-papoon — answers the exact question AI systems get asked
  • /keyweemotion-founder-shahab-papoon — answers founder attribution queries
  • /ai-tech-integration-specialist-victoria — answers location and role queries

Each page has its own FAQ schema, comprehensive content, and links to related pages. These pages exist specifically because people ask these questions to AI systems, and I want the answers to come from my site.

12. FAQ Sections with Matching Schema

This is a technique that works on two levels. On the page, I have interactive FAQ sections using HTML <details> elements. Users can click to expand answers. Behind the scenes, I have matching FAQPage JSON-LD schema with the exact same questions and answers.

The visible FAQ helps human visitors. The schema helps AI systems. Same content, two audiences. The homepage alone has 8 FAQ entries covering who I am, what I do, how to work with me, my skills, my research, and my companies.
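The pairing looks like this in practice — the answer text here is a placeholder, but the key point is that the visible answer and the schema answer are word-for-word identical:

```html
<!-- Visible FAQ for human visitors -->
<details>
  <summary>What does ConnectMyTech do?</summary>
  <p>Short plain-text answer goes here.</p>
</details>

<!-- Matching machine-readable version for AI systems -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What does ConnectMyTech do?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Short plain-text answer goes here."
    }
  }]
}
</script>
```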

What’s Next

AI discoverability is not a one-time setup. As new AI systems launch and existing ones change how they crawl and index, this stack needs to evolve. I’m watching for new standards, new crawlers, and new structured data formats.

The foundation is solid. The content is structured. The crawlers are welcome. Now it’s about creating more valuable content and letting the systems do their job.

If you’re building a personal brand, a business website, or a portfolio, AI discoverability isn’t optional anymore. The question people ask about you might not go to Google. It might go to ChatGPT. And when it does, you want the answer to be right.


Frequently Asked Questions

What is AI discoverability? AI discoverability is the practice of making your website understandable and accessible to AI systems like ChatGPT, Claude, Perplexity, and Google AI Overviews. It goes beyond traditional SEO by focusing on structured data, semantic HTML, and machine-readable content.

What is llms.txt? llms.txt is a plain text file placed at the root of your website (yourdomain.com/llms.txt) that provides a concise, structured summary of your identity, expertise, and content for language models to read. Think of it as robots.txt but for AI understanding, not crawling permissions.

What is the difference between llms.txt and llms-full.txt? llms.txt is the short version, typically 50-60 lines with key facts, links, and a brief FAQ. llms-full.txt is the comprehensive version with complete biography, skills, case studies, and extensive FAQ. Together they give AI systems both a quick summary and deep context.

Do I need to allow AI crawlers in robots.txt? Yes. Many AI crawlers respect robots.txt. If you block GPTBot, ClaudeBot, or PerplexityBot, those AI systems won’t have your website content to reference when answering questions about you. Explicitly allowing them is the first step.

Which AI crawlers should I allow? At minimum: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), and Google-Extended (Google AI). I also allow OAI-SearchBot, meta-externalagent, Amazonbot, Bytespider, CCBot, and Applebot for broader coverage.

What is JSON-LD and why does it matter for AI? JSON-LD (JavaScript Object Notation for Linked Data) is structured data you embed in your HTML. It tells AI systems exactly what your content means in a machine-readable format. A Person schema tells AI “this is a person with this name, title, and expertise” rather than leaving it to guess from paragraph text.

Which JSON-LD schemas are most important? For personal brands and businesses: Person schema (who you are), WebSite schema (your official site), BlogPosting schema (article metadata), and FAQPage schema (common questions). These four cover the majority of AI queries.

Why does Astro help with AI discoverability? Astro generates static HTML with zero JavaScript by default. AI crawlers get the full page content immediately without needing to execute JavaScript. Frameworks that rely on client-side rendering (like some React or Next.js setups) can make content invisible to crawlers that don’t run JavaScript.

What is entity consistency and why does it matter? Entity consistency means using the exact same name, title, and affiliations across all your online properties. When AI models see “Shahab Papoon, AI & Automation Integrator” on both the website and LinkedIn, they confidently connect the two. Inconsistent information creates confusion and weaker entity understanding.

How do semantic HTML elements help AI crawlers? Semantic elements like <article>, <nav>, <main>, <section>, and proper heading hierarchy (<h1> through <h3>) tell AI systems what type of content each section contains and how sections relate to each other. This structure is much more useful than generic <div> elements.

Should I create dedicated pages for AI queries? Yes. If people are asking AI systems specific questions like “Who is [your name]?” or “What does [your company] do?”, creating pages that match those queries gives AI systems authoritative content to reference. My /who-is-shahab-papoon page exists specifically for this reason.

How important is alt text for AI discoverability? Very. Alt text helps AI systems understand image content even when they can’t process images directly. Descriptive alt text like “Shahab Papoon presenting at a workshop” provides context that generic text like “photo” does not.

What is the role of a sitemap in AI discoverability? A sitemap ensures AI crawlers can find all your pages, including ones that might not be reachable through navigation links. It’s referenced in robots.txt and auto-generated on every build so new content is always discoverable.

How does an RSS feed help with AI discoverability? Some AI systems monitor RSS feeds for fresh content. Having an RSS feed at /rss.xml linked in your HTML head means AI systems can discover new blog posts and content updates without recrawling your entire site.

Can I do all of this without being a developer? Some of it. Creating llms.txt and updating robots.txt are simple text file changes. But implementing JSON-LD schemas, semantic HTML, and Content Collections requires technical knowledge or a developer who understands these patterns. The good news is that once it’s set up, most of it runs automatically.


Want to Know Where Your Website Stands?

I turned this entire audit process into a tool. The AI Visibility Audit at connectmy.tech scans your website and social profiles across six areas — AI-readiness files, structured data, content signals, NAP consistency, social bio consistency, and platform completeness — and gives you a score out of 100 with specific recommendations.

You get 5 free audits per day. After that, each additional audit is $9.99. As of February 22, 2026, the payment system is on test net, so you can use any card number to bypass it and see your results.

The tool checks for everything I covered in this post: llms.txt, robots.txt, sitemap.xml, JSON-LD schemas, sameAs links, FAQ markup, author attribution, and cross-platform consistency. You get a breakdown by category and a prioritized action plan.

Try the AI Visibility Audit →

I wrote a full breakdown of every module and what the tool checks in this post. If you want help implementing the fixes, hit me up on LinkedIn or through my contact page.