llms.txt: the standard that guides AI through your website

For decades, two files have been enough to tell search engines how to treat your website: robots.txt for permissions and sitemap.xml for discovery. But the rise of large language models has exposed a gap that neither one fills: how do you tell an AI which content on your site actually matters, and how should it read that content without drowning in navigation menus, scripts and ads? That is the question llms.txt sets out to answer.

What is llms.txt?

llms.txt is a Markdown-formatted file placed at the root of your domain, at https://yourdomain.com/llms.txt. It was proposed by Jeremy Howard, co-founder of Answer.AI, in September 2024. Its goal is to give AI models a curated, clean and structured view of your most relevant content.

The problem it solves is concrete: a modern HTML page is full of noise — navigation bars, banners, JavaScript, cookie pop-ups — and model context windows are limited. Asking an AI to understand your documentation from raw HTML is inefficient. llms.txt instead hands it a Markdown index with links to the pages that actually matter.

How it is structured

The format is deliberately simple; of the elements below, only the H1 is required:

  • An H1 heading with the name of the project or site.
  • A short summary, written as a Markdown blockquote, explaining what it is and who it is for.
  • H2 sections, each containing a Markdown list of links to key pages, with an optional short note after each link about what the AI will find there.
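To make that structure concrete, here is a minimal Python sketch that renders an llms.txt from a list of sections. It assumes the proposal's layout of an H1, a blockquote summary and H2 link lists with optional notes; the names build_llms_txt, Section and Link are hypothetical helpers for illustration, not part of any official tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional description, written after a colon


@dataclass
class Section:
    heading: str  # rendered as an H2
    links: list[Link] = field(default_factory=list)


def build_llms_txt(name: str, summary: str, sections: list[Section]) -> str:
    """Render an llms.txt body: H1 name, blockquote summary, then H2 link lists."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for section in sections:
        lines += [f"## {section.heading}", ""]
        for link in section.links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


# Hypothetical project and URLs, purely for illustration.
print(build_llms_txt(
    "Example Docs",
    "Example Docs is a hypothetical invoicing API for small businesses.",
    [Section("Docs", [Link("Quickstart", "https://yourdomain.com/docs/quickstart.md",
                           "Install the SDK and send a first invoice")])],
))
```

The output starts with "# Example Docs", followed by the quoted summary and a "## Docs" list containing one annotated link. Keeping the data in plain classes makes it easy to regenerate the file whenever your pages change.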

There is also a widely used companion convention, llms-full.txt, which does not just link to content but includes it in full inside the same file, so that a model can consume everything in a single pass.
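As a rough illustration of the difference, the sketch below inlines every page into one file instead of linking to it. It assumes you already have each page as Markdown, keyed by its URL; build_llms_full is a hypothetical name, and the "Source:" lines and --- separators are illustrative choices of this sketch.

```python
def build_llms_full(name: str, pages: dict[str, str]) -> str:
    """Inline every page's Markdown into one llms-full.txt-style document.

    Assumes each key is a page URL and each value is that page's content,
    already converted to Markdown. The "Source:" line and the "---"
    separators are illustrative choices of this sketch.
    """
    parts = [f"# {name}", ""]
    for url, markdown in pages.items():
        parts += [f"Source: {url}", "", markdown.strip(), "", "---", ""]
    return "\n".join(parts)


# Hypothetical pages, purely for illustration.
full = build_llms_full(
    "Example Docs",
    {
        "https://yourdomain.com/docs/quickstart.md": "# Quickstart\n\nInstall the SDK and send a first invoice.",
        "https://yourdomain.com/docs/api.md": "# API reference\n\nEvery endpoint, with request and response examples.",
    },
)
print(full)
```

The trade-off is size: the file grows with every page included, so on a large site it can easily exceed a model's context window.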

It is not robots.txt or sitemap.xml

It is easy to mix them up, but they serve different purposes:

  • robots.txt tells crawlers which paths they may fetch, user agent by user agent. It is a permission signal that well-behaved bots honour, not an enforced access-control mechanism.
  • sitemap.xml helps search engines discover all your URLs exhaustively, in an XML format designed for machines.
  • llms.txt does not block access or list everything: it recommends and contextualises what is important, in a format readable by both humans and models.

Put another way: robots.txt says which doors are open, sitemap.xml hands over the full blueprint of the building, and llms.txt is the concierge who tells you directly which floor to go to.

How much adoption does it actually have?

It is worth being honest: llms.txt is a community proposal with growing traction, not an official standard backed by a body like the IETF. A large number of technical documentation projects already publish one, and directories have appeared that aggregate llms.txt files from various sites. That said, the major model providers have not confirmed that they consume it in a guaranteed way during training or inference. Adopting it today is a low-cost bet with potential upside — not a magic solution.

How to create yours

You can write one by hand in five minutes if your site is small, or use generators that crawl your site and propose a first draft. Start with the essentials: your documentation, your product pages and the articles that best explain what you do. Keep it short and update it whenever your content changes.
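If you write the file by hand, a quick structural check catches the most common slips before you publish. This is a minimal Python sketch assuming the layout described above (H1 first, a blockquote summary, H2 sections of "- [title](url): note" links). check_llms_txt is a hypothetical helper, not an official validator, and its rule that every H2 section should contain at least one link is this sketch's own heuristic.

```python
import re

# Matches list items such as "- [Quickstart](https://example.com/q.md): notes".
LINK_RE = re.compile(
    r"^\s*[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def check_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems in an llms.txt draft (empty if none)."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    non_empty = [line for line in lines if line.strip()]

    # The H1 with the project name must come first.
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line should be an H1 ('# Name')")
    # The summary is expected as a Markdown blockquote.
    if not any(line.startswith(">") for line in lines):
        problems.append("no blockquote summary ('> ...') found")

    current_section = None
    links_per_section = {}
    for number, line in enumerate(lines, start=1):
        if line.startswith("## "):
            current_section = line[3:].strip()
            links_per_section[current_section] = 0
        elif current_section is not None and line.lstrip().startswith(("- ", "* ")):
            # Inside an H2 section, every list item should be a Markdown link.
            if LINK_RE.match(line):
                links_per_section[current_section] += 1
            else:
                problems.append(f"line {number}: list item is not a Markdown link")

    # Heuristic of this sketch: an H2 section without links is probably unfinished.
    for section, count in links_per_section.items():
        if count == 0:
            problems.append(f"section '{section}' has no links")
    return problems
```

Run it on your draft, for example check_llms_txt(open("llms.txt", encoding="utf-8").read()). An empty list only means the draft passed these structural checks; it says nothing about whether any AI system will use the file.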

Where Spider fits in

llms.txt solves the proactive half of the problem: what you offer the AI. The other half is reactive: knowing which AI crawlers are actually visiting your site and whether your robots.txt lets them through. That is where Spider.es comes in: it analyses your domain against more than a hundred bots — including GPTBot, ClaudeBot, PerplexityBot and Google-Extended — and shows you, bot by bot, who can crawl you. Publishing an llms.txt and reviewing your crawlability with Spider are two sides of the same strategy for the AI era.
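For a quick local approximation of that bot-by-bot view on a single robots.txt, Python's standard library ships urllib.robotparser. The sketch below asks, for a few of the user agents named above, whether a given robots.txt lets them fetch a URL; ai_bot_access is a hypothetical helper and the robots.txt content is made up for illustration. One caveat: urllib.robotparser applies the first matching Allow/Disallow rule in file order, whereas RFC 9309 specifies the most specific (longest) match, so results can differ on files with overlapping rules.

```python
from urllib import robotparser

# A few of the AI user agents named above; not an exhaustive list.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def ai_bot_access(robots_txt: str, url: str, bots=AI_BOTS) -> dict[str, bool]:
    """Report, per user agent, whether robots.txt allows fetching url.

    Note: urllib.robotparser uses the first matching rule in file order,
    not RFC 9309's longest-match rule, so treat the result as an approximation.
    """
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text we already have; no network access
    return {bot: parser.can_fetch(bot, url) for bot in bots}


# Hypothetical robots.txt: block GPTBot entirely, keep ClaudeBot out of /private/.
robots = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /private/

User-agent: *
Allow: /
"""

print(ai_bot_access(robots, "https://yourdomain.com/docs/intro"))
# {'GPTBot': False, 'ClaudeBot': True, 'PerplexityBot': True, 'Google-Extended': True}
```

Bots without their own group fall back to the "User-agent: *" rules, which is why PerplexityBot and Google-Extended are allowed here.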
