Understanding Meta Robots and X-Robots-Tag Directives

When you need page-level or resource-level control over how search engines treat your content, two mechanisms sit at the heart of modern SEO: the meta robots tag (an HTML element) and the X-Robots-Tag (an HTTP response header). Together they let you tell crawlers whether to index a page, follow its links, show a cached copy, display a snippet or even surface an image preview—all without touching robots.txt. This article is a deep technical dive into every directive, how per-bot targeting works, what happens when rules conflict and the mistakes that trip up even experienced teams.

Meta robots tag vs X-Robots-Tag: what's the difference?

The meta robots tag

Placed inside the <head> of an HTML document, the meta robots tag is the most familiar way to issue indexing directives:

<meta name="robots" content="noindex, nofollow">

The name attribute identifies the target (all bots when set to robots, or a specific crawler like googlebot). The content attribute holds a comma-separated list of directives. Because it lives inside HTML, it only works for documents that browsers and crawlers actually parse as web pages.

The X-Robots-Tag HTTP header

The X-Robots-Tag achieves the same result but at the HTTP layer:

X-Robots-Tag: noindex, nofollow

Because it is a response header, it works on any resource type—PDFs, images, video files, JSON feeds, XML sitemaps—not just HTML pages. This makes it indispensable for controlling non-HTML assets that search engines might otherwise index.

You can also target a specific bot by prefixing the directives:

X-Robots-Tag: googlebot: noindex

The complete directive reference

noindex

Tells the crawler not to add the page to the search index. If the page is already indexed, it will be removed after the next crawl. This is the single most important directive for keeping private, staging or low-value pages out of search results. Note: the crawler must still be able to access the page to read the directive. Blocking the URL in robots.txt prevents the bot from ever seeing the noindex tag, so the page could remain indexed based on external signals.

nofollow

Instructs the crawler not to follow any outbound links on the page for ranking or discovery purposes. This is different from the rel="nofollow" attribute on individual <a> elements, which targets a single link. The meta-level nofollow applies to every link on the page. Use it sparingly—blanket nofollow can cut off internal link equity flow and prevent important pages from being discovered.

noarchive

Prevents search engines from showing a cached copy of the page in their results. The page can still be indexed and appear in search, but users will not see a "Cached" link. Useful for pages with time-sensitive content or pricing information that should not be viewed in stale form. Note that Google retired its public cache link in early 2024, so today this directive mainly matters for engines that still offer cached copies.

nosnippet

Stops the search engine from displaying any text snippet or video preview in the results page. The page can still rank and appear, but without a description beneath the title. This is a blunt tool—most sites benefit from snippets, so apply it only when legal or privacy requirements demand it.

max-snippet:[number]

Controls the maximum character length of the text snippet shown in results. For example, max-snippet:50 limits the snippet to 50 characters. Setting it to 0 is equivalent to nosnippet. Setting it to -1 means no limit—Google can use as much text as it considers useful. This lets you fine-tune snippet length without removing them entirely.
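The value semantics above (0 behaving like nosnippet, -1 meaning unlimited) can be sketched as a small helper. This is purely an illustration of the rules, not anything a search engine exposes:

```python
def effective_snippet(text: str, max_snippet: int) -> str:
    """Illustrate how max-snippet values are interpreted:
    0 behaves like nosnippet, -1 means no limit,
    and a positive N caps the snippet at N characters."""
    if max_snippet == 0:
        return ""        # equivalent to nosnippet
    if max_snippet == -1:
        return text      # no limit
    return text[:max_snippet]

print(effective_snippet("Meta robots directives control indexing.", 10))  # → "Meta robot"
```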

max-image-preview:[setting]

Defines the maximum size of image previews shown in search results. Accepted values:

  • none — no image preview at all.
  • standard — a default-sized preview image.
  • large — a larger preview, which can increase visibility in Discover and image-heavy SERP features.

Setting max-image-preview:large is often recommended if you want your pages eligible for Google Discover and rich visual results.

max-video-preview:[number]

Sets the maximum duration in seconds for a video snippet preview. A value of 0 disables video previews. A value of -1 allows unlimited preview length. This is relevant for pages that embed video content and want to control how much of it search engines can show.

unavailable_after:[date]

Tells the search engine to stop showing the page after a specific date and time. The format follows RFC 850 or ISO 8601. After the specified date, the page is treated as if it has a noindex directive. This is perfect for event pages, limited-time promotions or job postings that should automatically disappear from results when they expire.

<meta name="robots" content="unavailable_after: 2026-06-30T23:59:59+00:00">
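As a sanity check for expiry dates, here is a minimal sketch of the visibility decision, assuming the ISO 8601 timestamp format shown in the tag above:

```python
from datetime import datetime, timezone

def still_visible(unavailable_after: str, now: datetime) -> bool:
    """Return True while the page may still be shown in results.
    Assumes an ISO 8601 timestamp with an explicit UTC offset."""
    cutoff = datetime.fromisoformat(unavailable_after)
    return now <= cutoff

now = datetime(2026, 7, 1, tzinfo=timezone.utc)
print(still_visible("2026-06-30T23:59:59+00:00", now))  # → False, past the cutoff
```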

notranslate

Tells Google not to offer a translation of the page in search results. The original page still appears, but users browsing in a different language will not see a "Translate this page" link.

noimageindex

Requests that images on the page not be indexed. Note that if the image is referenced from another page without this directive, it may still be indexed. This directive is not universally supported across all search engines.

Per-bot targeting

Both mechanisms support targeting specific crawlers. In the meta tag, replace robots with the bot name:

<meta name="googlebot" content="noindex">
<meta name="bingbot" content="noarchive">

You can include multiple meta tags, each addressing a different bot. Directives in a bot-specific tag override the generic robots tag for that bot. For example:

<meta name="robots" content="noindex">
<meta name="googlebot" content="index">

In this case, the intent is for Googlebot to index the page (following its specific tag) while all other bots obey the generic noindex. Be aware, however, that Google's documentation also says conflicting directives resolve to the most restrictive one, so relying on a bot-specific tag to relax a generic restriction is fragile. The safer pattern is to keep the generic tag permissive and place restrictions in bot-specific tags when you want content in one search engine but not others.

With the X-Robots-Tag header, per-bot targeting uses a prefix:

X-Robots-Tag: googlebot: nosnippet
X-Robots-Tag: bingbot: noarchive

Multiple X-Robots-Tag headers can appear in the same HTTP response, each with its own bot prefix and directives.
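A rough sketch of how such bot-prefixed header values could be parsed — an illustrative parser only; a production implementation must also recognize directives like unavailable_after, whose values themselves contain colons:

```python
def directives_for(headers: list[str], bot: str) -> set[str]:
    """Collect X-Robots-Tag directives applying to `bot`.
    A value may carry an optional `botname:` prefix; values
    without a prefix apply to every crawler."""
    applicable = set()
    for value in headers:
        target, sep, rest = value.partition(":")
        # Heuristic: text before the first colon with no comma or
        # space is treated as a bot-name prefix.
        if sep and " " not in target.strip() and "," not in target:
            if target.strip().lower() != bot.lower():
                continue  # directive aimed at a different bot
            value = rest
        applicable.update(d.strip() for d in value.split(",") if d.strip())
    return applicable

headers = ["googlebot: nosnippet", "bingbot: noarchive", "noindex"]
print(sorted(directives_for(headers, "googlebot")))  # → ['noindex', 'nosnippet']
```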

Priority rules when directives conflict

Understanding how search engines resolve conflicting signals is critical. The general rules are:

  1. Most restrictive directive wins. If a meta robots tag says index and the X-Robots-Tag header says noindex, the page will not be indexed. Search engines combine all applicable directives and apply the most restrictive interpretation.
  2. Bot-specific directives override generic ones for that bot. A <meta name="googlebot"> tag takes precedence over <meta name="robots"> for Googlebot specifically.
  3. robots.txt blocking prevents directive reading. If robots.txt disallows a URL, the crawler never fetches the page, never reads the meta tag or header, and therefore never processes the directive. A blocked page with a noindex tag may remain indexed because the bot never saw the instruction.
  4. Both sources are combined. Meta robots and X-Robots-Tag are not mutually exclusive—they are additive. A crawler reads both and merges all applicable directives into a single set of instructions.
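The "most restrictive wins" merge across both sources can be sketched as a small function. The directive pairs here are illustrative — real engines recognize a longer list of opposites:

```python
# Pairs where the first directive is strictly more restrictive
# than the second (illustrative subset).
RESTRICTIVE_OVER = {"noindex": "index", "nofollow": "follow"}

def merge_directives(meta: set[str], header: set[str]) -> set[str]:
    """Combine meta robots and X-Robots-Tag directives, keeping
    the most restrictive signal when the sources conflict."""
    merged = meta | header
    for restrictive, permissive in RESTRICTIVE_OVER.items():
        if restrictive in merged:
            merged.discard(permissive)  # restrictive wins
    return merged

# index in the meta tag loses to noindex in the header:
print(sorted(merge_directives({"index", "nofollow"}, {"noindex"})))  # → ['nofollow', 'noindex']
```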

Common mistakes and how to avoid them

Blocking crawling and expecting noindex to work

This is the most frequent error. A page is disallowed in robots.txt and also has <meta name="robots" content="noindex">. Because the bot cannot fetch the page, it never sees the noindex directive. The page may remain in the index indefinitely based on incoming links and anchor text. Solution: if you want a page de-indexed, allow crawling so the bot can read the noindex tag.
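Python's standard library can flag this pattern offline. A minimal sketch using urllib.robotparser — the robots.txt content and paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

def noindex_is_unreachable(robots_txt: str, url_path: str) -> bool:
    """Flag the classic mistake: a noindex directive on a URL
    that robots.txt blocks, so crawlers can never read it."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch("*", url_path)

robots_txt = "User-agent: *\nDisallow: /private/"
print(noindex_is_unreachable(robots_txt, "/private/page.html"))  # → True
```

If this returns True for a page carrying noindex, remove the Disallow rule so the bot can fetch the page and process the directive.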

Applying noindex to paginated or filtered pages carelessly

Marking paginated listing pages as noindex can orphan the products or articles linked from those deeper pages. Search engines may stop following the internal links because the pages carrying them are excluded from the index. Solution: keep paginated pages indexable, use rel="canonical" pointing to the main listing or use noindex, follow to allow link discovery while preventing index bloat — bearing in mind that Google has said pages left noindexed long-term are eventually treated as noindex, nofollow, so the implied follow only helps in the short term.

Using nofollow on internal links for "PageRank sculpting"

Years ago, SEOs used internal nofollow to funnel link equity. Google has stated that the equity is still consumed—it simply evaporates rather than being redistributed. Solution: use proper site architecture and crawl controls instead.

Forgetting X-Robots-Tag on non-HTML resources

PDFs, images and other media files cannot carry a meta tag. If they should not be indexed, the only option is the X-Robots-Tag header. Many teams forget to configure their web server or CDN to add this header for non-HTML content types. Solution: add server-level rules (in Apache, Nginx or your CDN) to inject X-Robots-Tag headers on the file types that need them.

Leaving staging or development environments without noindex

Staging sites that are accidentally public and lack a noindex directive can get indexed, creating duplicate content issues with the production site. Solution: always protect staging environments with authentication or, at minimum, a site-wide noindex meta tag and X-Robots-Tag header.

Ignoring the unavailable_after directive for ephemeral content

Event pages and limited promotions that linger in search results months after they expire create a poor user experience. Solution: use unavailable_after with the expiration date so the page is automatically de-indexed when the content becomes irrelevant.

How to audit your directives with Spider.es

Spider.es crawls your site the way search engine bots do, reading both meta robots tags and X-Robots-Tag headers for every URL. The audit report flags:

  • Pages with conflicting directives (e.g., noindex in the header but index in the meta tag).
  • Pages blocked by robots.txt that also carry indexing directives the bot will never see.
  • Non-HTML resources lacking an X-Robots-Tag header when one might be needed.
  • Expired unavailable_after dates that should have triggered de-indexing.

Running a regular crawl and reviewing these signals ensures your indexing controls are working as intended—not silently failing.

Final thoughts

Meta robots tags and X-Robots-Tag headers are the precision instruments of crawl control. While robots.txt is a broad gate, these directives let you fine-tune what gets indexed, how it appears in results and when it expires. Master the directive set, understand the priority rules, avoid the common pitfalls and audit regularly. Your search presence depends on it.
