Frequently Asked Questions

Discover how Spider.es helps you audit crawler access, diagnose technical SEO issues and manage the new wave of AI bots.

Jump to a question

Pick a topic to scroll straight to the answer.

How can I check if Googlebot is blocked by my site?

Run any URL through Spider.es and, within seconds, you'll see the robots.txt rule, meta directive or X-Robots-Tag header that affects Googlebot, together with the exact allow or disallow that fired.
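If you want to reproduce the robots.txt part of that check on your own machine, Python's standard urllib.robotparser module gives a quick sanity test. This is a minimal sketch under stated assumptions: example.com and the product path are placeholders, and it evaluates only robots.txt, not meta robots tags or X-Robots-Tag headers.

```python
from urllib.robotparser import RobotFileParser

# Minimal sketch: checks only the robots.txt layer for a given user agent.
# example.com and the product path are placeholders.
parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

url = "https://example.com/products/widget"
print("Googlebot allowed:", parser.can_fetch("Googlebot", url))
```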

How do I test Bingbot vs. Googlebot access?

Compare the Bingbot and Googlebot rows in the decision table to spot differences in permissions, crawl delays or overrides for each engine.
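As an illustration of the kind of divergence that comparison surfaces, a robots.txt file can give each engine its own group of rules. The paths below are hypothetical; note that Bing honours Crawl-delay while Google ignores it.

```
User-agent: Googlebot
Disallow: /search/

User-agent: Bingbot
Disallow: /search/
Disallow: /beta/
Crawl-delay: 5
```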

Can I see if AI crawlers like ChatGPT or Perplexity can crawl my site?

Spider.es keeps an eye on GPTBot, ChatGPT-User, Claude, Perplexity, Google-Extended and many other AI user agents, flagging whether they are blocked and which directive enforces it.
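If the report shows AI agents you would rather keep out, the usual remedy is a per-agent block in robots.txt. The sketch below uses commonly published user-agent tokens; verify the exact tokens against each vendor's documentation before relying on them.

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```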

Why isn't Google indexing all my sitemap pages?

If strategic URLs are blocked in robots.txt, Google can't crawl their content, and if they carry a noindex directive they'll be kept out of the index, no matter how often the sitemap references them. Use the report to confirm key sections are crawlable and indexable, then resubmit the sitemap in Search Console.

What's an easy way to understand robots.txt?

Robots.txt is a site-wide manifest of crawl rules. Spider.es highlights the directive that matched your URL so you understand the impact without parsing the file line by line.
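For reference, a minimal robots.txt groups rules under a User-agent line, and major engines apply the most specific rule that matches the requested path. The paths below are hypothetical.

```
User-agent: *            # applies to any crawler not named in another group
Disallow: /admin/        # keep this directory out of the crawl
Allow: /admin/help/      # but leave this subfolder crawlable (Allow is supported by major engines)
```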

Can I test specific pages, not just the homepage?

Submit the full URL of any product page, article or resource—Spider.es checks robots.txt, meta tags and headers for that specific path so you can validate granular directives.

Spider.es: essential insights for SEO professionals & webmasters

Spider.es maintains a curated, categorised directory of crawlers, spanning headline search engines, AI LLM bots, SEO auditors, social platforms, security services and research scrapers, so you know exactly who is hitting your site and why that matters.

Supported crawlers and user-agents

Here's a snapshot of the ecosystems Spider.es monitors to help you stay in control of crawlability, security and performance.

  • Search engines: Googlebot, Bingbot, YandexBot, Baiduspider, DuckDuckBot, Applebot, Qwantbot, SeznamBot, Sogou.
  • AI & LLM crawlers: ChatGPT-User, GPTBot, Google-Extended, ClaudeBot, Claude-Web, PerplexityBot, Cohere, Anthropics, OAI-SearchBot, Quillbot, YouBot, MyCentralAIScraperBot.
  • SEO tools: AhrefsBot, SemrushBot, MJ12bot, DotBot, DataForSeoBot, Awario bots, SEOkicks, Botify, Jetslide, peer39.
  • Social & sharing: facebookexternalhit, FacebookBot, Twitterbot (X), Pinterestbot, Slackbot, Meta external fetchers.
  • Security & cloud: AliyunSecBot, Amazonbot, Google-CloudVertexBot and more.
  • Scrapers & research: BLEXBot, Bytespider, CCBot, Diffbot, DuckAssistBot, EchoboxBot, FriendlyCrawler, ImagesiftBot, magpie-crawler, NewsNow, news-please, omgili, Poseidon Research Crawler, Quora-Bot, Scrapy, SeekrBot, SeznamHomepageCrawler, TaraGroup, Timpibot, TurnitinBot, ViennaTinyBot, ZoomBot, ZoominfoBot.

How to improve SEO visibility with Spider.es reports

Turn every report into a checklist that keeps search engines focused on your most valuable content.

  • Optimise crawl budget: retire low-value or duplicate areas so Google spends time on strategic URLs.
  • Expose critical resources: make sure CSS, JavaScript and imagery remain crawlable for full rendering.
  • Reference sitemaps: declare or refresh XML sitemaps in robots.txt to guide discovery (see the example after this list).
  • Refine directives: catch accidental blocks or redundant allows and align them with your SEO strategy.
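The sitemap declaration mentioned above is a single line anywhere in robots.txt, and you can list several files; the URLs below are placeholders.

```
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
```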

Common crawler access issues & fixes

Watch for these warning signs before they erode organic traffic:

  • Unintentional disallows: prune legacy robots.txt rules that now block important sections.
  • Server errors & dead pages: resolve 5xx responses and 404s that waste crawl budget.
  • Parameter chaos: consolidate variants with clean URLs and canonical tags.
  • JavaScript-only delivery: provide server-side rendering or fallback links for vital content.
  • Weak internal linking: surface orphan pages so crawlers can discover them.
  • User-agent or IP blocks: ensure firewalls allow legitimate bots while filtering abuse.
  • Mobile mismatches: align mobile and desktop experiences for Google's mobile-first index.

What does Spider.es analyse?

Spider.es inspects robots.txt, meta robots tags and X-Robots-Tag headers side by side to show which bots can crawl, who is blocked and the reason behind each outcome.

SEO essentials worth remembering

Robots.txt overview

Robots.txt stops compliant bots before a URL is fetched. Because it is public, treat it as guidance for well-behaved crawlers, not a security barrier, and pair it with meta and header directives for finer control.

Meta robots vs. X-Robots-Tag

Meta robots tags live in HTML, while X-Robots-Tag headers apply to any file type. Combined, they control indexing behaviour for pages and assets that make it past the crawl gate.
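As a hedged illustration of the two forms, the first snippet sits in the <head> of an HTML page, while the second is sent as an HTTP response header, which is the only option for non-HTML assets such as PDFs.

```
<!-- Meta robots: placed inside the <head> of an HTML page -->
<meta name="robots" content="noindex, nofollow">
```

```
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
```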

Why AI bots might be blocked

AI crawlers can consume bandwidth, reuse proprietary content or spark legal debates. Blocking them in robots.txt or response headers makes your policy explicit and protects your data.

When it's okay to block bots

It's appropriate to block private areas, staging sites, duplicate content or aggressive scrapers. Pair disallow rules with noindex where necessary and maintain a whitelist for the bots you rely on.
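One way to express that whitelist in robots.txt is to block everything by default and open the door only for named agents. The sketch below is illustrative: compliant crawlers apply the most specific group that names them, so Googlebot and Bingbot follow their own empty Disallow rather than the wildcard block.

```
# Default: keep unnamed or aggressive crawlers out
User-agent: *
Disallow: /

# Explicitly allow the bots you rely on
User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:
```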