Frequently Asked Questions
Discover how Spider.es helps you audit crawler access, diagnose technical SEO issues and manage the new wave of AI bots.
Jump to a question
Pick a topic to scroll straight to the answer.
- How can I check if Googlebot is blocked by my site?
- How do I test Bingbot vs. Googlebot access?
- Can I see if AI crawlers like ChatGPT or Perplexity can crawl my site?
- Why isn't Google indexing all my sitemap pages?
- What's an easy way to understand robots.txt?
- Can I test specific pages, not just the homepage?
- Spider.es: essential insights for SEO professionals & webmasters
- How to improve SEO visibility with Spider.es reports
- Common crawler access issues & fixes
- What does Spider.es analyse?
- SEO essentials worth remembering
Spider.es: essential insights for SEO professionals & webmasters
Spider.es maintains a curated, categorised directory of crawlers. From headline search engines and AI/LLM bots to SEO auditors, social platforms, security services and research scrapers, you know exactly who is hitting your site and why that matters.
Supported crawlers and user-agents
Here's a snapshot of the ecosystems Spider.es monitors to help you stay in control of crawlability, security and performance.
- Search engines: Googlebot, Bingbot, YandexBot, Baiduspider, DuckDuckBot, Applebot, Qwantbot, SeznamBot, Sogou.
- AI & LLM crawlers: ChatGPT-User, GPTBot, Google-Extended, ClaudeBot, Claude-Web, PerplexityBot, Cohere, Anthropics, OAI-SearchBot, Quillbot, YouBot, MyCentralAIScraperBot.
- SEO tools: AhrefsBot, SemrushBot, MJ12bot, DotBot, DataForSeoBot, Awario bots, SEOkicks, Botify, Jetslide, peer39.
- Social & sharing: facebookexternalhit, FacebookBot, Twitterbot (X), Pinterestbot, Slackbot, Meta external fetchers.
- Security & cloud: AliyunSecBot, Amazonbot, Google-CloudVertexBot and more.
- Scrapers & research: BLEXBot, Bytespider, CCBot, Diffbot, DuckAssistBot, EchoboxBot, FriendlyCrawler, ImagesiftBot, magpie-crawler, NewsNow, news-please, omgili, Poseidon Research Crawler, Quora-Bot, Scrapy, SeekrBot, SeznamHomepageCrawler, TaraGroup, Timpibot, TurnitinBot, ViennaTinyBot, ZoomBot, ZoominfoBot.
How to improve SEO visibility with Spider.es reports
Turn every report into a checklist that keeps search engines focused on your most valuable content; a sample robots.txt follows the list.
- Optimise crawl budget: retire low-value or duplicate areas so Google spends time on strategic URLs.
- Expose critical resources: make sure CSS, JavaScript and imagery remain crawlable for full rendering.
- Reference sitemaps: declare or refresh XML sitemaps in robots.txt to guide discovery.
- Refine directives: catch accidental blocks or redundant allows and align them with your SEO strategy.
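As an illustration, here is a minimal robots.txt that applies all four ideas. The paths and sitemap URL are placeholders for your own site, not Spider.es recommendations:

```
User-agent: *
# Optimise crawl budget: keep bots out of low-value, duplicate areas
Disallow: /search/
Disallow: /cart/
# Expose critical resources: under Google's longest-match rule, the more
# specific Allow wins, so assets stay crawlable inside a blocked path
Disallow: /app/
Allow: /app/assets/

# Reference sitemaps: guide discovery straight from robots.txt
Sitemap: https://www.example.com/sitemap.xml
```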
Common crawler access issues & fixes
Watch for these warning signs before they erode organic traffic:
- Unintentional disallows: prune legacy robots.txt rules that now block important sections.
- Server errors & dead pages: resolve 5xx responses and 404s that waste crawl budget.
- Parameter chaos: consolidate variants with clean URLs and canonical tags.
- JavaScript-only delivery: provide server-side rendering or fallback links for vital content.
- Weak internal linking: surface orphan pages so crawlers can discover them.
- User-agent or IP blocks: ensure firewalls allow legitimate bots while filtering abuse (see the verification sketch after this list).
- Mobile mismatches: align mobile and desktop experiences for Google's mobile-first index.
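On the firewall point above, a widely used safeguard is the reverse-then-forward DNS check that Google documents for verifying Googlebot: reverse-resolve the visiting IP, confirm the hostname belongs to Google, then resolve that hostname forward again. Here is a minimal Python sketch of that idea; the IP in the example is just an illustration from a hypothetical log entry.

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-then-forward DNS check, as Google recommends for Googlebot."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)
        # Genuine Googlebot hostnames end in googlebot.com or google.com
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        # Forward-confirm: the hostname must resolve back to the same IP
        forward_ips = {info[4][0] for info in socket.getaddrinfo(host, None)}
        return ip in forward_ips
    except (socket.herror, socket.gaierror):
        return False

# Hypothetical log entry whose user-agent string claims "Googlebot"
print(is_verified_googlebot("66.249.66.1"))
```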
What does Spider.es analyse?
Spider.es inspects robots.txt, meta robots tags and X-Robots-Tag headers side by side to show which bots can crawl, who is blocked and the reason behind each outcome.
SEO essentials worth remembering
Robots.txt overview
Robots.txt stops compliant bots before a URL is fetched. Because it is public, treat it as guidance for well-behaved crawlers, not a security barrier, and pair it with meta and header directives for finer control.
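An easy way to get hands-on with a robots.txt file is to query it programmatically. Python's standard-library robotparser answers the same can-this-bot-fetch-this-URL question that Spider.es reports on (example.com is a placeholder for your domain):

```python
from urllib import robotparser

rp = robotparser.RobotFileParser("https://www.example.com/robots.txt")
rp.read()  # fetch and parse the live file

# True if the named user-agent may fetch the URL under the parsed rules
print(rp.can_fetch("Googlebot", "https://www.example.com/private/page"))
print(rp.can_fetch("GPTBot", "https://www.example.com/"))
```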
Meta robots vs. X-Robots-Tag
Meta robots tags live in HTML, while X-Robots-Tag headers apply to any file type. Combined, they control indexing behaviour for pages and assets that make it past the crawl gate.
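As a quick illustration, these two directives express the same noindex policy, but only the header form can attach to a PDF, image or other non-HTML asset:

```html
<!-- Meta robots: works only inside an HTML document -->
<meta name="robots" content="noindex, nofollow">
```

```http
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex, nofollow
```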
Why AI bots might be blocked
AI crawlers can consume bandwidth, reuse proprietary content or spark legal debates. Blocking them in robots.txt or response headers makes your policy explicit and protects your data.
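If that is your policy, the robots.txt side takes only a few lines per bot. The user-agent tokens below are the published names of four AI crawlers from the directory above:

```
# Opt out of AI training and answer engines (compliant bots only)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: PerplexityBot
Disallow: /
```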
When it's okay to block bots
It's appropriate to block private areas, staging sites, duplicate content or aggressive scrapers. Bear in mind that a crawler blocked by robots.txt never sees a noindex tag, so to drop an already-indexed URL from results, allow the crawl and serve noindex instead. Maintain a whitelist for the bots you rely on so the tools that matter keep working.
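A whitelist pattern is straightforward to express in robots.txt: name the bots you rely on, then close the door on everyone else with a catch-all group (the trusted bots here are just examples):

```
# Named bots get full access
User-agent: Googlebot
User-agent: Bingbot
Allow: /

# Every other compliant crawler is shut out
User-agent: *
Disallow: /
```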