What is Robots.txt? Controlling Search Engine Crawling

Robots.txt tells search engine bots which parts of your site to access. This guide explains how it works and how to use it without hurting your rankings.

Direct Answer

Robots.txt is a plain text file placed at the root of a website (yourdomain.com/robots.txt) that gives instructions to search engine crawlers about which pages and directories they should or should not access. When Googlebot visits a site, it reads robots.txt first and follows its instructions before crawling. Robots.txt can block specific directories, specific file types, or specific bot user agents. It is also used to declare the location of XML sitemaps. Critically, robots.txt controls crawling, not indexing — blocked pages can still appear in search results if they have external links pointing to them.
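The behaviour described above can be sketched with Python's standard-library robots.txt parser. This is a minimal illustration rather than a model of Googlebot itself: the paths and the bot name "ExampleBot" are hypothetical, and urllib.robotparser does not evaluate the wildcard patterns ('*' and '$') that file-type rules usually rely on, so only directory and per-bot rules are shown.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt. The paths and "ExampleBot" are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /

Sitemap: https://yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, path: str) -> bool:
    """Return True if the given crawler is allowed to fetch the path."""
    return parser.can_fetch(user_agent, "https://yourdomain.com" + path)


print(may_crawl("Googlebot", "/blog/"))          # True: no rule matches
print(may_crawl("Googlebot", "/private/notes"))  # False: directory is blocked
print(may_crawl("ExampleBot", "/blog/"))         # False: this bot is blocked everywhere
print(parser.site_maps())                        # ['https://yourdomain.com/sitemap.xml']
```

Note that the result only says whether a URL may be crawled. It says nothing about whether the URL can appear in search results.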

Robots.txt is one of the most commonly misunderstood technical SEO elements. Many site owners assume that blocking a page in robots.txt prevents it from appearing in search results, but this is incorrect. Robots.txt only prevents Googlebot from crawling the page; if external links point to the blocked page, Google can still index the URL based on those link signals alone. To keep a page out of search results, use a noindex directive (a robots meta tag on the page or an X-Robots-Tag HTTP header) and leave the page crawlable: if robots.txt blocks the URL, Googlebot never fetches the page and so never sees the noindex.
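One practical consequence is that a page should not be both blocked in robots.txt and reliant on noindex. The sketch below flags that conflict. It assumes you already have a list of URLs that carry a noindex directive (the list and paths here are hypothetical), and it uses Python's standard-library parser as a stand-in for a real crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules; the path is hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-campaign/
"""

# Hypothetical list of pages that carry a noindex directive.
NOINDEX_URLS = [
    "https://yourdomain.com/old-campaign/landing.html",
    "https://yourdomain.com/thank-you.html",
]


def find_hidden_noindex(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return noindex pages the crawler cannot fetch, so the noindex is never seen."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


conflicts = find_hidden_noindex(ROBOTS_TXT, NOINDEX_URLS)
print(conflicts)  # ['https://yourdomain.com/old-campaign/landing.html']
```

Any URL returned here needs its robots.txt block lifted before the noindex can take effect.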

Common robots.txt use cases

  • Block admin areas: /wp-admin/ and /admin/ have no search value and are normally disallowed for all crawlers
  • Block duplicate parameter URLs: filtering and sorting parameters that generate duplicate versions of the same content
  • Block low-value areas: user profile pages, login pages, and internal search results pages
  • Manage specific crawlers: per-bot rules can allow AI crawlers (GPTBot, PerplexityBot) while restricting other bots
  • Sitemap declaration: pointing bots to your XML sitemap location ('Sitemap: https://yourdomain.com/sitemap.xml')
  • Staging environments: blocking all bots from /staging/ or a separate staging site to prevent duplicate indexation, ideally alongside password protection, because robots.txt is advisory and publicly readable
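Several of these use cases can be combined in one file. The following is a sketch only: /login/ and /search/ are assumed paths, the test URLs are hypothetical, and the parameter rule is left as a comment because it needs wildcard syntax that Python's standard-library parser does not evaluate.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt covering the use cases listed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /admin/
Disallow: /staging/
Disallow: /login/
Disallow: /search/
# Disallow: /*?sort=   (wildcard rule, not evaluated by urllib.robotparser)

Sitemap: https://yourdomain.com/sitemap.xml
"""


def crawl_report(robots_txt, urls, user_agent="Googlebot"):
    """Map each URL to True (crawlable) or False (blocked) for one crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


report = crawl_report(ROBOTS_TXT, [
    "https://yourdomain.com/",
    "https://yourdomain.com/admin/settings",
    "https://yourdomain.com/search/shoes",
    "https://yourdomain.com/services/seo/",
])
for url, allowed in report.items():
    print("ALLOW" if allowed else "BLOCK", url)
```

Running a report like this against a handful of representative URLs is a quick way to confirm that only the intended areas are blocked.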
Can robots.txt harm SEO if configured incorrectly?

Yes. An incorrectly configured robots.txt is one of the most damaging technical SEO mistakes. Accidentally blocking Googlebot from the entire site (a common error after migrations, for example when a staging 'Disallow: /' rule is carried over to production) stops Google from crawling any page, and rankings and traffic can fall sharply within days. Blocking CSS and JavaScript files prevents Google from rendering pages correctly, so it may misjudge their content and layout. Always test robots.txt changes before deploying them. Google has retired the standalone Robots.txt Tester, so use the robots.txt report and the URL Inspection tool in Google Search Console instead, and monitor indexation closely after any robots.txt change.
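A simple pre-deployment check can catch the most serious of these mistakes. The sketch below is one possible approach, not an official tool: the list of critical URLs is hypothetical and should be replaced with your own key pages and assets, and Python's standard-library parser only approximates how Googlebot matches rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical pages and assets that must always stay crawlable.
CRITICAL_URLS = [
    "https://yourdomain.com/",
    "https://yourdomain.com/services/",
    "https://yourdomain.com/assets/site.css",
    "https://yourdomain.com/assets/app.js",
]


def blocked_critical_urls(robots_txt, urls=CRITICAL_URLS, user_agent="Googlebot"):
    """Return the critical URLs that the proposed robots.txt would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# A staging rule accidentally carried over to production blocks everything.
STAGING_LEFTOVER = "User-agent: *\nDisallow: /\n"
# A narrower rule leaves the critical URLs crawlable.
SAFE = "User-agent: *\nDisallow: /admin/\n"

print(len(blocked_critical_urls(STAGING_LEFTOVER)))  # 4
print(blocked_critical_urls(SAFE))                   # []
```

A check like this could run as part of a deployment pipeline and stop the release whenever the returned list is not empty.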

Should robots.txt explicitly allow AI crawlers?

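Yes. Explicitly allowing AI crawlers by name makes your intent unambiguous and is a sensible AEO/GEO practice. A 'User-agent: *' group with no disallow rules already allows all crawlers, including AI bots, but many sites have disallow rules for subpaths, or blanket blocks aimed at unwanted bots, that inadvertently block AI crawlers too. Listing GPTBot, ClaudeBot, PerplexityBot, and other major AI crawlers with 'Allow: /' rules signals active cooperation with AI indexing. Google-Extended can be listed the same way, although it is a control token that governs how Google uses content for its AI models rather than a separate crawler. Be aware of how groups interact: a crawler follows only the most specific group that matches its name and ignores the 'User-agent: *' group. A named group therefore does not change the rules for other bots, but it does replace the general rules for that crawler, so repeat inside the named group any disallow rules you still want the AI crawler to respect.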
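The way a named group replaces the general rules for that crawler is easy to overlook, so here is a small demonstration. It is a sketch using Python's standard-library parser, with /admin/ and /blog/ as illustrative paths. Real crawlers may differ in detail, but the group-selection behaviour shown matches the Robots Exclusion Protocol.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Parse the given robots.txt text and check one URL for one crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ADMIN_URL = "https://yourdomain.com/admin/"
BLOG_URL = "https://yourdomain.com/blog/"

# Naive version: GPTBot matches its own group and ignores the '*' group,
# so the /admin/ block no longer applies to it.
NAIVE = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

# Careful version: the disallow rule is repeated inside the GPTBot group.
CAREFUL = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /admin/
Allow: /
"""

print(can_crawl(NAIVE, "Googlebot", ADMIN_URL))  # False: '*' group applies
print(can_crawl(NAIVE, "GPTBot", ADMIN_URL))     # True: admin area exposed
print(can_crawl(CAREFUL, "GPTBot", ADMIN_URL))   # False: rule repeated
print(can_crawl(CAREFUL, "GPTBot", BLOG_URL))    # True: rest of site allowed
```

Placing the Disallow line before 'Allow: /' keeps the result the same under both first-match parsers, such as this one, and the longest-match rule that Google documents.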

Jordan Okafor

Digital Marketing Specialist · Elite Digital Agency

A member of the Elite Digital team with expertise in SEO, AEO, and AI-era digital strategy for UK businesses and charities.
