Every website should have a robots.txt file at its root: a simple text file that tells search engines and bots which URLs they can crawl. Use our free robots.txt Generator to build one with presets for WordPress, Next.js, and AI crawler blocking. Part of our Complete Developer Tools Guide.
What Is robots.txt?
A plain-text file at https://yoursite.com/robots.txt that communicates crawl preferences to compliant bots. It does not enforce security: it's a polite request, not a firewall.
Syntax Explained
- User-agent: Which bot the rules apply to (* = all)
- Disallow: Paths the bot should not crawl
- Allow: Exceptions to Disallow rules
- Sitemap: URL to your XML sitemap (recommended at file end)
- Crawl-delay: Optional seconds between requests (not supported by Google)
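To see how these directives read to a parser, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample file, the example.com URLs, and the ExampleBot name are placeholders for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a made-up crawler name; it falls under the * group.
admin_allowed = parser.can_fetch("ExampleBot", "https://example.com/admin/settings")
blog_allowed = parser.can_fetch("ExampleBot", "https://example.com/blog/post")
delay = parser.crawl_delay("ExampleBot")  # seconds, or None if not set
sitemaps = parser.site_maps()             # list of sitemap URLs, or None

print(admin_allowed, blog_allowed, delay, sitemaps)
```
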
The * Wildcard
User-agent: * applies the following rules to all crawlers that don't have more specific rules defined elsewhere in the file.
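The fallback behaviour can be sketched with Python's urllib.robotparser. In this hypothetical file, GPTBot has its own group, so it follows only that group and ignores the * rules. The paths and the ExampleBot name are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one catch-all group and one bot-specific group.
SAMPLE = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /drafts/
"""

rp = RobotFileParser()
rp.parse(SAMPLE.splitlines())

# A bot without its own group falls back to the * rules.
generic_private = rp.can_fetch("ExampleBot", "https://example.com/private/page")

# GPTBot matches its own group, so the * rules do not apply to it.
gptbot_private = rp.can_fetch("GPTBot", "https://example.com/private/page")
gptbot_drafts = rp.can_fetch("GPTBot", "https://example.com/drafts/page")

print(generic_private, gptbot_private, gptbot_drafts)
```

If a path should be blocked for every bot, it has to be repeated in each bot-specific group as well as in the * group.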
Common Use Cases
Block admin areas (/admin/, /wp-admin/), protect staging environments, block AI training crawlers (GPTBot, CCBot, Claude-Web), or allow everything on a public marketing site. Our generator includes one-click presets.
What robots.txt Does NOT Do
Malicious bots ignore robots.txt. It is not authentication or access control: use server-level restrictions, firewalls, or auth for sensitive content.
noindex vs robots.txt
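Use robots.txt to prevent crawling. Use noindex meta tags or HTTP headers to prevent indexing in search results. For pages you want hidden from Google entirely, noindex is the correct tool, and the page must stay crawlable (not disallowed in robots.txt), otherwise the crawler never sees the noindex directive.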
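In HTML, the meta tag form is `<meta name="robots" content="noindex">`. The HTTP header form can be sketched as a tiny WSGI application in Python. The /drafts/ path prefix and the `app` function are hypothetical and only illustrate where the X-Robots-Tag header would be set, not how any particular framework does it.

```python
def app(environ, start_response):
    """Minimal WSGI app that marks pages under /drafts/ as noindex."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]

    # Hypothetical rule: anything under /drafts/ stays out of search results.
    if path.startswith("/drafts/"):
        headers.append(("X-Robots-Tag", "noindex"))

    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library.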
How to Test Your robots.txt
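Use Google Search Console's URL Inspection tool or its robots.txt report (which replaced the retired robots.txt Tester) to verify that Google fetches your file and interprets your rules as intended. Both work against the live file, so re-check after each deployment.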
Frequently Asked Questions
What is a robots.txt file?
A robots.txt file is a text file placed at the root of your website (e.g. https://yoursite.com/robots.txt) that tells search engine crawlers and bots which pages or sections of your site they should or shouldn't crawl. It follows the Robots Exclusion Protocol.
Does robots.txt block a page from appearing in search results?
No. robots.txt stops compliant crawlers from fetching a page, but the URL can still appear in search results if other sites link to it. To keep a page out of search results, use a noindex meta tag or X-Robots-Tag HTTP header instead, and leave the page crawlable so the directive can be read.
How do I block AI crawlers like GPTBot from my website?
Add these rules to your robots.txt, with each directive on its own line (the slash below marks a line break): "User-agent: GPTBot / Disallow: /" (blocks OpenAI), "User-agent: CCBot / Disallow: /" (blocks Common Crawl), "User-agent: meta-externalagent / Disallow: /" (blocks Meta AI). Note that reputable AI companies honor robots.txt, while malicious bots may not.
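Written out line by line, those rules can be generated and sanity-checked with a short Python sketch. `build_block_rules` is a hypothetical helper, the agent list is taken from the answer above, and ExampleBot is a made-up name for an unlisted crawler.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens quoted in the answer above.
AI_CRAWLERS = ["GPTBot", "CCBot", "meta-externalagent"]


def build_block_rules(agents):
    """Return robots.txt text with one 'Disallow: /' group per agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


rules_text = build_block_rules(AI_CRAWLERS)
print(rules_text)

# Sanity check with the standard-library parser.
checker = RobotFileParser()
checker.parse(rules_text.splitlines())
gptbot_ok = checker.can_fetch("GPTBot", "https://example.com/")
other_ok = checker.can_fetch("ExampleBot", "https://example.com/")
```

Bots that are not listed remain unrestricted, because the file contains no * group.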
Can I allow some pages while blocking others for the same bot?
Yes. Use a combination of Disallow and Allow rules. When both match the same path, the more specific (longest) rule takes precedence; if the matches are equally specific, Allow wins. For example, you can block /admin/ but allow /admin/login.html.
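The /admin/ example can be sketched in Python as follows, assuming the longest-match rule from the Robots Exclusion Protocol (RFC 9309) with ties going to Allow. `is_allowed` is a hypothetical helper that handles plain path prefixes only and ignores the `*` and `$` wildcards. Note that some parsers, including Python's own urllib.robotparser, apply rules in file order instead.

```python
def is_allowed(path, rules):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, prefix) pairs for one bot's group.
    The longest matching prefix wins; on a tie, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty or non-matching rules are skipped
        is_allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and is_allow
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = is_allow
    return allowed


ADMIN_RULES = [
    ("Disallow", "/admin/"),
    ("Allow", "/admin/login.html"),
]

print(is_allowed("/admin/login.html", ADMIN_RULES))
print(is_allowed("/admin/users", ADMIN_RULES))
print(is_allowed("/blog/post", ADMIN_RULES))
```
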