Question 1

What is a robots.txt file?

Accepted Answer

A robots.txt file is a text file placed at the root of your website (e.g. https://yoursite.com/robots.txt) that tells search engine crawlers and bots which pages or sections of your site they should or shouldn't crawl. It follows the Robots Exclusion Protocol.
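As a rough illustration of how a well-behaved crawler consults the file, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the ExampleBot name, and the /private/ path are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. On a live site this text would be
# served from https://yoursite.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a made-up crawler name used only for illustration.
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://yoursite.com/private/report.html"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://yoursite.com/blog/post.html"))
```

The first call reports that the /private/ URL is off limits, and the second reports that the blog URL may be crawled.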

Question 2

Does robots.txt block a page from appearing in search results?

Accepted Answer

No. robots.txt stops compliant crawlers from fetching a page, but the URL can still appear in search results if other sites link to it. To keep a page out of search results, use a noindex meta tag or an X-Robots-Tag HTTP header instead, and leave the page crawlable: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex directive.
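To make the two noindex options concrete, here is a minimal sketch of a WSGI application that sends both the meta tag and the X-Robots-Tag header. The noindex_app name and the page content are hypothetical, and in practice either signal on its own is enough.

```python
from typing import Callable, Iterable


def noindex_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app that marks its response as noindex in two ways."""
    # Option 1: a robots meta tag inside the HTML <head>.
    body = (
        b"<!doctype html><html><head>"
        b'<meta name="robots" content="noindex">'
        b"<title>Internal page</title></head>"
        b"<body>Not meant for search results.</body></html>"
    )
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Option 2: an HTTP response header, which also works for
        # non-HTML files such as PDFs or images.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

For a quick local check, the app could be served with make_server from the standard wsgiref.simple_server module. The page must not be disallowed in robots.txt, otherwise crawlers will not fetch it and will not see either signal.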

Question 3

How do I block AI crawlers like GPTBot from my website?

Accepted Answer

Add one group per crawler to your robots.txt:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: meta-externalagent
    Disallow: /

GPTBot is OpenAI's crawler, CCBot is Common Crawl's, and meta-externalagent is Meta's AI crawler. Reputable AI companies honor robots.txt, but compliance is voluntary, so malicious bots may ignore it.
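If you maintain the list of blocked crawlers in code, the rules can be generated and sanity-checked before deploying. The sketch below is one possible approach using Python's standard urllib.robotparser. The build_block_rules helper is a hypothetical name, and Googlebot appears only as an example of a crawler that is not on the list.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the answer above.
AI_CRAWLERS = ["GPTBot", "CCBot", "meta-externalagent"]


def build_block_rules(user_agents: list[str]) -> str:
    """Build one 'Disallow: /' group per user agent, separated by blank lines."""
    groups = [f"User-agent: {ua}\nDisallow: /" for ua in user_agents]
    return "\n\n".join(groups) + "\n"


robots_txt = build_block_rules(AI_CRAWLERS)
print(robots_txt)

# Sanity check: listed bots are blocked, unlisted bots are unaffected.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://yoursite.com/"))
print(parser.can_fetch("Googlebot", "https://yoursite.com/"))
```

The check only confirms what the file says. It cannot confirm that a given bot actually obeys it.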

Question 4

Can I allow some pages while blocking others for the same bot?

Accepted Answer

Yes. Combine Disallow and Allow rules in the same group. When both match a path, the most specific rule (the one with the longest matching path) takes precedence, and if an Allow and a Disallow rule are equally specific, Allow wins. For example, you can block /admin/ but allow /admin/login.html.
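The precedence between Allow and Disallow can be sketched as a longest-match lookup. The function below is a simplified, hand-rolled illustration, not any crawler's actual implementation: longest_match_allowed is a hypothetical name, and it handles plain path prefixes only, without the * and $ wildcards.

```python
def longest_match_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether path may be crawled under longest-match precedence.

    rules is a list of (kind, prefix) pairs where kind is "allow" or
    "disallow". Only plain prefixes are supported (no wildcards).
    """
    best_len = -1
    allowed = True  # a path matched by no rule is allowed
    for kind, prefix in rules:
        # An empty prefix matches nothing; skip rules that do not apply.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = kind == "allow"
        # A longer prefix is more specific; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


RULES = [("disallow", "/admin/"), ("allow", "/admin/login.html")]
print(longest_match_allowed(RULES, "/admin/login.html"))
print(longest_match_allowed(RULES, "/admin/users"))
```

Here /admin/login.html is allowed because the Allow rule is longer than the Disallow rule, while /admin/users stays blocked. Not every parser works this way: Python's own urllib.robotparser, for instance, applies the first matching rule in file order, so it is worth testing your file against the crawler you care about.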

robots.txt Generator Guide — How to Write & Validate

What Is robots.txt?

Syntax Explained

The * Wildcard

Common Use Cases

What robots.txt Does NOT Do

noindex vs robots.txt

How to Test Your robots.txt

Frequently Asked Questions

What is a robots.txt file?

Does robots.txt block a page from appearing in search results?

How do I block AI crawlers like GPTBot from my website?

Can I allow some pages while blocking others for the same bot?
