AI Bot Manager
Control which AI crawlers can access your content. Protect your original work or optimize for AI search visibility.
AI CRAWLER CONTROLS
GPTBot: Used to crawl data for training OpenAI's foundation models.
ClaudeBot: Anthropic's crawler for training Claude AI models.
OAI-SearchBot: Used for surfacing websites in ChatGPT search results.
PerplexityBot: Indexes web content for Perplexity AI search results.
Google-Extended: Controls content usage for Gemini and Google AI training.
CCBot: Common Crawl's crawler; its open crawl data is used by many AI labs.
Applebot-Extended: Controls content usage for Apple Intelligence features.
EXCLUDED PATHS
Robots.txt Preview
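A generated file might look like the following sketch. The specific bots blocked depend on your toggles above, and /private/ is a hypothetical entry under Excluded Paths, shown only for illustration:

```
# Example output: GPTBot and CCBot blocked, /private/ excluded for all bots
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
```

Rules are grouped by User-agent line; a bare `Disallow: /` blocks the named bot from the entire site, while the wildcard group applies to every crawler that has no more specific group.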
Frequently Asked Questions
Do AI crawlers actually respect robots.txt?
Mostly, yes. Major players like OpenAI (GPTBot), Anthropic (ClaudeBot), and Google (Google-Extended) have publicly stated that they respect robots.txt directives. However, smaller or less scrupulous crawlers may ignore them. Robots.txt is a declaration of preference, not an enforcement mechanism, but it is the first and most important step in stating yours.
What is the difference between GPTBot and ChatGPT-User?
GPTBot is a crawler that collects training data for future OpenAI models. ChatGPT-User fetches pages in real time when a user explicitly asks ChatGPT to visit a URL. Blocking GPTBot keeps your data out of future training; blocking ChatGPT-User prevents ChatGPT from 'browsing' your live site on request.
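Because the two bots are separate user agents, they can be given different rules in the same file. A minimal sketch using Python's standard-library robots.txt parser to confirm that a hypothetical policy blocks GPTBot while still allowing ChatGPT-User:

```python
# Sketch: verify how one robots.txt treats GPTBot vs ChatGPT-User.
# The policy below is a hypothetical example, not a recommendation.
from urllib.robotparser import RobotFileParser

policy = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(policy.splitlines())

# The training crawler is blocked everywhere...
print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
# ...while real-time, user-initiated fetches remain allowed.
print(parser.can_fetch("ChatGPT-User", "https://example.com/article"))  # True
```

The parser matches each request against the most specific User-agent group, which is why the two bots get different answers from the same file.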
Should I block all AI bots?
It depends on your strategy. If you want to keep your proprietary content out of model training, block 'training' bots like CCBot and GPTBot. However, you may want to allow 'search' bots like PerplexityBot or OAI-SearchBot so that your content is surfaced in AI-driven search results.
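That split strategy might look like the following in robots.txt (an illustrative sketch; choose the bots that match your own goals):

```
# Hypothetical "block training, allow AI search" policy
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```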
What is Google-Extended?
Google-Extended is a user-agent token that lets web publishers opt out of having their content used to train Google's Gemini models and other generative AI technologies, without affecting their visibility in standard Google Search results.
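The opt-out is a standard robots.txt group. Because Google-Extended is a control token rather than a separate crawler, Googlebot's normal search indexing is unaffected:

```
User-agent: Google-Extended
Disallow: /
```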