AI Bot Manager
Control which AI crawlers can access your content. Protect your original work or optimize for AI search visibility.
AI CRAWLER CONTROLS
GPTBot: Used to crawl data for training OpenAI's foundation models.
ClaudeBot: Anthropic's crawler for training Claude AI models.
OAI-SearchBot: Used for surfacing websites in ChatGPT search results.
PerplexityBot: Indexes web content for Perplexity AI search results.
Google-Extended: Controls content usage for Gemini and Google AI training.
CCBot: Common Crawl's crawler; its open crawl data is used by many AI labs.
Applebot-Extended: Controls content usage for Apple Intelligence features.
EXCLUDED PATHS
Robots.txt Preview
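A generated file might look like the following sketch. The specific bots blocked depend on your toggles above, and /private/ is a hypothetical entry under Excluded Paths, shown only for illustration:

```
# Example output: GPTBot and CCBot blocked, /private/ excluded for all bots
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
```

Rules are grouped by User-agent line; a bare `Disallow: /` blocks the named bot from the entire site, while the wildcard group applies to every crawler that has no more specific group.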
Frequently Asked Questions
Do AI crawlers actually respect robots.txt?
Mostly, yes. Major players like OpenAI (GPTBot), Anthropic (ClaudeBot), and Google (Google-Extended) have publicly stated that they respect robots.txt directives. However, smaller or less scrupulous crawlers may ignore them. Robots.txt is a declaration of preference, not an enforcement mechanism, but it is the first and most important step in stating yours.
What is the difference between GPTBot and ChatGPT-User?
GPTBot is a crawler that collects training data for future OpenAI models. ChatGPT-User fetches pages in real time when a user explicitly asks ChatGPT to visit a URL. Blocking GPTBot keeps your data out of future training; blocking ChatGPT-User prevents ChatGPT from 'browsing' your live site on request.
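Because the two bots are separate user agents, they can be given different rules in the same file. A minimal sketch using Python's standard-library robots.txt parser to confirm that a hypothetical policy blocks GPTBot while still allowing ChatGPT-User:

```python
# Sketch: verify how one robots.txt treats GPTBot vs ChatGPT-User.
# The policy below is a hypothetical example, not a recommendation.
from urllib.robotparser import RobotFileParser

policy = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(policy.splitlines())

# The training crawler is blocked everywhere...
print(parser.can_fetch("GPTBot", "https://example.com/article"))        # False
# ...while real-time, user-initiated fetches remain allowed.
print(parser.can_fetch("ChatGPT-User", "https://example.com/article"))  # True
```

The parser matches each request against the most specific User-agent group, which is why the two bots get different answers from the same file.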
Should I block all AI bots?
It depends on your strategy. If you want to keep your proprietary content out of model training, block 'training' bots like CCBot and GPTBot. However, you may want to allow 'search' bots like PerplexityBot or OAI-SearchBot so that your content is surfaced in AI-driven search results.
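That split strategy might look like the following in robots.txt (an illustrative sketch; choose the bots that match your own goals):

```
# Hypothetical "block training, allow AI search" policy
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```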
What is Google-Extended?
Google-Extended is a user-agent token that lets web publishers opt out of having their content used to train Google's Gemini models and other generative AI technologies, without affecting their visibility in standard Google Search results.
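The opt-out is a standard robots.txt group. Because Google-Extended is a control token rather than a separate crawler, Googlebot's normal search indexing is unaffected:

```
User-agent: Google-Extended
Disallow: /
```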