2026 Edition

Robots.txt Generator 2026

The most comprehensive robots.txt builder online. Control AI crawlers, block bad bots, apply CMS-specific rules for WordPress, Shopify, Wix, and more. Generate, preview, and download your file instantly.

Import Existing File

Already have a robots.txt? Upload it to populate all settings automatically and see exactly what you have.

Drag and drop your robots.txt here, or click to browse
.txt files only • max 512 KB • never uploaded to any server
1. Site Setup

Enter your site URL and select your platform. Your URL is used to auto-build your sitemap path and is never stored.

2. Global Rules

These apply to all crawlers (User-agent: *). Select common patterns to disallow.
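As an illustration, global rules rendered under User-agent: * might look like this (the paths are hypothetical examples, not recommendations):

```txt
# Rules every crawler inherits
User-agent: *
Disallow: /search/        # example: internal search result pages
Disallow: /tmp/           # example: throwaway path
Disallow: /*?sessionid=   # example: parameterized duplicate URLs
```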

3. CMS Patterns

Platform-specific paths pre-selected based on your chosen CMS. Toggle any individually.

Note: /wp-admin/admin-ajax.php is auto-allowed when admin is blocked.
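A sketch of the pairing this note describes; under longest-match precedence, the more specific Allow wins over the broader Disallow:

```txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php   # keeps front-end AJAX requests crawlable
```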

No CMS-specific patterns exist for static HTML or custom platforms. Use Global Rules and the custom area below instead.

4. E-Commerce Rules

Block transactional and user-specific pages that offer no SEO value and waste crawl budget.

5. Good Bots

Optionally add explicit Allow rules for major search engines. Useful after restrictive global rules. Note: Allow is part of the Robots Exclusion Protocol (RFC 9309) and is supported by Google and Bing, but smaller crawlers may ignore it.
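For example, a restrictive default followed by explicit re-allows might look like this (whether you want a pattern this strict depends on your site):

```txt
# Default: block everything
User-agent: *
Disallow: /

# Re-open the site for major search engines
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /
```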

6. AI Crawlers

Control access for AI and LLM crawlers individually. Each bot can be allowed, blocked, or left unspecified (inherits global rules).
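A sketch of the three states, using bots from the list below:

```txt
# Blocked: opts out of model training
User-agent: GPTBot
Disallow: /

# Allowed: stays visible in AI search
User-agent: OAI-SearchBot
Allow: /

# ChatGPT-User has no group of its own here,
# so it falls back to the User-agent: * rules
```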

OpenAI
GPTBot ChatGPT training + search
OAI-SearchBot OpenAI search indexing
ChatGPT-User User-initiated fetches
Anthropic
ClaudeBot Claude AI training
Claude-User User-initiated Claude fetches
Claude-SearchBot Claude search indexing
Perplexity AI
PerplexityBot Perplexity search indexing
Perplexity-User User query retrieval
Warning: Perplexity has been documented bypassing robots.txt blocks with undeclared, generic user-agent strings, so robots.txt may not fully stop it.
Google AI
Google-Extended Gemini AI training opt-out (formerly Bard)
Apple
Applebot Standard Apple crawler
Applebot-Extended Apple AI training opt-out
Meta / Facebook
FacebookBot Meta AI crawler
meta-externalagent Meta external agent
ByteDance / TikTok
Bytespider ByteDance LLM training
Other AI Crawlers
CCBot Common Crawl (AI training data)
Diffbot Structured data aggregator
cohere-ai Cohere AI training
anthropic-ai Alternative Anthropic identifier
7. Bad Bots

Block SEO scrapers and spam crawlers that waste bandwidth and scrape your content for competitor intelligence tools.

8. Sitemap & Advanced

Add your sitemap URL and configure optional advanced directives.

Live Preview
Learn More

Robots.txt Explained

Everything you need to know about robots.txt in 2026, from basic directives to AI crawler controls.

What is robots.txt?

A plain-text file placed at the root of your domain (yoursite.com/robots.txt) that tells crawlers which pages they can or cannot access. Well-behaved crawlers fetch this file before crawling any page on your site.

  • User-agent: specifies which bot the rules apply to (* = all bots)
  • Disallow: tells bots not to crawl a path
  • Allow: overrides a Disallow for a more specific sub-path (longest match wins)
  • Sitemap: points crawlers to your XML sitemap
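Putting the four directives together (yoursite.com and the paths are placeholders):

```txt
User-agent: *                 # the rules below apply to all bots
Disallow: /private/           # do not crawl this path
Allow: /private/press-kit/    # except this more specific sub-path
Sitemap: https://yoursite.com/sitemap.xml
```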

AI Crawlers in 2026

AI crawlers have exploded in 2026. GPTBot traffic grew 305% year-over-year. Each major AI company now runs three separate bots: one for training, one for user-initiated fetches, and one for search indexing. This lets you allow AI search visibility while blocking training data scraping.

Note: Perplexity has documented compliance issues, including the use of undeclared user-agent strings. robots.txt alone may not fully stop it.

Common Mistakes

  • Blocking CSS and JS files (prevents proper rendering by Google)
  • Using Disallow: / for all bots and forgetting to re-allow search engines
  • Relying on robots.txt to protect sensitive content (it is advisory only)
  • Exceeding the 500 KiB (512 KB) file size limit (Google silently ignores anything past it)
  • Using Crawl-delay expecting Google to respect it (it does not)

robots.txt vs llms.txt

robots.txt controls whether bots can crawl and index your pages. It is the established standard, respected by all major search engines and AI crawlers.

llms.txt is an emerging (not yet standardized) Markdown file at yoursite.com/llms.txt that guides AI assistants on how to interpret, summarize, and cite your content. Claude (Anthropic) officially supports it. Google does not yet.

FAQ

Frequently Asked Questions

What does a robots.txt file do, and do I need one?

A robots.txt file instructs web crawlers which pages or sections of your site to avoid. Without one, all bots can crawl everything. A well-configured robots.txt protects sensitive admin pages, prevents duplicate content issues from search parameters, and conserves your crawl budget for the pages that matter.
How do I block AI crawlers like GPTBot?

Use the AI Crawlers section above. Each bot can be individually allowed, blocked, or left unspecified. Blocking a bot adds a User-agent: GPTBot group followed by Disallow: /. OpenAI, Anthropic, and Google all formally respect robots.txt. The "Block Training Only" preset allows search/answer bots while blocking training crawlers.
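Under a "Block Training Only" style configuration, the generated output might resemble the following (the exact preset contents may differ):

```txt
# Training crawlers blocked
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Search/answer bots left open
User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /
```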
Should I use Crawl-delay?

For most sites, no. Google has ignored Crawl-delay for years. Bing and Yandex still support it. Modern bots automatically detect server stress via HTTP 429 responses. If you have a genuinely resource-limited server, use Google Search Console to set crawl rate limits directly instead.
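If you do target Bing or Yandex with it, the directive looks like this (the 10-second value is only an example):

```txt
User-agent: Bingbot
Crawl-delay: 10   # ask for ~10 seconds between requests; Google ignores this
```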
Is there a file size limit for robots.txt?

Google enforces a hard limit of 500 KiB (512,000 bytes). Any content past this point is silently ignored, with no warning or error. This generator shows your file size in real time so you can stay well under the limit. Most sites should never get close to it.
What is llms.txt, and should I have one?

llms.txt is a Markdown file at yoursite.com/llms.txt that provides guidance to AI assistants about how to interpret your content. It is not yet a formal web standard and Google does not support it. Anthropic (Claude) officially endorses it. If AI visibility matters for your site, it is worth adding alongside your robots.txt.
Select "WordPress" from the Platform/CMS dropdown above. The tool will pre-check the standard WordPress paths: /wp-admin/ (with /wp-admin/admin-ajax.php auto-allowed), /wp-includes/, /wp-json/, /xmlrpc.php, /trackback/, and /feed/. You can toggle each individually and add custom paths.
Can robots.txt protect private or sensitive content?

robots.txt is advisory, not a security mechanism. Legitimate crawlers (Google, Bing, major AI companies) respect it. Malicious scrapers and bad actors may not. For genuinely sensitive content, use password protection, server-level access controls, or authentication. Never rely on robots.txt alone to protect private data.
Can I import an existing robots.txt file?

Yes. Use the "Import Existing File" section at the top of the form. Drag and drop your robots.txt or click to browse. The tool validates the file client-side (only .txt files up to 512 KB are accepted, content is checked for valid directives), then parses it and populates all the form fields automatically. Your file is never sent to any server.