Update 7/30/24: After this story was originally published, an Anthropic spokesperson told 404 Media that ClaudeBot will respect block requests for the company's two older crawlers. “The 'anthropic-ai' and 'claude-web' user agents are no longer in use,” the spokesperson said. “We have configured ClaudeBot, our centralized user agent, to respect any existing robots.txt directives that were previously set for these deprecated user agents. This attempts to respect website owners' preferences, even if they haven't updated their robots.txt files.” The original text of this story follows:
Hundreds of websites trying to block the AI company Anthropic from scraping their content are blocking the wrong bots. This seems to be happening because they are copying and pasting outdated instructions into their robots.txt files, and because companies are constantly launching new AI crawler bots with different names, which will only be blocked if website owners update their robots.txt.
In particular, these sites are blocking two bots no longer used by the company, while unknowingly leaving Anthropic’s real (and new) scraper bot unblocked.
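To see why an outdated blocklist misses the new crawler, it helps to remember that a robots.txt group applies only to the user-agent names it lists. The sketch below uses Python's standard `urllib.robotparser`, which compares each group's `User-agent` name against the crawler's name (ignoring case) and treats an unmatched crawler as allowed. The robots.txt contents, the `allowed` helper, and the URL `https://example.com/article` are hypothetical illustrations. This shows what a literal parser does with such a file, not how Anthropic's crawler actually behaves; per the update above, Anthropic says ClaudeBot now also honors rules written for `anthropic-ai` and `claude-web`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt copied from an outdated blocklist:
# it names only the two deprecated Anthropic user agents.
OUTDATED_RULES = """\
User-agent: anthropic-ai
Disallow: /

User-agent: claude-web
Disallow: /
"""

# The same hypothetical file with an extra group naming ClaudeBot.
UPDATED_RULES = OUTDATED_RULES + """
User-agent: ClaudeBot
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str,
            url: str = "https://example.com/article") -> bool:
    """Return True if robots_txt, read literally, lets user_agent fetch url.

    `allowed` is a hypothetical helper written for this illustration.
    """
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines instead of fetching it.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("anthropic-ai", "claude-web", "ClaudeBot"):
        print(agent,
              "outdated file allows:", allowed(OUTDATED_RULES, agent),
              "updated file allows:", allowed(UPDATED_RULES, agent))
```

Under a literal reading, the outdated file blocks `anthropic-ai` and `claude-web` but lets `ClaudeBot` through, because no group names it and there is no catch-all `User-agent: *` rule; only the updated file blocks all three.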