
TikTok’s parent launched a web scraper that’s gobbling up the world’s online data 25 times faster than OpenAI

submitted by
Style Pass
2024-10-07 11:00:04

ByteDance looks like it’s eager to make up for lost time when it comes to scraping the web for data needed to train its generative AI models.

The China-based parent company of video app TikTok released its own web crawler or scraper bot, dubbed Bytespider, sometime in April, according to research from Kasada, a company that specializes in bot management for companies with online data. The existence of the bot was also confirmed by Dark Visitors, which monitors scraper bots.

ByteDance’s bot has quickly become one of the most aggressive scrapers on the internet, if not the single most aggressive, the research shows. It’s scraping data at a rate many multiples that of other major companies, such as Google, Meta, Amazon, OpenAI, and Anthropic, which use their own scraper bots to help create and improve their large language or multimodal models, known as LLMs or LMMs.

Sam Crowther, the CEO of Kasada, said that since Bytespider showed up, it has been scraping data at about 25 times the rate of GPTBot, which scrapes data for OpenAI’s ChatGPT platform and underlying models. Bytespider has also been scraping at 3,000 times the rate of ClaudeBot, from Anthropic, which operates the Claude platform.
