The Times blocked a bot that had supplied the Internet Archive’s Wayback Machine with huge troves of websites.
The New York Times tried to block a web crawler that was affiliated with the famous Internet Archive, a project whose easy-to-use comparisons of article versions have sometimes led to embarrassment for the newspaper.
In 2021, the New York Times added “ia_archiver” — a bot that, in the past, captured huge numbers of websites for the Internet Archive — to a list that instructs certain crawlers to stay out of its website.
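The standard mechanism for such a list is a site’s robots.txt file, a plain-text file that well-behaved crawlers consult before fetching pages. The snippet below is a generic sketch of what an entry blocking the Archive’s bot would look like; it is not a reproduction of the Times’ actual file.

    User-agent: ia_archiver
    Disallow: /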
Crawlers are automated programs that trawl websites, collecting data and sending it back to a repository, a process known as scraping. Such bots power search engines and the Internet Archive’s Wayback Machine, a service that archives webpages and lets visitors view historic versions of websites going back to 1996.
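At its simplest, a crawler is a short loop: consult a site’s robots.txt, fetch a page, store a copy, and follow the links it finds. The Python sketch below is a bare-bones illustration of that loop, not the Internet Archive’s own software; the function names and starting URL are hypothetical.

    # A bare-bones crawler sketch for illustration only; it is not the
    # Internet Archive's software. It checks robots.txt, fetches pages,
    # keeps copies in memory, and follows links found on each page.
    import urllib.request
    import urllib.robotparser
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse

    class LinkCollector(HTMLParser):
        # Gathers the href value of every anchor tag on a page.
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links += [v for k, v in attrs if k == "href" and v]

    def crawl(start_url, user_agent="example-crawler", max_pages=10):
        # Read the site's robots.txt so the bot can honor any block list.
        parts = urlparse(start_url)
        robots = urllib.robotparser.RobotFileParser()
        robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
        robots.read()

        queue, seen, archive = [start_url], set(), {}
        while queue and len(archive) < max_pages:
            url = queue.pop(0)
            if url in seen or not robots.can_fetch(user_agent, url):
                continue  # skip pages this bot has been told to stay out of
            seen.add(url)
            request = urllib.request.Request(url, headers={"User-Agent": user_agent})
            with urllib.request.urlopen(request) as response:
                page = response.read().decode("utf-8", errors="replace")
            archive[url] = page  # a real crawler would write to durable storage
            collector = LinkCollector()
            collector.feed(page)
            queue += [urljoin(url, link) for link in collector.links]
        return archive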
The Internet Archive’s Wayback Machine has long been used to compare webpages as they are updated over time, clearly delineating the differences between two iterations of any given page. Several years ago, the archive added a feature called “Changes” that lets users compare two archived versions of a website from different dates or times on a single display. The tool can be used to uncover changes in news stories that have been made without any accompanying editorial notes, so-called stealth edits.
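Under the hood, that kind of comparison is a textual diff between two saved copies of the same page. The short Python sketch below illustrates the idea with invented article text and capture labels; it is not the Wayback Machine’s own implementation, which presents the differences visually.

    # Comparing two saved versions of the same sentence; the article text
    # and capture labels are invented, for illustration only.
    import difflib

    old_snapshot = ["The mayor said the plan would cost $2 million.\n"]
    new_snapshot = ["The mayor said the plan would cost $3 million.\n"]

    diff = difflib.unified_diff(old_snapshot, new_snapshot,
                                fromfile="capture 2021-01-01",
                                tofile="capture 2021-06-01")
    print("".join(diff))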