27.6% of the Top 10 Million Sites are Dead

Submitted by
Style Pass
2024-10-30 13:30:12

The internet, in many ways, has a memory. From archived versions of old websites to search engine caches, there’s often a way to dig into the past and uncover information, even for websites that are no longer active. You may have heard of the Internet Archive, a popular tool for exploring the history of the web, which has recently suffered outages caused by hacks and other problems. But what if there were no Internet Archive? Would the internet still “remember” these sites?

In this article, we’ll dive into a study of the top 10 million domains and reveal a surprising finding: over a quarter of them — 27.6% — are effectively dead. Below, I’ll walk you through the steps and infrastructure involved in analyzing these domains, along with the system requirements, code snippets, and statistical results of this research.
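The headline number is the share of checked domains classified as dead. This section does not spell out what counts as "effectively dead", so the sketch below assumes only that each domain has already been labelled reachable or unreachable. The `dead_share` helper and the domain-to-flag mapping are hypothetical, not the author's code. The last line works out the same percentage at the article's scale.

```python
from collections import Counter


def dead_share(statuses: dict[str, bool]) -> float:
    """Return the percentage of domains marked dead.

    `statuses` maps each domain to True if it was judged alive and
    False otherwise (the alive/dead rule itself is an assumption).
    """
    if not statuses:
        raise ValueError("no domains to tally")
    counts = Counter(statuses.values())
    return 100.0 * counts[False] / len(statuses)


# At the article's scale: 27.6% of 10,000,000 domains, in exact integer math.
TOTAL_DOMAINS = 10_000_000
DEAD_AT_SCALE = TOTAL_DOMAINS * 276 // 1000  # 2,760,000 domains
```

In other words, the 27.6% figure corresponds to roughly 2.76 million domains out of the 10 million analyzed.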

Thanks to resources like DomCop, we can access a list of the top 10 million domains, which serves as our starting point. Processing such a large volume of URLs requires significant computing resources, parallel processing, and optimized handling of HTTP requests.
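This section names neither the list format nor the crawler, so what follows is only a minimal sketch of one way to do this with Python's standard library. It makes three assumptions: the DomCop download is a CSV with a `Domain` column, a domain is "alive" if any HTTP server answers a `HEAD` request, and the checks are spread over a thread pool. The helper names `read_domains`, `is_alive` and `check_all`, the 10-second timeout and the 64 workers are all hypothetical choices, not the author's setup.

```python
import csv
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, Optional


def read_domains(lines: Iterable[str], limit: Optional[int] = None) -> Iterator[str]:
    """Yield domain names from a DomCop-style CSV.

    Assumes a header row containing a "Domain" column; verify this
    against the actual download before relying on it.
    """
    reader = csv.DictReader(lines)
    for i, row in enumerate(reader):
        if limit is not None and i >= limit:
            break
        yield row["Domain"].strip()


def is_alive(domain: str, timeout: float = 10.0) -> bool:
    """Send one HEAD request and report whether any HTTP server answered.

    Assumed rule: any HTTP status, even 4xx/5xx, means "alive"; DNS
    failures, refused connections, timeouts and malformed responses
    mean "dead". urllib follows redirects (e.g. to HTTPS) on its own.
    """
    request = urllib.request.Request(f"http://{domain}/", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # A server replied, just with an error status (404, 405, 503, ...).
        return True
    except (OSError, http.client.HTTPException, ValueError):
        # URLError subclasses OSError, so DNS and connection failures
        # land here, as do socket timeouts and TLS errors.
        return False


def check_all(
    domains: Iterable[str],
    check: Callable[[str], bool] = is_alive,
    workers: int = 64,
) -> Dict[str, bool]:
    """Run `check` on every domain concurrently using a thread pool.

    Threads suit this workload because each check spends nearly all
    of its time waiting on the network rather than on the CPU.
    """
    domain_list = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(check, domain_list)
        return dict(zip(domain_list, results))
```

With the real file, you would open it with `open(path, newline="")`, pass the handle to `read_domains`, and feed the domains to `check_all` in batches of, say, tens of thousands. `ThreadPoolExecutor.map` submits every task up front, so handing it all ten million domains at once would keep ten million pending futures in memory.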

Leave a Comment