Diseconomies of Scale at Google

Google technology used to be years ahead of the industry. In 2004, Google released a paper on its proprietary MapReduce framework, the system that powered Google's massively parallel web indexing infrastructure. The previous year, Google had released a paper on its proprietary Google File System, which worked hand-in-hand with MapReduce. No other company was operating at Google's scale.
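
For readers unfamiliar with the model, here is a minimal word-count sketch in Python of the map/shuffle/reduce pattern the 2004 paper describes. It is only an illustration of the programming model, not Google's distributed implementation, and the function names are my own.

    from collections import defaultdict

    # Toy, in-memory illustration of the MapReduce programming model
    # (word count, the canonical example from the 2004 paper). It shows
    # only the map -> shuffle -> reduce shape; nothing here is distributed.

    def map_fn(document):
        # Map: emit a (word, 1) pair for every word in an input document.
        for word in document.split():
            yield word.lower(), 1

    def reduce_fn(word, counts):
        # Reduce: sum the partial counts for a single key.
        return word, sum(counts)

    def mapreduce(documents):
        # Shuffle: group intermediate (key, value) pairs by key.
        grouped = defaultdict(list)
        for doc in documents:
            for key, value in map_fn(doc):
                grouped[key].append(value)
        # Reduce: one call per distinct key.
        return dict(reduce_fn(k, vs) for k, vs in grouped.items())

    print(mapreduce(["the web is large", "the index is large"]))
    # {'the': 2, 'web': 1, 'is': 2, 'large': 2, 'index': 1}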

But the industry always catches up, eventually. In 2006, two engineers used those papers as a blueprint to create open-source versions of both technologies: Apache Hadoop and HDFS. These quickly became the industry standard, spawning huge companies like Cloudera, Hortonworks, and Databricks. Google's internal implementation was similar but incompatible. Not only had Google failed to commercialize the technology, it now maintained a completely different codebase. This made it difficult to hire talent, expensive to keep up with outside improvements, and left Google with a divergent basis for future innovation.

Avoiding the MapReduce/Hadoop situation was the initial rationale for open-source projects like TensorFlow and Kubernetes. While open-sourcing internal Google technologies has been wildly successful in both cases, Google is still filled with bespoke proprietary technology. Everything works differently at Google: building software, communicating between services, version control, code search, running jobs and applications, testing, and everything in between. Ramp-up time for new engineers continues to increase. Engineers can't use off-the-shelf software because it won't work with internal technologies. Technologies that were once years ahead of the industry are now only months ahead.
