100x Defect Tolerance: How Cerebras Solved the Yield Problem

submitted by
Style Pass
Wed, Jan 15, 2025 21:30

Conventional wisdom in semiconductor manufacturing has long held that bigger chips mean worse yields. Yet at Cerebras, we’ve successfully built and commercialized a chip 50x larger than the largest computer chips – and achieved comparable yields. This seeming paradox is one of our most frequently asked questions: how do we achieve a usable yield with a wafer-scale processor?

The answer lies in rethinking the relationship between chip size and fault tolerance. This article will provide a detailed, apples-to-apples comparison of manufacturing yields between the Cerebras Wafer Scale Engine and an H100-sized chip, both manufactured at 5nm. By examining the interplay between defect rates, core size, and fault tolerance, we’ll show how we achieve wafer-scale integration with yields equal to or better than those of reticle-limited GPUs.

Like any manufacturing process, chip fabrication is prone to defects. The larger a chip, the more likely it is to contain at least one defect, so yield falls exponentially as die area increases. Even though larger chips generally run faster, early microprocessors were kept to a modest size to maintain acceptable manufacturing yields and profit margins. In the early 2000s, this started to change. As transistor budgets grew past 100 million, it became the norm to build processors with multiple independent cores per chip. Because the cores were identical and independent, chip designers built in core-level fault tolerance, so that if one core suffered a defect, the remaining cores could still operate. For example, in 2006 Intel released the Intel Core Duo, a chip with two CPU cores. If one core was faulty, it was disabled and the product was sold as an Intel Core Solo. Nvidia, AMD, and others all embraced this core-level redundancy in the years that followed.
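The two ideas in the paragraph above, exponential yield loss with area and core-level redundancy, can be sketched with a simple Poisson defect model. This is an illustrative approximation, not the model used by Cerebras or any foundry: the defect density (0.001 defects per mm²), the die areas, and the function names are all assumptions chosen for the example, and the sketch ignores defects that land outside the cores.

```python
import math

# Assumed defect density for illustration only (defects per mm^2).
ASSUMED_DEFECT_DENSITY = 0.001


def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Probability that a region of the given area contains zero defects,
    assuming defects are randomly and independently distributed."""
    return math.exp(-area_mm2 * defects_per_mm2)


def yield_with_spare_cores(n_cores: int, core_area_mm2: float,
                           defects_per_mm2: float, max_bad_cores: int) -> float:
    """Probability that at most max_bad_cores of n_cores identical,
    independent cores are defective (binomial over per-core yield)."""
    p_good = poisson_yield(core_area_mm2, defects_per_mm2)
    p_bad = 1.0 - p_good
    return sum(
        math.comb(n_cores, k) * p_bad**k * p_good**(n_cores - k)
        for k in range(max_bad_cores + 1)
    )


if __name__ == "__main__":
    # Yield of a monolithic die with no fault tolerance falls as area grows.
    for area in (100, 400, 800):
        y = poisson_yield(area, ASSUMED_DEFECT_DENSITY)
        print(f"{area} mm^2 die, no redundancy: {y:.3f}")

    # A hypothetical two-core chip (50 mm^2 per core): compare requiring
    # both cores to work against tolerating one disabled core.
    both = yield_with_spare_cores(2, 50, ASSUMED_DEFECT_DENSITY, 0)
    one_ok = yield_with_spare_cores(2, 50, ASSUMED_DEFECT_DENSITY, 1)
    print(f"two cores, both required: {both:.3f}")
    print(f"two cores, one may be disabled: {one_ok:.3f}")
```

Under these assumptions, tolerating a single bad core recovers most of the dies that would otherwise be scrapped, which is the effect the Core Duo and Core Solo example describes.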
