The most complicated and time-consuming parts of building a new database system are usually the edge cases and low-level details: concurrency control, consistency, fault handling, load balancing, and so on. Almost every mature storage system has to grapple with all of these problems at one point or another. For example, at a high level, load balancing hot partitions across brokers in Kafka is not that different from load balancing hot shards in MongoDB, yet each system ends up implementing its own custom load-balancing solution instead of focusing on what differentiates it for end developers.
This is one of the most confusing aspects of the modern data infrastructure industry: why does every new system have to completely rebuild (not even reinvent!) the wheel? Most of them reimplement common processes and components without delivering meaningfully more value than the versions that already exist. For instance, many database builders write their own storage and query systems from scratch, but often end up merely emulating existing solutions. Each of these components is a massive undertaking just to get the basic features working, let alone working correctly.
Take the OLTP database industry as an example. Ensuring that transactions always execute with serializable isolation (or that operations are linearizable) is just one of the dozens of incredibly challenging problems that must be solved before even the most basic correctness guarantees can be provided. Others include fault handling, load shedding, QoS, durability, and load balancing. The list goes on and on, and every new system has to make at least a reasonable attempt at all of them! Some vendors rode the NoSQL hype wave to avoid providing meaningful guarantees, in many cases not even linearizability, but we think those days are long gone.