You don’t need a crystal ball to see that the data lakehouse is the future. Before long, it will be the default way organizations store and analyze data, combining the scale of a data lake with the cost-effectiveness of a single, consolidated platform.
Companies operating with data silos will have the most difficulty moving to a lakehouse architecture. Transitioning while keeping data partitioned into isolated silos produces more of a swamp than a lakehouse, with no easy way to extract insights. The alternative is to invest early in rearchitecting so that all of the lakehouse data is easily accessible for whatever purpose the company needs.
I believe the best approach to a data lakehouse architecture, now and in the future and at any scale, is an open source one. Let me explain why.
The transition to data lakehouses is driven by several factors, chief among them their ability to handle massive volumes of data, both structured and, more importantly, unstructured.