Data at Monzo has grown a lot in the last couple of years, not only in the number of people, but also in the number of data assets that we maintain. At the time of writing, we have over 4700 data models in our production dbt project, and over 800 views defined in Looker 🤯.
This acceleration has become challenging for us, creating some growing pains that we’ve previously discussed. In this post, I’d like to give you a flavour of the things we’re working on within the Analytics Engineering team to keep our warehouse healthy and up to date. Specifically, we’ll talk about how we’re exploring ways to answer questions like:
Tools such as dbt lineage, spectacles.dev, Datahub and Google Data Catalog can help us answer these questions from different perspectives, but we want more than lineage at model level. We want to go one step further and understand how our columns evolve through our models and which data ends up in the hands of end-users.
In this post we’ll cover how we’re testing a data lineage solution at column level using audit logs and ZetaSQL to help us become Data Cartographers, mapping out the different roads data can take within our warehouse.