In early 2021, our team of long-time data geeks started on the development of an open analytics taxonomy. The goal: to come up with a generic way to structure analytics data, so models built on one data set can be deployed and run on another.
We used to spend our days building models and running in-depth user behavior analyses on raw datasets for enterprise clients. Those datasets were mostly collected with popular analytics tools like Google Analytics, Mixpanel and Adobe Analytics.
Having done this for 50+ clients, one thing in particular started to stand out: most clients had very similar analytics goals, but their data sets all looked different.
They all wanted to prevent churn, increase engagement or conversion, personalize user experiences, and predict behavior, but every in-house team had made up their own event types, naming conventions and ways to structure data. As a result, nothing could be reused. Pipelines and models all needed to be built from scratch, and significant time was spent to get data into a clean, model-ready state.
At this point in time, we became intrigued with the idea of developing an open analytics taxonomy. If goals are so similar, isn’t there a generic way we can structure analytics data so that all these common analytics use cases are covered? And what would that mean for more specific needs? And, if possible at all, what would that look like?