How can AI help us understand our products and our users better? Code generation and automatic data analysis tools are good starts, but we believe this indirect approach still leaves a lot on the table. The LLMs powering code generation are trained on billions or even trillions of tokens to develop a deep understanding of the semantics of token sequences. Product usage data likewise consists of large collections of user event sequences, making it a good fit for deep learning architectures similar to those used for natural language and code.
At Motif, we are applying breakthroughs in language modeling to train foundation models of event sequences in order to surface more useful and important insights for decision makers. In this post we’ll explain how we do it, why we are doing it, and how we are applying these foundation models in practice.
Product event sequences are rich datasets generated by websites, apps, and backend infrastructure as they serve user-facing requests. Product instrumentation is organized around capturing a log of immutable events, each typically carrying a user identifier, a timestamp, and a variety of additional properties describing what happened.
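As a minimal sketch of what this looks like in practice, here is a hypothetical event log and the per-user sequences derived from it. The field names (`user_id`, `ts`, `event`, `props`) and event names are illustrative assumptions, not an actual production schema:

```python
from collections import defaultdict

# Hypothetical events as product instrumentation might emit them:
# each is an immutable record with a user identifier, a timestamp,
# an event name, and additional properties about the event.
events = [
    {"user_id": "u1", "ts": 1700000000, "event": "page_view", "props": {"path": "/pricing"}},
    {"user_id": "u2", "ts": 1700000005, "event": "signup", "props": {"plan": "free"}},
    {"user_id": "u1", "ts": 1700000010, "event": "add_to_cart", "props": {"sku": "A-42"}},
    {"user_id": "u1", "ts": 1700000020, "event": "checkout", "props": {"total": 49.0}},
]

def to_sequences(events):
    """Group events by user and sort by timestamp, yielding the
    per-user event sequences that a sequence model would consume."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user_id"]].append(e)
    return {
        uid: [e["event"] for e in sorted(user_events, key=lambda e: e["ts"])]
        for uid, user_events in by_user.items()
    }

sequences = to_sequences(events)
print(sequences["u1"])  # ['page_view', 'add_to_cart', 'checkout']
```

Viewed this way, each user's history is an ordered sequence of event tokens, directly analogous to the token sequences language models are trained on.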