How to convert event logs to duration tables for survival analysis

This article was originally published on June 4, 2021. For clarity and generality, it has been updated to change the terms ‘conversion analysis’, ‘conversion table’, and ‘conversion events’ to ‘survival analysis’, ‘duration table’, and ‘endpoints’, respectively.

In many applications, it is most natural to think about how much time it takes for some outcome to occur. In medicine, for example, we might ask whether a given treatment increases patients' life span. For predictive maintenance, we want to know which pieces of hardware have the shortest time until failure. For customer care, how long until support tickets are resolved?

Survival analysis techniques directly address these kinds of questions, but my sense is that survival modeling is relatively rare in industry data science. One reason might be that setting up the data for survival modeling can be tricky, yet most survival analysis literature skips the data preparation step. I'll try to fill that gap by showing how to create a duration table from an event log, using web browser events as the example.

Event logs are a type of transactional fact table. They have a long-form schema where each entry represents an event, defined by the unit (e.g. user), timestamp, and type of event.
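
To make the schema concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not necessarily the approach a production pipeline would take. The `Event` fields, the sample users and timestamps, the event names, the choice of `purchase` as the endpoint, and the `to_duration_table` helper are all hypothetical. The sketch also assumes each unit enters the study at its first logged event, and that units with no endpoint are censored at a fixed cutoff time later than every logged event.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    """One row of a long-form event log: unit, timestamp, and event type."""
    unit: str             # hypothetical user ID
    timestamp: datetime
    event_type: str


# Hypothetical browser events; every value here is invented for illustration.
event_log = [
    Event("user_a", datetime(2021, 6, 1, 9, 0), "visit"),
    Event("user_a", datetime(2021, 6, 1, 9, 5), "click"),
    Event("user_a", datetime(2021, 6, 3, 9, 0), "purchase"),
    Event("user_b", datetime(2021, 6, 2, 12, 0), "visit"),
    Event("user_b", datetime(2021, 6, 2, 12, 30), "click"),
]


def to_duration_table(events, endpoint_type, censor_time):
    """Collapse an event log into one row per unit.

    Assumes entry is the unit's first event and that censor_time is
    later than every event in the log.
    """
    first_seen = {}   # unit -> timestamp of its first event
    endpoint_at = {}  # unit -> timestamp of its first endpoint event
    for ev in sorted(events, key=lambda e: e.timestamp):
        first_seen.setdefault(ev.unit, ev.timestamp)
        if ev.event_type == endpoint_type:
            endpoint_at.setdefault(ev.unit, ev.timestamp)

    rows = []
    for unit, entry in first_seen.items():
        end = endpoint_at.get(unit)
        observed = end is not None
        # Units without an endpoint are measured up to the censoring time.
        stop = end if observed else censor_time
        rows.append({
            "unit": unit,
            "entry_time": entry,
            "duration_days": (stop - entry).total_seconds() / 86400,
            "endpoint_observed": observed,
        })
    return rows


duration_table = to_duration_table(
    event_log, endpoint_type="purchase", censor_time=datetime(2021, 6, 7)
)
for row in duration_table:
    print(row)
```

The result has one row per unit rather than one row per event: `user_a` reaches the endpoint after 2.0 days, while `user_b` never does and is recorded as censored after 4.5 days. Keeping the `endpoint_observed` flag alongside the duration is what lets a survival model treat the two cases differently.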
