Survival modeling tools often require target variables to be specified in duration table format. A duration table lists each unit of a study (e.g. a w

How to build duration tables from event logs with SQL

submited by
Style Pass
2021-06-09 18:30:10

Survival modeling tools often require target variables to be specified in duration table format. A duration table lists each unit of a study (e.g. a website user), whether or not they experienced some outcome of interest (e.g. a purchase), and how long it took for that outcome to happen, if known.

Although duration tables are relatively straightforward conceptually, they are a bit tricky to construct from raw data. In my last article, I showed how to build a duration table in Python from an event log, using the example of web browser activity. Check out the first half of that article for a more detailed explanation of duration tables and event logs.

The thing is, event logs are usually stored in a database or data warehouse that we query with SQL, not Python. So in this article, I show how to build a duration table from an event log in SQL.1

The following code—including commands to set up the event log—can be found in The Crosstab Kite's gists repo, in the file events_to_durations.sql. In many of the following snippets, I implicitly include a LIMIT clause to display the output.

Leave a Comment