Pandas Functions #1: Advanced Groupbys with Grouper, Assign, and Query 

Submitted by Style Pass, 2021-07-26 19:30:07

Pandas groupbys are some of the most useful functions in a data scientist's toolkit. And yet, time and again I have found that colleagues do not realize how much flexibility these ubiquitous functions offer. In this post, I will demonstrate some of my favorite uses of groupbys in the hope that they will help others in the future.

Let's start with a simple example and work our way up in difficulty. We'll begin by reading in COVID-19 data that Johns Hopkins University aggregated from the World Health Organization. The dataset contains approximately 300,000 observations from different countries and regions of the world on COVID-19 cases, recoveries, and deaths.
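As a minimal sketch of the loading step, the snippet below reads a tiny stand-in CSV from an in-memory string so that it runs without the real file. The rows and numbers are invented for illustration, and apart from ObservationDate and Last Update, the column names are assumptions about the dataset's layout.

```python
import io

import pandas as pd

# Toy stand-in for the real CSV: every row and number here is made up.
# Only "ObservationDate" and "Last Update" are column names taken from the
# post; the other five are assumed for illustration.
SAMPLE_CSV = """ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
01/22/2020,Region A,Country X,1/22/2020 17:00,1,0,0
01/22/2020,Region B,Country X,1/22/2020 17:00,14,0,0
01/23/2020,Region A,Country X,1/23/2020 17:00,9,0,0
01/23/2020,,Country Y,1/23/2020 17:00,1,0,0
"""

# With the real file, pass its path to read_csv instead of the StringIO buffer.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Quick structural checks: number of rows/columns and the inferred dtypes.
print(df.shape)
print(df.dtypes)
```

Note that both date columns come in as plain strings at this point, so they need converting before any time-based grouping.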

The dataset has seven columns of regional and national-level disease data. ObservationDate and Last Update don't differ by much, so for our purposes we'll keep ObservationDate and drop Last Update.
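One way to do this cleanup is sketched below on a small invented frame. The date formats passed to `pd.to_datetime` are assumptions about how the raw strings look; adjust them to match the real file.

```python
import pandas as pd

# Invented rows; the string formats below are assumed, not taken from the post.
df = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/23/2020", "01/24/2020"],
    "Last Update": ["1/22/2020 17:00", "1/23/2020 17:00", "1/24/2020 17:00"],
    "Confirmed": [1, 3, 6],
})

# Check how far apart the two timestamps are before discarding one of them.
observed = pd.to_datetime(df["ObservationDate"], format="%m/%d/%Y")
updated = pd.to_datetime(df["Last Update"], format="%m/%d/%Y %H:%M")
gap = updated - observed
print(gap.max())

# Keep ObservationDate as a real datetime column and drop Last Update.
df["ObservationDate"] = observed
df = df.drop(columns="Last Update")
print(df.dtypes)
```

Storing ObservationDate as a datetime rather than a string is what makes time-based grouping possible later on.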

Because this is COVID-19 time-series data, the first thing we can do is check how the disease has progressed over the weeks. Enter pandas.Grouper, a groupby helper that lets users control how their data is grouped at a time-based level. All we have to do is pass a Grouper to a typical groupby call, give it an offset alias (e.g. D for daily, W for weekly, Y for yearly), and apply an aggregation function (e.g. sum, mean, count).
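A weekly grouping along these lines might look like the following sketch. The data is a toy frame with invented numbers, and the Confirmed column name is an assumption; the `key` and `freq` arguments are the parts of `pd.Grouper` being illustrated.

```python
import pandas as pd

# Toy data: five invented observations spread over two calendar weeks.
df = pd.DataFrame({
    "ObservationDate": pd.to_datetime(
        ["2020-01-20", "2020-01-22", "2020-01-26", "2020-01-27", "2020-01-30"]
    ),
    "Confirmed": [1, 2, 3, 4, 5],
})

# key names the datetime column to bucket on; freq="W" creates weekly bins
# that end on (and are labelled by) Sunday.
weekly = df.groupby(pd.Grouper(key="ObservationDate", freq="W"))["Confirmed"].sum()
print(weekly)
```

Swapping `freq="W"` for `"D"` gives daily buckets, and replacing `.sum()` with `.mean()` or `.count()` changes the aggregation. Recent pandas releases (2.2 and later) warn on the yearly alias `"Y"` and prefer `"YE"`, so the alias may need adjusting depending on the installed version.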
