Analyzing labor markets in Python with LODES data

submitted by
Style Pass
2023-01-26 00:30:33

In Chapter 11 of my book Analyzing US Census Data, I explore a sample of the many government datasets available for the United States. One of the most useful of these is LODES (LEHD Origin-Destination Employment Statistics). LODES is a synthetic dataset that represents, down to the Census block level, job counts by workplace and by residence, as well as the flows between them.

Because LODES data are tabulated at the Census block level, analysts will often want to merge them with Census geographic data, such as the shapes available through the pygris package. pygris also includes a function, get_lodes(), that is modeled after the excellent lehdr R package by Jamaal Green, Dillon Mahmoudi, and Liming Wang.

This post illustrates how to analyze the origins of commuters to the Census tract containing Apple’s headquarters in Cupertino, CA. Along the way, I’ll highlight some of the data wrangling utilities in pandas that support method chaining, and show how to merge data to pygris shapes for mapping. The corresponding section of Analyzing US Census Data is “Analyzing labor markets with lehdr.”
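As a rough preview of the method-chaining style, here is a minimal sketch on a tiny hand-made table. The column names w_geocode, h_geocode, and S000 follow the LODES origin-destination layout (workplace block, home block, total jobs), but every block ID and job count below is invented for illustration, and target_tract is a placeholder rather than the real tract. The one fact the sketch relies on is that the first 11 characters of a 15-character block GEOID identify its Census tract.

```python
import pandas as pd

# Toy origin-destination rows; all IDs and counts are made up for illustration.
od = pd.DataFrame({
    "w_geocode": ["060850001001000", "060850001001001",
                  "060850001001000", "060010002001000"],
    "h_geocode": ["060850003001000", "060850003001002",
                  "060750004001000", "060850003001000"],
    "S000": [5, 3, 2, 7],
})

# Placeholder workplace tract ID (not the actual tract of interest).
target_tract = "06085000100"

commuters = (
    od
    # Derive 11-character tract IDs from the 15-character block IDs.
    .assign(
        w_tract=lambda d: d["w_geocode"].str.slice(stop=11),
        h_tract=lambda d: d["h_geocode"].str.slice(stop=11),
    )
    # Keep only flows whose workplace is in the target tract.
    .loc[lambda d: d["w_tract"] == target_tract]
    # Sum jobs by home tract.
    .groupby("h_tract", as_index=False)
    .agg(workers=("S000", "sum"))
    .sort_values("workers", ascending=False)
)

print(commuters)
```

Each step returns a new DataFrame, so the whole pipeline reads top to bottom without intermediate variables.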

To get started, let’s import the functions and modules we need and give get_lodes() a try. get_lodes() requires a state (as a state abbreviation) and a year; we are getting data for California in 2019, the most recent year currently available. The argument lodes_type = "od" tells pygris to get origin-destination flows data, and cache = True downloads the dataset (which is nearly 100 MB) to a local cache directory for faster use in the future.
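The call described above might look like the following sketch. The helper load_california_od() and the LODES_ARGS dictionary are hypothetical wrappers added here so that nothing is downloaded until you call the function; the arguments themselves are the ones named in the text, and get_lodes() is imported from pygris.data.

```python
# Arguments taken from the description above.
LODES_ARGS = {
    "state": "CA",        # state abbreviation
    "year": 2019,         # year of the LODES data
    "lodes_type": "od",   # origin-destination flows
    "cache": True,        # keep a local copy for faster reuse
}


def load_california_od(**overrides):
    """Hypothetical helper: fetch the California OD file with pygris."""
    # Imported here so defining the helper does not itself require pygris.
    from pygris.data import get_lodes

    # Later keys win, so callers can override any default argument.
    return get_lodes(**{**LODES_ARGS, **overrides})
```

Calling ca_lodes_od = load_california_od() triggers the download the first time and should read from the local cache on later runs.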
