Parameterizing and automating Jupyter notebooks with papermill

submitted by
Style Pass
2021-07-08 16:00:10

Have you ever created a Jupyter notebook and wished you could generate the notebook with a different set of parameters? If so, you’ve probably done at least one of the following:

It turns out that there is a good solution to this problem, one that parameterizes interactive notebooks and coexists well with automated jobs. It's called papermill.

Many notebook authors follow the standard practice of designating a cell near the top of their notebooks for global variables. The author or other users of the notebook then modify the values in that cell and run the entire notebook to obtain different results. To persist the output, the author manually downloads the notebook in another format or saves it as a different notebook file. But relying only on a notebook server and these manual methods quickly becomes messy, difficult to track, and error-prone. Which notebook is the one you edit? Papermill helps solve this problem. In this article, I’ll introduce papermill and its basic usage, walk through an example of parameterization, and finally talk about ways to fully schedule and automate notebook execution using cron.

With papermill, a special cell in the notebook is designated for parameters by giving it the tag parameters; the values in that cell act as defaults. When papermill executes a parameterized notebook, either via the command line interface (CLI) or using the Python API, the values passed in are written to a new cell that papermill injects directly after the parameters cell, so they override the defaults when the notebook runs. This allows the notebook to be run multiple times with different parameters quickly. The resulting executed notebook can then be saved in a variety of places, including local or cloud storage.
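As a rough illustration of the Python API route, the sketch below runs one notebook with a given set of parameters and writes each executed copy to its own file. It assumes papermill is installed and that the input notebook has a cell tagged parameters. The notebook name report.ipynb, the parameter names region and year, the runs directory, and the two helper functions are hypothetical and chosen only for this example; pm.execute_notebook is the papermill call doing the actual work.

```python
from pathlib import Path


def output_path_for(input_path, parameters, out_dir="runs"):
    """Build a distinct output filename from the parameter values.

    Hypothetical naming scheme: <notebook stem>_<key>-<value>_... .ipynb
    """
    stem = Path(input_path).stem
    suffix = "_".join(f"{key}-{value}" for key, value in sorted(parameters.items()))
    return str(Path(out_dir) / f"{stem}_{suffix}.ipynb")


def run_notebook(input_path, parameters, out_dir="runs"):
    """Execute one parameterized run and return the path of the executed copy."""
    # Imported here so the path helper above can be used without papermill.
    import papermill as pm

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    output_path = output_path_for(input_path, parameters, out_dir)
    # The input notebook is left unchanged; the executed copy, including the
    # injected parameter values, is written to output_path.
    pm.execute_notebook(input_path, output_path, parameters=parameters)
    return output_path


# Example usage (assumes report.ipynb exists and has a "parameters" cell):
# for region in ["east", "west"]:
#     run_notebook("report.ipynb", {"region": region, "year": 2021})
```

Writing each run to a separate output file keeps the source notebook as the single copy that gets edited, which addresses the "which notebook is the one you edit?" question raised earlier.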
