What packages belong together? Learning from R code samples


The codesamples package on GitHub contains code snippets from three sources: R package examples, GitHub projects, and Stack Overflow questions. It tries to be a reasonably representative source of real R code “in the wild”.

Here, I’m going to look at R packages. Which packages often get used together? This is interesting because it could help people find packages relevant to their use case. I’ll use the Stack Overflow data.

As a first step, I’m going to count each library() call in the code snippets. I could also look for loadNamespace() calls, or fully-qualified function names like dplyr::filter(), but library() is common enough that it will probably do.
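The counting step can be sketched in base R as follows. This is only an illustration under stated assumptions: the `snippets` vector is a made-up stand-in for the Stack Overflow data (the actual layout of the codesamples data is not shown here), and `extract_packages()` is a hypothetical helper, not a function from any package.

```r
# Hypothetical stand-in for the Stack Overflow snippets; the real data
# comes from the codesamples package, whose layout is not shown here.
snippets <- c(
  "library(dplyr)\nlibrary(ggplot2)\nmtcars %>% filter(cyl == 4)",
  "library(ggplot2)\nlibrary('plotly')\nggplot(mtcars, aes(mpg, wt))",
  "library(dplyr)\nx <- 1"
)

# Match library(pkg), library("pkg") or library('pkg') and capture pkg.
pattern <- "\\blibrary\\(\\s*[\"']?([A-Za-z][A-Za-z0-9.]*)[\"']?"

# Hypothetical helper: the distinct packages attached in one snippet.
extract_packages <- function(code) {
  hits <- regmatches(code, gregexpr(pattern, code, perl = TRUE))[[1]]
  unique(sub(pattern, "\\1", hits, perl = TRUE))
}

packages_per_snippet <- lapply(snippets, extract_packages)

# Count how many snippets mention each package, most common first.
package_counts <- sort(table(unlist(packages_per_snippet)), decreasing = TRUE)
head(package_counts, 5)
```

A regular expression is a shortcut here: it will also pick up library() calls inside comments or strings. Parsing the code would be stricter, but for a rough popularity count the shortcut is probably good enough.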

The top 5 are not surprising. But {plotly}, {raster} and {plyr} make it into the top 10, which I did not necessarily expect. One possibility is that some packages are harder to use and so generate a lot of Stack Overflow questions. Or, since the data stretches back to 2013, some older packages get mentioned more than you might expect nowadays. Remember {reshape2}?

Now, we do some data cleaning to convert our data into a co-occurrence matrix, where each row and column represents a package, and each cell counts the number of snippets in which the two packages appear together. First, we create a column of dummy variables for each package, with one row per snippet.
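One way to go from dummy variables to a co-occurrence matrix is a matrix cross-product. The sketch below uses base R and a small made-up list of packages per snippet. It is an assumption about how this could be done, not necessarily the approach used on the real data.

```r
# Hypothetical input: one character vector of packages per snippet.
packages_per_snippet <- list(
  c("dplyr", "ggplot2"),
  c("ggplot2", "plotly"),
  c("dplyr", "ggplot2", "tidyr")
)

all_packages <- sort(unique(unlist(packages_per_snippet)))

# One row per snippet, one 0/1 dummy column per package.
dummies <- t(vapply(
  packages_per_snippet,
  function(pkgs) as.integer(all_packages %in% pkgs),
  integer(length(all_packages))
))
colnames(dummies) <- all_packages

# t(dummies) %*% dummies: cell [i, j] counts the snippets that load both i and j.
cooccurrence <- crossprod(dummies)

# The diagonal only repeats each package's own count, so zero it out.
diag(cooccurrence) <- 0

cooccurrence
```

Because each dummy is 0 or 1, the product of two columns is 1 only for snippets that load both packages, so summing over snippets gives the co-occurrence count directly.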
