NamesDataset is a Ruby library (ported from the python philipperemy/name-dataset library) that provides fast lookups and metadata for first and last names. Ever wondered if “Zoe” is more likely a name from the United Kingdom or how popular “White” is as a last name in the United States? This library helps you answer those questions.
Under the hood, NamesDataset loads an in-memory dataset (derived from a Facebook leak of 533M users) that’s roughly 3.2GB once loaded into memory. Once loaded, it’s quick to search but definitely requires some hardware overhead, so keep that in mind if you’re planning on deploying this to production.
Because the library pre-loads the entire 3.2GB dataset into memory, you’ll need sufficient RAM to avoid NoMemoryError. If you only need a subset of the data or if memory is a major concern, consider alternative approaches (e.g., a streaming or database-based solution). But if you can spare the memory, NamesDataset is fast for repeated lookups once it’s loaded.
Thanks for checking out names_dataset! If this library helps you ship something neat, I’d love to know about it, feel free to open a Pull Request or Issue ❤️