Street addresses are among the quirkier artifacts of human language, yet they are crucial to a growing number of applications involving maps and location.
Last year I collaborated with Mapzen to build smarter, more international geocoders using the vast amounts of local knowledge in open geographic data sets.
The result is libpostal: a multilingual street address parsing/normalization library, written in C, that can handle addresses all over the world.
Libpostal uses machine learning and is informed by tens of millions of real-world addresses from OpenStreetMap (OSM). The entire pipeline for training the models is open source.
Since OSM is a dynamic data set with thousands of contributors and the models are retrained periodically, improving them can be as easy as contributing addresses to OSM.
Each country’s addressing system has its own set of conventions and peculiarities, and libpostal is designed to deal with practically all of them.