Flattening nested JSON or MongoDB’s BSON or normalizing semi-structured data and writing queries on it for analytics or regular queries, is a common challenge in data processing.
In this article, we’ll explore multiple methods for extracting data from semi-structured JSON into a SQL readable table format, including techniques that use Spark to infer the schema automatically, manually and more.
We’ll also cover various other ways to flatten a semi-structured JSON data including recursive flattening, full flattening into the main table, creating separate tables for array keys.
For illustration purposes we have taken an example JSON that has 2 levels of nesting. The JSON contains a field that can either be an integer, string or an array of integers or string or objects. We want to "explode" the nested data into a tables format.
contact: { email: '[email protected]', phone: '123-456-7890' },