In the last year, the role of data scientist has been called “the sexiest job of the 21st century.” But if data is the new oil, and data scientists are its petrochemical high priests, who are the oil riggers? Who are the roughnecks doing the dirty work to get data pipelines flowing: unpacking bytes, transforming formats, loading databases?
They are the data engineers, and their brawny skills are more critical than ever. As the era of Big Data pivots from research to development, from theoretical blueprints to concrete infrastructure, the notional demand for data science is being dwarfed by the true need for data engineering.
A stark but recurring reality in the business world is this: when it comes to working with data, statistics and mathematics are rarely the rate-limiting elements in moving the needle of value. Most firms’ unwashed masses of data sit far lower on Maslow’s hierarchy, at the level of basic nurture and shelter. What this data needs isn’t philosophy, religion, or science – it needs basic, scalable infrastructure.
It’s the data engineers who can build this infrastructure, and they represent the true talent shortage of Silicon Valley and beyond. Their unsexy but critical skills include crafting Hadoop pipelines, programming job schedulers, and parsing broad classes of data – timestamps, currencies, latitude and longitude coordinates. These skills are the screws, bolts, and ball bearings of the industrial age of data.