A Lifetime Under the Hood - From 6502 Machine Language to Fabric Parquet Files

Submitted by
Style Pass
2024-10-09 05:00:02

One of the most pivotal events in my life was receiving an Apple II+ as a gift when I was a teenager. That gift very likely set me on my current career path as a data engineer.

The gift arrived in 1981, before the web, so there were very few ways to learn more about computing: books, magazines, and word of mouth. For this reason, I often hung out at the local computer store looking for information. One of the first books I purchased focused on 6502 machine language. My interest was partly curiosity but, more likely, a desire to speed up games that I had written in BASIC. Being so new to all this, I didn’t realize that a higher-level “assembly” language existed. My punishment for that gap was having to write out on paper the individual hex codes I needed and then type them into the computer.

So what does this story have to do with Apache Parquet, the open-standard file format that underlies many analytics platforms, including Microsoft Fabric? Well, I think it’s important for software and data engineers to balance the level of abstraction at which they work with some intuition for how things function underneath. In other words, you absolutely want to work at higher levels of abstraction and lean on lower-level libraries to boost your productivity. However, sometimes looking “under the hood” provides invaluable intuition. I was recently struggling to understand how Parquet files work just by reading the specification, so I decided to take a deeper dive.
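As a small taste of what looking under the hood can mean for Parquet, here is a minimal sketch that uses only the Python standard library. It relies on one fact from the format specification: an unencrypted Parquet file begins and ends with the 4-byte marker `PAR1`, and the 4 bytes just before the trailing marker hold the length of the footer metadata as a little-endian 32-bit integer. The function name `describe_parquet_layout` and the returned dictionary keys are my own illustrative choices, not part of any library.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at both ends of an unencrypted Parquet file


def describe_parquet_layout(data: bytes) -> dict:
    """Locate the footer metadata inside the raw bytes of a Parquet file.

    Hypothetical helper for illustration; it only inspects the byte layout
    and does not decode the footer itself.
    """
    # Smallest possible layout: leading marker + footer length + trailing marker.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a plain Parquet file (missing PAR1 markers)")

    # The 4 bytes before the trailing marker store the footer length
    # as a little-endian unsigned 32-bit integer.
    (footer_len,) = struct.unpack("<I", data[-8:-4])

    # The footer sits immediately before those 8 trailing bytes.
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length exceeds file size")

    return {"footer_start": footer_start, "footer_len": footer_len}
```

Calling it on a real file would look something like `describe_parquet_layout(open("example.parquet", "rb").read())`, where `example.parquet` is a placeholder path. The bytes it points to are the Thrift-serialized file metadata (schema, row groups, column chunk offsets), which is where a deeper dive would continue.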
