To be proficient as a data engineer, you need to know a range of tools, from fundamental Linux commands to virtual environments and techniques for working more efficiently.
This article focuses on the building blocks of data engineering work: operating systems, development environments, and essential tools. We'll start from the ground up, exploring crucial Linux commands, containerization with Docker, and the development environments that make modern data engineering possible. We'll also look at current programming languages and how they influence our work, giving you a broad overview of the modern data engineer's toolkit.
Before we start, a note: you don't need to know everything discussed here, but over time you may use all of these tools in various data engineering roles at different companies. I hope this article gives you a good overview and some guidance on what is essential and what is not.
Each choice might differ slightly depending on the company's setup, its preferred vendors, and whether it favors a low-code approach or building things itself. Let's start with the first choice you must make at any company: the operating system to work on.