Data Engineering #
On this page you will find my often opinionated notes on various Data Engineering topics. Topics that overlap with Software Engineering are found in the Software Engineering pages.
I like to think about Data Engineering as a discipline that spans four pillars of activity. Personally, I allocate time to each of these pillars and try to balance my activities between them.
Data Operations #
This pillar covers the ongoing, day-to-day activities that keep data pipelines running smoothly: monitoring, incident management, troubleshooting, standard infrastructure changes, and 2nd/3rd-level support. While doing this work we notice recurring patterns and can use those findings to avoid future problems.
Strategic Foundations #
Activities in this pillar entail designing the strategic frameworks and guidelines that shape how we develop and how data is modeled, stored, replicated, and protected, so that the setup stays sustainable in the long term. Defining the strategy for documentation and observability practices also falls into this pillar.
ETL & ELT #
This pillar covers the core Data Engineering activities: building and managing data pipelines, including extraction, ingestion/load, transformation, and reverse ETL, so that data is accurate, accessible, and timely for downstream use. The focus is on the efficient development of fast, reliable, and cost-effective ETL/ELT pipelines, including data quality checks and validation.
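To make the extract, transform, load, and validation steps a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Order` record, the hard-coded source rows, the "no negative amounts" rule, and the in-memory `target` dictionary standing in for a warehouse table. A real pipeline would read from and write to actual systems.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Order:
    # Hypothetical target record, used only for this illustration.
    order_id: int
    amount_eur: float


def extract() -> list[dict]:
    # Stand-in for reading from a source system (API, database, file).
    return [
        {"order_id": "1", "amount": "19.99"},
        {"order_id": "2", "amount": "-5.00"},  # will be rejected: negative amount
        {"order_id": "3", "amount": "42.50"},
    ]


def transform(rows: Iterable[dict]) -> tuple[list[Order], list[dict]]:
    # Cast raw strings to typed records and split valid rows from rejected ones.
    valid: list[Order] = []
    rejected: list[dict] = []
    for row in rows:
        try:
            order = Order(int(row["order_id"]), float(row["amount"]))
        except (KeyError, ValueError):
            rejected.append(row)  # missing field or unparseable value
            continue
        if order.amount_eur < 0:
            rejected.append(row)  # assumed business rule: no negative amounts
        else:
            valid.append(order)
    return valid, rejected


def load(orders: list[Order], target: dict[int, Order]) -> None:
    # Upsert keyed on order_id, so re-running the pipeline creates no duplicates.
    for order in orders:
        target[order.order_id] = order


def run_pipeline(target: dict[int, Order]) -> tuple[int, int]:
    valid, rejected = transform(extract())
    load(valid, target)
    return len(valid), len(rejected)
```

Keeping rejected rows instead of silently dropping them, and loading via an upsert so that reruns are safe, are two design choices I find useful in practice. They are shown here only as one possible approach.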
Data Platform #
Data Platform activities are all those related to the enabling technology stack that underpins data engineering and analytics work: cloud infrastructure, orchestration, and the tools that power them. This pillar is largely about configuring cloud services and infrastructure, cost management/FinOps, and monitoring the technology stack itself.
(Modern) Data Stack #
The no-longer-so-modern Data Stack offers another useful perspective on the data engineering landscape, with a focus on tooling. I have created my own version of this well-known diagram.
The sub pages are structured first by the four pillars; within each pillar, the substructure is inspired by the Modern Data Stack.
For a good overview of Data Engineering, I recommend the book Fundamentals of Data Engineering: Plan and Build Robust Data Systems (1st edition) by Joe Reis and Matt Housley.