The Complete Data Engineering Study Roadmap
Everything you need to know to start your career in Data Engineering.
The Complete Data Science Study Roadmap seemed to be popular, so I thought it would be a good idea to do a Data Engineering edition. In this article, I will go through everything you need to learn to become a Data Engineer.
1. Build your Foundation
Becoming a Data Engineer involves many intricacies, and it can feel a bit overwhelming at times. Building a solid foundation is what will keep you grounded as you work through the roadmap.
Your foundation should include proficiency in one or two programming languages and SQL, plus a working knowledge of servers.
Python
If you choose Python as your programming language, here are some recommended courses:
- 100 Days of Code: The Complete Python Pro Bootcamp for 2022 - Udemy
- Programming for Everybody (Getting Started with Python) - Coursera (University of Michigan)
SQL
- The Ultimate MySQL Bootcamp: Go from SQL Beginner to Expert - Udemy
- Complete SQL Mastery - CodeWithMosh
The Essentials
- Data Engineering Essentials using SQL, Python, and PySpark - Udemy
2. Mathematics and Statistics
As in any career that involves analyzing and engineering data, maths is always needed. It will help you understand your day-to-day tasks much better and apply your skills more effectively.
Here are some resources to help you:
- Statistics Fundamentals by Josh Starmer - YouTube
- Mathematical Foundations of Machine Learning - Udemy
- Statistics for Data Science and Business Analysis - Udemy
3. Database Management Systems
As a Data Engineer, you will be working with Database Management Systems a lot, as they assist in handling large datasets. There are a lot of Database Management Systems out there, so don't feel pressured to know all of them. Which ones you learn will depend on the company you work for or what you prefer working with.
- Principles of Database Management - YouTube
- Database Management System (DBMS) & SQL : Complete Pack 2022 - Udemy
- Introduction of DBMS - Article/Documentation
- Top 30 Most Popular Database Management Software - Blog
If you would also like a FREE course on SQL and databases, have a read of this: Free SQL and Database Course
4. Data Warehousing and Data Pipelines
This area of focus is what differentiates Data Engineers from Data Scientists. Both learn the same fundamentals and use the same programming languages, SQL, etc. But data warehousing and data pipelines are what set Data Engineers apart, and mastering them is what makes a good Data Engineer.
The resources I would recommend for Data Warehousing are:
- The Data Warehouse Toolkit - PDF Book. This book was written by Ralph Kimball, one of the people who helped lay the foundations of data warehousing.
- Data Warehousing Tutorial - Articles
- Database vs Data Warehouse vs Data Lake - YouTube
Below are some resources to learn about Data Pipelines:
- Data Pipelines Explained - YouTube
- ETL vs ELT - Article
- Building Data Engineering Pipelines in Python - DataCamp
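To make the idea of a pipeline concrete, here is a minimal sketch of the ETL pattern using only the Python standard library. Everything in it is an assumption made for illustration: the sample CSV, the orders table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from a file, API, or queue.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(raw_text):
    """Extract: parse CSV text into a list of dict rows (all values are strings)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast columns to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, raw_text=RAW_CSV):
    """Chain the three stages: extract, then transform, then load."""
    load(conn, transform(extract(raw_text)))


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    run_pipeline(connection)
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    for customer, total in connection.execute(query):
        print(customer, total)
```

In an ELT variant, the raw rows would be loaded first and the cleaning step would run as SQL inside the warehouse instead of in Python.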
5. Cloud Computing
Next up is Cloud Computing. You won't need to know everything, but you should have a decent understanding of the different providers and their capabilities, limitations, etc.
You will need to know the basics of cloud computing, such as IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service), as well as the architecture of cloud computing.
Here are some resources on cloud computing:
- Cloud Computing Tutorials and Resources
- Cloud Data Engineering - Coursera
- Cloud Courses - Cloud Academy
6. Analytics Engineering
Analytics engineering is also important to learn. It consists of:
- ETL (Extract, Transform, and Load)
- Creating data models (dbt models)
- Testing and documenting
- Deployment to the cloud and locally
- Visualizing the data with analytical applications (Google Data Studio and Metabase)
You can learn all of these concepts through the DataTalksClub YouTube playlist.
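As a rough illustration of the modeling and testing steps listed above, the sketch below builds a small "model" as a SQL view and checks it with queries that return failing rows. It does not use dbt itself; it only borrows the convention that a test passes when its query returns nothing. The table, view, and test names are hypothetical, and SQLite stands in for a warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw layer: a hypothetical source table, as it might look right after loading.
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "alice", 10.5), (2, "bob", 7.0), (3, "alice", 4.25)],
)

# Model layer: a view that reshapes raw data into an analysis-friendly table.
conn.execute(
    """
    CREATE VIEW customer_revenue AS
    SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total_amount
    FROM raw_orders
    GROUP BY customer
    """
)


def failing_rows(connection, query):
    """Run a test query; by convention, any returned row counts as a failure."""
    return connection.execute(query).fetchall()


# Tests: each query selects the rows that violate one expectation.
TESTS = {
    "customer_not_null": "SELECT * FROM customer_revenue WHERE customer IS NULL",
    "customer_unique": (
        "SELECT customer FROM customer_revenue GROUP BY customer HAVING COUNT(*) > 1"
    ),
    "total_amount_positive": "SELECT * FROM customer_revenue WHERE total_amount <= 0",
}

results = {name: failing_rows(conn, sql) for name, sql in TESTS.items()}

if __name__ == "__main__":
    for name, failures in results.items():
        status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
        print(name, status)
```

Keeping each expectation as a named query makes the checks easy to document and to rerun whenever the model changes.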
Here are some additional resources to help you:
- dbt Free Courses - dbt
- Analytics Engineering Bootcamp - Udemy
- Learn DBT from Scratch - Udemy
7. Projects
That seems like a lot of learning, and it is. To be a successful Data Engineer, it is imperative that you feel proficient in each of those areas. You can do the project stage during your learning or after; it is up to you. Some people prefer to apply their knowledge and skills after all the learning, while others prefer to do it along the way in order to test themselves.
So the next stage is putting your skills to the test by writing code for real projects. Your project list should aim to hit all of these areas:
- Explore different types of Data Formats
- Data Warehousing
- Data Analytics
- Data Sourcing
- Big Data Tools
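As a small starting point for the data formats item above, here is a sketch that writes the same records as CSV and as JSON Lines and reads them back, using only the Python standard library. The sample records and helper names are made up for illustration. The point it tries to show is that CSV hands every value back as a string, while JSON keeps numbers as numbers.

```python
import csv
import io
import json

# Hypothetical sample records used only for this comparison.
records = [
    {"id": 1, "city": "Paris", "temp_c": 21.5},
    {"id": 2, "city": "Oslo", "temp_c": 14.0},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into dicts; every value comes back as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def to_jsonl(rows):
    """Serialize a list of dicts to JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(row) for row in rows)


def from_jsonl(text):
    """Parse JSON Lines text back into dicts; numeric types are preserved."""
    return [json.loads(line) for line in text.splitlines() if line]


if __name__ == "__main__":
    print(from_csv(to_csv(records))[0])
    print(from_jsonl(to_jsonl(records))[0])
```

A natural next step for a project would be comparing these with a columnar format such as Parquet, which needs a third-party library.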
Ideas for Data Engineering projects
- Data Engineering Zoomcamp - real-world project
- Scrape Stock and Twitter Data Using Python, Kafka, and Spark
- Web-scraping with real-estates
- Building A Data Platform
- Snowflake Real-Time Data Warehouse
Outside of Data Engineering, you can practice your coding skills with LeetCode challenges, although this applies to the majority of tech careers.
8. Interview Preparation
The moment that all of you have been waiting for but are sweating bricks about: the interview. There is a lot of content to remember, so preparation is the best thing you can do for yourself.
Here are some resources to help you:
- If Python is your chosen programming language, it would be good to internalize the Google Python Style Guide
- Let's not forget about the soft skills: 73 Questions to Ask Employees During an Interview
Further Reading
If you would like to continue studying (which a lot of people advise), here is a list of books that are Essential for you to Become a Data Engineer.
If you are looking for the ultimate course on Data Engineering, I would recommend this: Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certificate
Your journey to becoming a Data Engineer won't be easy. You will need to put in the work, but I promise that once you do, it will pay off.
Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence benefits, or can benefit, the longevity of human life. She is a keen learner, seeking to broaden her tech knowledge and writing skills while helping guide others.