The Ultimate Guide to Data Engineer Interviews
If you are preparing for data engineering interviews, then follow these technical recommendations regarding your resume, programming skills, SQL acumen, and system design problem-solving, as well as the non-technical aspects of your upcoming interview session.
By Xinran Waibel, Data Engineer at Netflix.
Although data engineer (DE) was the fastest-growing tech job role in 2019, there aren’t many online resources on what to expect in a data engineering interview and how to prepare for it.
In the past year, I have interviewed for data engineer roles with several tech companies in the Bay Area and helped many connections succeed in their interviews. In this blog post, I will explain the most important technical topics in data engineering interviews: your resume, programming, SQL, and system design. I will also teach you how to prepare for the non-technical part of the interview, which I believe is key to a successful job interview but is often ignored by candidates.
However, I will NOT discuss specific questions asked in any company’s DE interviews, as this blog post is intended to be a general guide to help you understand the essential skills you need to be a successful data engineer.
Your resume is not only the stepping stone to get noticed by recruiters and hiring managers, but also the most important list of projects that you should be ready to discuss in-depth with the interviewers in order to demonstrate your skills, including technical competency, problem-solving, teamwork, communication, and project management.
The most common mistake I have seen in resume deep dive sessions is focusing solely on the technical implementation details without explaining or understanding the trade-offs made in system design (e.g., “I used Kafka because my manager told me so”) and the bigger picture of the project. Keep in mind that the interviewers don’t know your previous company’s business problems and data infrastructure as well as you do, so you need to provide enough context to help them understand the technical complexity and the impact of your projects. Therefore, the key to a great project deep dive is to paint a full picture of your project from end to end, like a story!
I strongly recommend practicing talking through your most significant data projects (to someone with an engineering background, if possible) and making sure to answer these questions in your story:
- What was the motivation for the project? (i.e., What data/business problem did your project try to solve?)
- What teams did you collaborate with? How did you work with them?
- If you were the project owner, how did you plan and drive it?
- What are the technical trade-offs made in the system design? (i.e., Why did you use framework X instead of other alternatives?)
- What were some of the technical statistics related to your project? (e.g., What is the throughput and latency of your data pipeline?)
- What was the impact of the project? (e.g., How much revenue did it generate? How many people used your application?)
- What were the challenges you had? How did you solve them?
Numbers are important in telling a great project story. Instead of just saying, “it processed a lot of data…”, look up some statistics of your project and include them on your resume. Numbers will showcase the scale, impact, and your deep understanding of the project. They also make your project more believable. (In fact, interviewers might find it suspicious if you can’t even tell how much data your applications can process.)
Alright. Here comes the most unpleasant part of all software engineering interviews: the coding interview, where you are asked to implement complex algorithms (that you will probably never need at work) using the most efficient data structures in the fewest lines of code possible and explain the time and space complexity of your code, all within 30 minutes.
The coding interview for data engineer roles is usually lighter on the algorithm side but heavier on the data side, and the interview questions are usually more practical. For instance, you might be asked to write a function that transforms the input data and produces the desired output data. You will still be expected to use the best-suited data structures and algorithms and to gracefully handle all the potential data issues. Since data engineers don’t just use the built-in libraries to process data in the real world, the coding interview might also require you to implement solutions using popular open-source libraries, such as Spark and pandas. You are generally allowed to look up documentation during the interview if needed. If the job requires proficiency in specific frameworks, be prepared to use those frameworks in your coding interviews.
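To make the “transform the input data” style of question concrete, here is a minimal sketch in plain Python. The function name, the field names (user_id, amount), and the rule that negative amounts are rejected are all assumptions made up for illustration; they are not taken from any real interview question.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple


def total_spend_by_user(
    events: Iterable[Dict[str, Any]],
) -> Tuple[Dict[str, float], List[Dict[str, Any]]]:
    """Sum the "amount" field per "user_id", setting malformed rows aside."""
    totals: Dict[str, float] = defaultdict(float)
    rejected: List[Dict[str, Any]] = []

    for event in events:
        user_id = event.get("user_id")
        try:
            amount = float(event.get("amount"))
        except (TypeError, ValueError):
            amount = None

        # Reject rows with a missing user, a non-numeric amount,
        # or a negative amount (a hypothetical business rule).
        if not user_id or amount is None or amount < 0:
            rejected.append(event)
            continue

        totals[user_id] += amount

    return dict(totals), rejected


events = [
    {"user_id": "a", "amount": "10.5"},
    {"user_id": "b", "amount": 3},
    {"user_id": "a", "amount": 4.5},
    {"user_id": None, "amount": 7},      # missing user
    {"user_id": "b", "amount": "oops"},  # non-numeric amount
    {"amount": 1},                       # missing key
]

totals, rejected = total_spend_by_user(events)
print(totals)         # {'a': 15.0, 'b': 3.0}
print(len(rejected))  # 3
```

Returning the rejected rows instead of silently dropping them is one possible design choice; in an interview, it is worth saying out loud how you would want bad records handled.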
Coding in an interview is much harder than coding at work because you will be under pressure to produce your best lines of code within a very short time. (I know how scary it feels when your mind just goes blank during an interview.) I highly recommend practicing some (but not too many) coding questions on programming websites like LeetCode or HackerRank and getting comfortable with writing code in CoderPad.
What programming languages and frameworks do you need to learn for data engineering roles? Check out this blog post.
SQL, SQL, SQL
SQL is such a critical skill for data engineers that I need a separate section for it (plus SQL is not really a programming language). In fact, it is very common to have a SQL interview in addition to a coding interview. As data engineers are responsible for building reliable and scalable data processing and data modeling solutions, you should be better at SQL than data analysts and data scientists (who mainly use SQL to query production-ready data). That means you need to know much more than just “SELECT…FROM…”.
“What? Isn’t SQL just a query language? What else should I know about SQL?”
First of all, SQL is more than just a query language. It is also a data processing pattern shared by many data frameworks, such as Spark SQL, pandas, and KSQL for Kafka. Therefore, proficiency in SQL also indicates that you can efficiently learn and work with these frameworks.
A good data engineer should be capable of translating complicated business questions into SQL queries and data models with good performance. In order to write efficient queries that process as little data as possible, you need to understand how the query engine and optimizer work. For example, CASE expressions combined with aggregate functions can sometimes replace JOIN and UNION and process much less data.
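Here is a small sketch of that CASE-plus-aggregation idea, using Python’s built-in sqlite3 module so it runs anywhere. The orders table and its columns are hypothetical, and whether the single-pass version is actually cheaper depends on the engine and its optimizer, so treat this as an illustration rather than a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL);
    INSERT INTO orders (status, amount) VALUES
        ('completed', 20.0), ('completed', 30.0),
        ('refunded', 5.0), ('cancelled', 12.0);
    """
)

# Version 1: the table is read once per branch and stitched with UNION ALL.
union_query = """
    SELECT 'completed' AS status, SUM(amount) AS total
    FROM orders WHERE status = 'completed'
    UNION ALL
    SELECT 'refunded' AS status, SUM(amount) AS total
    FROM orders WHERE status = 'refunded'
"""

# Version 2: one pass over the table; each CASE expression routes a row's
# amount into the matching bucket.
case_query = """
    SELECT
        SUM(CASE WHEN status = 'completed' THEN amount ELSE 0.0 END) AS completed_total,
        SUM(CASE WHEN status = 'refunded' THEN amount ELSE 0.0 END) AS refunded_total
    FROM orders
"""

union_result = dict(conn.execute(union_query).fetchall())
completed_total, refunded_total = conn.execute(case_query).fetchone()

print(union_result)                     # {'completed': 50.0, 'refunded': 5.0}
print(completed_total, refunded_total)  # 50.0 5.0
```

Both queries return the same totals; the difference is in how many times the table has to be read.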
Data models have a large impact on how you structure your queries. For instance, take advantage of table partitions and indexes whenever possible. But data models are also largely dependent on query patterns. To design a good data model, you need to be able to translate business questions into SQL queries that end-users will run against your tables. This is why SQL and data modeling interviews often go hand in hand. (I will talk more about data modeling in the next section.)
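As a rough illustration of why indexes matter, the sketch below asks SQLite for its query plan before and after adding an index on the filtered column. The events table and the index name are hypothetical. SQLite has no table partitions, so only the index half of the advice is shown here; partition pruning in a warehouse engine follows the same spirit of reading less data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY, event_date TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (event_date, payload) VALUES (?, ?)",
    [(f"2020-01-{day:02d}", "x") for day in range(1, 31)],
)

query = "SELECT * FROM events WHERE event_date = '2020-01-15'"


def plan(sql: str) -> str:
    """Return SQLite's query plan as one string (the last column holds the detail)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(query)  # without an index, the engine has to scan the table
conn.execute("CREATE INDEX idx_events_date ON events (event_date)")
plan_after = plan(query)   # with the index, the plan should mention idx_events_date

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, so the printed text is not reproduced here; the point is that the second plan refers to the index while the first does not.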
How do you prepare for a SQL coding interview? Check out this blog post.
System design is the most important and most difficult part of data engineering technical interviews. In a system design interview, you will design a data solution from end to end, which is usually composed of three parts: data storage, data processing, and data modeling.
The initial interview question is often very short and abstract (e.g., design a data warehouse from end to end), and it’s your job to ask follow-up questions to pin down the requirements and use cases, just like solving a real-life data problem. The main challenge of system design is to choose the best combination of data storage systems and data processing frameworks based on those requirements and use cases, and sometimes there is more than one optimal solution. The key to nailing a system design interview is to understand key principles and concepts in data engineering and trade-offs of a variety of data systems and frameworks. Designing Data-Intensive Applications is a must-read if you want to build a solid foundation in data system design.
Data modeling is usually the end piece of a system design interview, but sometimes it is a part of the SQL interview instead. One example of a data modeling interview question is to design the backend analytical tables for a reservation system for vet clinics. The most important principle in data modeling is to design your data models based on use cases and query patterns. Again, it is your responsibility to get clarifications on the requirements and use cases so that you can make better design choices.
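To give a feel for what “designing based on query patterns” can look like, here is one possible, deliberately tiny sketch for the vet clinic example, again using sqlite3. Every table and column name is hypothetical, and the assumed use case (reservations and cancellations per clinic) is only one of many a real interviewer might have in mind; a different use case would lead to a different model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the "who" and "where" of a reservation.
    CREATE TABLE dim_clinic (clinic_id INTEGER PRIMARY KEY, clinic_name TEXT, city TEXT);
    CREATE TABLE dim_pet (pet_id INTEGER PRIMARY KEY, species TEXT);

    -- Fact table: one row per reservation, keyed to the dimensions.
    CREATE TABLE fact_reservation (
        reservation_id INTEGER PRIMARY KEY,
        clinic_id INTEGER REFERENCES dim_clinic (clinic_id),
        pet_id INTEGER REFERENCES dim_pet (pet_id),
        reservation_date TEXT,
        was_cancelled INTEGER
    );

    INSERT INTO dim_clinic VALUES (1, 'Downtown Vet', 'Oakland'), (2, 'Bay Pets', 'San Jose');
    INSERT INTO dim_pet VALUES (10, 'dog'), (11, 'cat');
    INSERT INTO fact_reservation VALUES
        (100, 1, 10, '2020-03-01', 0),
        (101, 1, 11, '2020-03-01', 1),
        (102, 2, 10, '2020-03-02', 0);
    """
)

# The assumed query pattern: reservations and cancellations per clinic.
rows = conn.execute(
    """
    SELECT c.clinic_name, COUNT(*) AS reservations, SUM(f.was_cancelled) AS cancellations
    FROM fact_reservation AS f
    JOIN dim_clinic AS c ON c.clinic_id = f.clinic_id
    GROUP BY c.clinic_name
    ORDER BY c.clinic_name
    """
).fetchall()

print(rows)  # [('Bay Pets', 1, 0), ('Downtown Vet', 2, 1)]
```

The grain of the fact table (one row per reservation) is the first thing worth confirming with the interviewer, because it decides which questions the tables can answer.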
If you are interested in learning data modeling in-depth, check out The Data Warehouse Toolkit, the bible for Data Warehousing written by Ralph Kimball.
YOU (a.k.a. The Most Important Part)
Now that we’ve covered all technical topics in data engineering interviews, let’s talk about the non-technical part. Interviews are not exams where you just need the right answers to pass, but rather a series of conversations to see if you can learn quickly and work with a team to solve problems together. Therefore, it is very important to be human and be yourself during interviews:
- Be nice. Nobody wants to work with jerks.
- Have conversations. The best interviews are usually like conversations. Ask questions if you want information or feedback.
- Problem-solving, not answers. Just like in real life, you don’t always know the right answer to a problem immediately. It’s more important to show how you would approach the problem instead of only giving an answer.
- Show your passion for data engineering. What do you do outside your work responsibility to be a better data engineer?
While the interviewers are interviewing you, you are also interviewing them. Would you enjoy working with them? Would this team provide opportunities for you to grow? Do you agree with the manager’s views and management style? Finding a good team is hard, so ask your questions wisely.
All The Stress
Interviewing is very stressful. It is an imperfect process where strangers judge your professional competency based only on one hour of interaction with you, and sometimes the interview result is not fair. It is frustrating when you just can’t get any further on interview questions, and you feel like the interviewers are looking down on you. Getting rejected over and over again can be devastating to your self-esteem, and you may start to think you’re not good enough. I have been there too: never hearing back from most job applications and failing all the coding interviews I could get. I thought I would never be an engineer. But I am glad I didn’t give up.
If you are feeling overwhelmed, frustrated, or hopeless because of interviews, I want to let you know that you are not alone. If you get rejected for a job, it is their loss. Be patient with yourself and stay hopeful because things will get better, and you just need to keep trying! Always show up to your interviews with confidence because you are good enough!
Original. Reposted with permission.
Bio: Xinran Waibel is an experienced Data Engineer in the San Francisco Bay Area, currently working at Netflix. She is also a technical writer for Towards Data Science, Google Cloud, and The Startup on Medium.