
The Difference Between Data Scientists and Data Engineers


ODSC East 2019 has multiple tracks for both Data Scientists and Data Engineers, including workshops, talks, and training sessions. Save 45% with code KDN45.



By ODSC. Sponsored Post.


As the field of machine intelligence continues to expand, new roles are being created and existing ones are broadening. Many people don’t have a clear understanding of the difference between data scientists and data engineers. This article addresses the specific skill sets required for these two distinct career paths.

Here are some of the core competencies of data scientists and data engineers along with overlapping areas:

Data scientists: mathematics and statistics, computer science, machine learning plus AI/deep learning, advanced analytics, and data storytelling.

Data engineers: production-level programming, distributed systems, data transformation, data analytics, and data pipelines.

Overlapping: data analytics and programming.

Let’s dig into these areas to better understand what differentiates the two roles:

Skills for Data Scientists

Data scientists generally come from an applied mathematics and/or statistics background coupled with computer science. Machine learning is based on the mathematical foundations of statistical learning. Trying to excel in data science without mathematics knowledge will lead to an incomplete perspective.

Data scientists also need to interact with business domain experts in order to cultivate the desired insights, and to analyze data (exploratory data analysis) to help the business make use of its data assets.

Data scientists will also have the background to choose appropriate machine learning algorithms, train them, and devise methods for testing their accuracy.
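
As a rough illustration of the train-and-test loop described above, here is a minimal sketch in plain Python. The nearest-centroid classifier, the synthetic two-cluster data, and every function name are assumptions made for this example; in practice a data scientist would more likely reach for an established library in R or Python.

```python
import random


def make_toy_data(n_per_class=50, seed=1):
    """Build a synthetic two-class data set (purely illustrative)."""
    rng = random.Random(seed)
    rows = []
    for label, center in (("a", (0.0, 0.0)), ("b", (5.0, 5.0))):
        for _ in range(n_per_class):
            features = [rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)]
            rows.append((features, label))
    return rows


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train):
    """Train a nearest-centroid classifier: average the features per label."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: distance(centroids[label]))


def accuracy(centroids, test):
    """Fraction of held-out rows that the model labels correctly."""
    correct = sum(1 for features, label in test if predict(centroids, features) == label)
    return correct / len(test)


data = make_toy_data()
train, test = train_test_split(data)
model = fit_centroids(train)
print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

The point of the sketch is the held-out test set: accuracy is measured only on rows the model never saw during training, which is what makes it a meaningful test of the chosen algorithm.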

Additionally, data scientists must be well-versed in the art of data storytelling, since the results of a data science project need to be conveyed to business stakeholders in an understandable fashion. This effort requires the ability to verbally and visually communicate complex results and observations in a way that stakeholders can understand and act on.

Data scientists will also have developed coding skills out of necessity, most settling on either R or Python. The programming skills of a data scientist aren’t typically at the level that you’d see for a data engineer, nor should they be!

Skills for Data Engineers

Data engineers come from a programming background, often by way of a computer science degree. Their background is generally in languages like Python, Java, or Scala, and their emphasis is on distributed systems and big data. Compared to data scientists, their programming skills are more advanced and specifically suited to building high-availability production systems.

Using these programming skills, data engineers create data pipelines at scale. This involves integrating a number of big data technologies. Data engineers are tasked with deciding which tools are right for the job. Data engineers also have an in-depth understanding of data technologies and frameworks and how to integrate them with data pipelines. Further, data engineers work closely with personnel responsible for clusters, DevOps, and DataOps.
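
To make the idea of a data pipeline concrete, here is a deliberately small, single-machine sketch in Python. The stage names (extract, clean, aggregate), the CSV layout, and the sample records are all hypothetical; a production pipeline of the kind described above would run on distributed big data technologies rather than on in-memory generators.

```python
import csv
import io
from collections import defaultdict


def extract(source):
    """Extract: lazily read raw records from a CSV file-like object."""
    yield from csv.DictReader(source)


def clean(records):
    """Transform: skip rows with an unusable amount and normalize customer names."""
    for row in records:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # a real pipeline would log or quarantine bad rows
        yield {"customer": row["customer"].strip().lower(), "amount": amount}


def aggregate(records):
    """Load/aggregate: total the amounts per customer."""
    totals = defaultdict(float)
    for row in records:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


# Hypothetical raw input, including one malformed row.
RAW = """customer,amount
Alice,10.5
bob,3
 alice ,4.5
carol,not-a-number
"""

totals = aggregate(clean(extract(io.StringIO(RAW))))
print(totals)
```

Each stage consumes the output of the previous one, so stages can be tested, replaced, or scaled independently. That separation is the part of the design that carries over to larger systems.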

Data engineers also implement machine learning algorithms chosen by data scientists for a production environment. For example, this may involve deploying a classification algorithm used by the data scientist in R to a more robust production platform.

Overlapping Skills

Certainly, there are overlapping skills with respect to programming, although a data engineer’s programming skills often outweigh those of a data scientist. For example, having a data scientist program a production data pipeline may be an overreach, whereas this kind of task is directly in the wheelhouse of a data engineer. Here, the skills are complementary since the data scientist may design the data pipeline and the data engineer will program and maintain it. A data scientist should generally not be expected to program data pipelines.

Another area of overlap is data analytics. The data scientist’s analytics skills are usually much more advanced than those of a data engineer. Data engineers may be able to do some basic analytics but would not be able to address the more advanced analytics needs that a data scientist could easily handle.

Misalignments in the Enterprise

Many enterprises make mistakes when aligning the above skill sets with actual job titles. First and foremost, don’t fall into the trap of trying to find one person, known as a unicorn, who can do the job of both data scientist and data engineer. Sure, there may be a few unicorns out there, but they’re in very high demand and command a very high salary. Plus, what happens if you hire a unicorn and they decide to leave?

Another mistake is having data scientists do the work of a data engineer. Creating a data pipeline is not easy and it requires advanced knowledge of production programming frameworks. A data scientist may be able to acquire these skills, but this is not the most efficient use of this resource. Data scientists are not engineers who build production systems, create data pipelines, and expose machine learning results.

On the flip side, it is also a mistake to have data engineers do the work of a data scientist, although this is far less common. Some data engineers work to widen their skills by improving their mathematics and statistics knowledge, and correspondingly their machine learning skills. This career path sometimes results in yet another job category, the “machine learning engineer.”

Machine learning engineers typically come from data engineering backgrounds, but they’ve become proficient at certain aspects of data science and sit on the fence between data science and data engineering. This category really isn’t a unicorn, but rather a data engineer who understands how to operationalize and optimize machine learning. Machine learning engineers take what a data scientist creates and make it production ready.

How do I advance my skills?

At ODSC East 2019, we have entire focus areas spanning multiple tracks for both of these roles. ODSC workshops, talks, and training sessions are ideal for both types of data science professionals, whether you’re a data scientist or a data engineer! Here are a few standout sessions you may want to pick from our data science and data engineering focus areas:

Data Scientist:

Programming with Data: Python and Pandas
Causal Inference for Data Scientists
Achieving Salesforce-Scale Machine Learning in Production
Intermediate RMarkdown in Shiny
Modeling in the tidyverse
Tensorflow 2.0 and Keras: what's new, what's shared, what's different
Imitation Learning - Reinforcement Learning for the Real World
When the Bootstrap Breaks
Building Recommendation Engines and Deep Learning Models Using Python, R and SAS
Scaling AI Applications with Ray

 

Data Engineer:

Programming with Data: Python and Pandas
Engineering For Data Science
Achieving Salesforce-Scale Machine Learning in Production
Reproducible Data Science Using Orbyter
Modeling in the tidyverse
Real-ish Time Predictive Analytics with Spark Structured Streaming
Imitation Learning - Reinforcement Learning for the Real World
Making Data Science: AIG, Amazon, Albertsons
Building Recommendation Engines and Deep Learning Models Using Python, R and SAS
Visual Search on Hayneedle

Conclusion

In summary, it is important to realize how data scientists and data engineers complement one another. Strong data science teams include both skill sets. It is a waste of good resources to have a data scientist doing the job of a data engineer and vice versa. It is highly improbable that you will be able to find a unicorn, one person who is both a skilled data engineer and an expert data scientist. Therefore, you will need to build a team in which members complement one another’s skills and work well together.

Ready to learn skills for both data scientists and data engineers? Attend ODSC East and Save 45% with Code KDN45 - April 30-May 3 in Boston.

