5 Things You Need To Know About Data Science

Here are 5 useful things to know about Data Science, including its relationship to BI, Data Mining, Predictive Analytics, and Machine Learning; Data Scientist job prospects; where to learn Data Science; and which algorithms/methods are used by Data Scientists.

I am frequently asked questions about Data Science, so here are my answers to some of the most common ones, along with 5 useful things to know about Data Science and Data Scientists.

1. Business Intelligence, Business Analytics, Data Science, Data Analytics, Data Mining, Predictive Analytics - what are the differences?

Business Intelligence or BI is primarily concerned with data analysis and reporting, but does not include predictive modeling, so BI can be considered a subset of Data Science.

The other terms: Business Analytics, Data Analytics, Data Mining, Predictive Analytics are essentially the same as Data Science.

Data Science is concerned with analyzing data and extracting useful knowledge from it. Building predictive models is usually the most important activity for a Data Scientist.

However, because the term "Data Science" is relatively new, the name is not yet universally accepted, and other names are frequently used for the same area.

Data Science can be understood in terms of the Data Science Process, which includes business understanding, data understanding, data preparation, modeling, evaluation, and deployment, as described in the CRISP-DM framework:

Fig. 1: CRISP-DM - Data Science Process.
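
To make the six phases concrete, here is a minimal sketch that walks a toy problem through them in order. Everything in it is an assumption made for illustration: the ad-spend and sales numbers are made up, the function names are hypothetical, and a real project would loop back between phases rather than run them once.

```python
# Toy walk-through of the six CRISP-DM phases. All data and names are made up.

def business_understanding(ctx):
    # State the goal in measurable terms.
    ctx["goal"] = "predict weekly sales from ad spend"
    return ctx

def data_understanding(ctx):
    # Raw (ad_spend, sales) records; None marks a missing value.
    ctx["raw"] = [(1.0, 3.1), (2.0, 4.9), (None, 4.0), (3.0, 7.2), (4.0, 8.8)]
    ctx["n_missing"] = sum(1 for x, _ in ctx["raw"] if x is None)
    return ctx

def data_preparation(ctx):
    # Drop records with a missing input.
    ctx["clean"] = [(x, y) for x, y in ctx["raw"] if x is not None]
    return ctx

def modeling(ctx):
    # Ordinary least squares fit of sales = a + b * ad_spend.
    xs = [x for x, _ in ctx["clean"]]
    ys = [y for _, y in ctx["clean"]]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in ctx["clean"])
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    ctx["model"] = (a, b)
    return ctx

def evaluation(ctx):
    # Mean absolute error on the training data.
    # A real project would evaluate on held-out data instead.
    a, b = ctx["model"]
    errors = [abs((a + b * x) - y) for x, y in ctx["clean"]]
    ctx["mae"] = sum(errors) / len(errors)
    return ctx

def deployment(ctx):
    # "Deploy" by exposing a prediction function.
    a, b = ctx["model"]
    ctx["predict"] = lambda x: a + b * x
    return ctx

PHASES = [
    business_understanding,
    data_understanding,
    data_preparation,
    modeling,
    evaluation,
    deployment,
]

def run_crisp_dm():
    ctx = {}
    for phase in PHASES:
        ctx = phase(ctx)
    return ctx

result = run_crisp_dm()
print(result["goal"], result["model"], result["mae"])
```

Most of the code sits in the understanding and preparation steps rather than in modeling itself, which matches the emphasis on data preparation in everyday Data Science work.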

Many universities have recently created degrees in Business Analytics, Data Analytics, or Data Science. Business Analytics, as the name implies, puts more emphasis on business skills and methods, while "Data Science" and "Data Analytics" put more emphasis on data engineering aspects.

Within the scientific community, the most popular name for this field has changed over time:
  • Data Mining: first appeared in the 1970s and peaked around 2002, but is still used today
  • KDD (Knowledge Discovery in Data): used in the 1990s, after the start of the KDD conferences, but now used only within the research community
  • Predictive Analytics: appeared in the 2000s and was popularized by Predictive Analytics World, but has not caught on with the general public
  • Data Science: 2012 to now, fueled by the popularity of the "Data Scientist" job
This Google Trends chart shows the relative change in popularity of 5 Data Science related terms from 2004 to 2017.

Fig. 2: Google Trends for Data Mining, Data Science, Data Analytics, Business Analytics, Predictive Analytics, 2004-2017.

2. Data Science vs Machine Learning: What are the differences?

Data Science and Machine Learning can be thought of as close cousins.

What they have in common is supervised learning methods - learning from historical data.

However, Data Science is also concerned with Data Visualization and with presenting results in a form understandable to people. Data Science also has a much bigger focus on Data Preparation and Data Engineering.

Machine Learning's main focus is on the learning algorithms themselves - it is not concerned, for example, with data visualization. Machine Learning studies not only learning from historical data, but also learning in real time. A major part of ML is the set of algorithms for agents that act in an environment and learn from their actions. This is called Reinforcement Learning (RL). To learn more about the history and current state of RL, see my Interview with Rich Sutton, the Father of Reinforcement Learning.

RL was the key part of the recent success of AlphaGo Zero and AlphaZero.
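
As a rough illustration of an agent learning from its own actions, here is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, one of the simplest RL settings. It is not how AlphaGo Zero or AlphaZero work. The win rates, step count, and function name are assumptions chosen for the example.

```python
import random

def run_bandit(true_win_rates, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit (hypothetical toy setup)."""
    rng = random.Random(seed)
    n_arms = len(true_win_rates)
    counts = [0] * n_arms       # how often each arm was pulled
    estimates = [0.0] * n_arms  # running average reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # The environment returns a reward of 1 or 0.
        reward = 1 if rng.random() < true_win_rates[arm] else 0
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
print("pulls per arm:", counts)
print("estimated win rates:", [round(e, 2) for e in estimates])
```

The agent never sees the true win rates. It only sees the reward for the action it just took, which is the key difference from supervised learning on a fixed historical dataset.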

3. Is Data Scientist a good job?

Yes! Data Scientist was ranked by Glassdoor as the best job in America for 3 years in a row - see

Data Scientist - best job in America, 2018

A recent LinkedIn Economic Graph report also had good news for this field. Machine Learning Engineer and Data Scientist were the top US emerging jobs in 2017, with Machine Learning Engineer jobs growing 9.8 times in 5 years, and Data Scientist jobs growing 6.5 times.
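
To put the 5-year growth multiples in perspective, they can be converted into an approximate yearly growth rate. The sketch below assumes that "growing N times" means the number of jobs was multiplied by N, and that growth compounded steadily each year. The report does not state either assumption, so treat the result as a rough reading of the numbers.

```python
def annual_growth_rate(total_multiple, years):
    # Solve (1 + r) ** years == total_multiple for r.
    return total_multiple ** (1 / years) - 1

ml_engineer = annual_growth_rate(9.8, 5)
data_scientist = annual_growth_rate(6.5, 5)

print(f"Machine Learning Engineer: {ml_engineer:.0%} per year")  # about 58%
print(f"Data Scientist: {data_scientist:.0%} per year")          # about 45%
```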

4. Where can I learn Data Science?

Data Science Education is one of the most popular topics on KDnuggets, with a whole section dedicated to it.

There are many options for learning data science and related topics.

We recently ran a series of surveys of the best Master's programs in Analytics and Data Science, which also examined the tuition and ranking of each program. Here is an overview chart of the top-ranked programs from the first post:

MS in Analytics, Data Science - Online and On Campus
from this post

Symbol color is blue for online, green for on-campus; shape is circle for MS in Analytics; square for MS in Data Science.

We note that there is little correlation between ranking and tuition. Most high-ranking universities do NOT offer online degrees. Berkeley and CMU are the exceptions.

Slightly over half of the MS degrees we surveyed are called "Data Science" - most of them are technically oriented - and slightly less than half are called "Analytics" - mostly business oriented.

There are also many other options; see the relevant KDnuggets posts on Data Science courses and education.

5. What algorithms and methods do Data Scientists use?

While Deep Learning pushes the state of the art seemingly every day, and very advanced methods like XGBoost win many Kaggle competitions, most Data Science work involves more basic algorithms and methods.

A recent KDnuggets poll,

Which Data Science / Machine Learning methods and tools you used in the past 12 months for a real-world application?

had these top 10 results:

Top 10 Data Science, Machine Learning Methods Used, 2017 KDnuggets Poll

Deep Learning was used by about 20% of respondents.

Our poll also found which methods were most affiliated with industry:
  • Uplift modeling (for the second year in a row)
  • Anomaly/Deviation detection
  • Gradient Boosted Machines
The most "academic" methods are advanced topics related to Deep Learning:
  • Generative Adversarial Networks (GAN)
  • Reinforcement Learning
  • Recurrent Neural Networks (RNN)
  • Convolutional Nets
In the Kaggle 2017 survey, The State of Data Science & Machine Learning, the most common Data Science methods used at work were:
  • Logistic Regression, 63.5%
  • Decision Trees, 49.9%
  • Random Forests, 46.3%
  • Neural Networks, 37.6%
  • Bayesian Techniques, 30.6%
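
Since Logistic Regression tops the Kaggle list, here is a minimal sketch of it, fitted with plain gradient descent on a one-feature toy dataset. The data (hours studied versus pass/fail), the learning rate, and the function names are all made up for illustration. In practice most people would use a library implementation instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            # Gradient of the log loss with respect to w and b.
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Made-up data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)   # predicted pass probability after 1 hour
p_high = sigmoid(w * 4.0 + b)  # predicted pass probability after 4 hours
print(f"w={w:.2f}, b={b:.2f}, p(1h)={p_low:.2f}, p(4h)={p_high:.2f}")
```

The fitted model outputs a probability rather than a hard label, and the sign and size of `w` can be read directly, which is part of why this method remains so widely used at work.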
To learn more about the most important algorithms, see our most popular posts on algorithms and KDnuggets Posts tagged