Advice for Learning Data Science from Google’s Director of Research
A career in data science is an attractive prospect for many people who are just starting out, and the digital revolution continues to create exciting new opportunities. But jumping in too fast without first establishing your foundational skills can be detrimental to your success, as this advice for data science newcomers from Peter Norvig, the Director of Research at Google, suggests.
“In 2021, professionals in the digital market space must be comfortable with data — period. They must know how to manipulate data, understand how it is collected, and analyze and interpret it. The future of decision making is grounded in data science.” — Wendy Moe, Professor of Marketing, University of Maryland
Data science skills have become increasingly important for jobs that once had little to do with statistics, including marketing and business. Adding data science skills to your portfolio can give you an edge both in your current role and in the job market.
If you are interested in adding data science to your portfolio, you have no doubt pondered these questions:
- How long does it take to learn the fundamentals of data science?
- What are some resources for learning data science?
This article discusses some general advice from Peter Norvig to individuals considering data science.
Background about Peter Norvig (Director of Research at Google)
The title of this article is motivated by Peter Norvig’s view of how long it takes to become an expert in programming. If you have not read his article “Teach Yourself Programming in Ten Years”, I encourage you to do so.
The point here is that you don’t need 10 years to learn the basics of data science, but learning data science in a rush is certainly not helpful. It takes time, effort, energy, patience, and commitment to become a data scientist.
Peter Norvig’s suggestion is that learning requires time, patience, and commitment. Beware of articles, books, or websites that tell you that you can learn data science in 4 weeks.
Image by Benjamin O. Tayo.
If you are interested in learning the fundamentals of data science, be prepared to invest the right amount of time and energy. That way, you can master not just the superficial concepts but the in-depth concepts of data science.
It took me 2 years of in-depth study to master the basics of data science (through self-study), and I continue to challenge myself to learn new things every day. How long it takes you to master the fundamentals of data science will depend on your background. Generally, a solid background in an analytical discipline such as mathematics, statistics, computer science, engineering, or economics is advantageous.
3 Lessons From Peter Norvig’s “Teach Yourself Programming in Ten Years”
1) It takes time, effort, energy, patience, and commitment to master the fundamentals of data science.
Data science is a multidisciplinary field that requires a solid background in advanced mathematics, statistics, and programming, along with related skills in data analysis, data visualization, model building, and machine learning. It took me 2 years of dedicated study to master the fundamentals of data science, and even that was possible only because of my solid background in mathematics, physics, and programming. Here are some resources that helped me master the fundamentals of data science.
(i) Professional Certificate in Data Science (HarvardX, through edX)
Includes the following courses, all taught using R (you can audit courses for free or purchase a verified certificate):
- Data Science: R Basics
- Data Science: Visualization
- Data Science: Probability
- Data Science: Inference and Modeling
- Data Science: Productivity Tools
- Data Science: Wrangling
- Data Science: Linear Regression
- Data Science: Machine Learning
- Data Science: Capstone
(ii) Analytics: Essential Tools and Methods (Georgia TechX, through edX)
Includes the following courses, all taught using R, Python, and SQL (you can audit for free or purchase a verified certificate):
- Introduction to Analytics Modeling
- Introduction to Computing for Data Analysis
- Data Analytics for Business
(iii) Applied Data Science with Python Specialization (the University of Michigan, through Coursera)
Includes the following courses, all taught using Python (you can audit most courses for free; some require the purchase of a verified certificate):
- Introduction to Data Science in Python
- Applied Plotting, Charting & Data Representation in Python
- Applied Machine Learning in Python
- Applied Text Mining in Python
- Applied Social Network Analysis in Python
(iv) Data Science Textbooks
Learning from a textbook provides more refined, in-depth knowledge than you get from online courses alone. This book provides a great introduction to data science and machine learning, with code included: “Python Machine Learning” by Sebastian Raschka.
The author explains fundamental concepts in machine learning in a way that is very easy to follow. Also, the code is included, so you can actually use the code provided to practice and build your own models. I have personally found this book to be very useful in my journey as a data scientist. I would recommend this book to any data science aspirant. All that you need is basic linear algebra and programming skills to be able to understand the book.
There are lots of other excellent data science textbooks out there such as “Python for Data Analysis” by Wes McKinney, “Applied Predictive Modeling” by Kuhn & Johnson, and “Data Mining: Practical Machine Learning Tools and Techniques” by Ian H. Witten, Eibe Frank & Mark A. Hall.
(v) Network with other Data Science Aspirants
In my personal experience, I have learned a lot by teaming up with other data science aspirants for weekly group conversations on various topics in data science and machine learning. Network with other data science aspirants, share your code on GitHub, and showcase your skills on LinkedIn. This will help you learn many new concepts and tools within a short period of time. You also get exposed to new ways of doing things, as well as to new algorithms and technologies.
2) Understanding the theoretical foundations of data science is as important as hands-on data science skills.
Data science is heavily math-intensive and requires knowledge in the following:
(i) Statistics and Probability
(ii) Multi-variable Calculus
(iii) Linear Algebra
(iv) Optimization and Operations Research
Find out more about math topics that you need to focus on from here: Essential Math Skills for Machine Learning.
Even though packages such as Python’s scikit-learn and R’s caret contain many tools for doing data science and building machine learning models, it is extremely important to understand the theoretical foundations of each method.
3) Avoid using machine learning models as black-box tools.
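To illustrate what understanding the theory behind a library call can look like, here is a minimal sketch of simple linear regression fitted from scratch with the closed-form least-squares formulas. The function name `fit_simple_linear_regression` and the toy data are assumptions made for this example; it is not how any particular package implements its fit.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    x_bar, y_bar = mean(x), mean(y)
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = s_xy / s_xx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data generated from y = 2x + 1 with no noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_simple_linear_regression(x, y)
print(slope, intercept)  # prints: 2.0 1.0
```

Working through a small case like this by hand makes it easier to reason about what a library's regression routine is doing when the data are noisy or the predictors are poorly scaled.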
A solid background in data science would enable a data scientist to build reliable predictive models. For example, before building a model, you may ask yourself:
(i) What are the predictor variables?
(ii) What is the target variable? Is my target variable discrete or continuous?
(iii) Should I use classification or regression analysis?
(iv) How do I handle missing values in my dataset?
(v) Should I use normalization or standardization when bringing variables to the same scale?
(vi) Should I use Principal Component Analysis or not?
(vii) How do I tune hyperparameters in my model?
(viii) How do I evaluate my model to detect biases in the dataset?
(ix) Should I use ensemble methods, where I train several different models (e.g., SVM, KNN, and logistic regression classifiers) and then average the predictions of the three models?
(x) How do I select the final model?
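Question (v) above can be made concrete with a short sketch. Normalization (min-max scaling) maps values to the range [0, 1], while standardization rescales them to zero mean and unit standard deviation. The function names and the `ages` sample below are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot normalize")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    if sigma == 0:
        raise ValueError("all values are identical; cannot standardize")
    return [(v - mu) / sigma for v in values]


# Hypothetical feature column.
ages = [20, 30, 40, 50, 60]
normalized = min_max_normalize(ages)
standardized = standardize(ages)
print(normalized)  # prints: [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 3) for z in standardized])
```

Min-max scaling depends only on the smallest and largest values, so a single outlier can compress every other value into a narrow band; standardization is usually less affected by that. In practice, scikit-learn's `MinMaxScaler` and `StandardScaler` provide the same two transformations.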
The difference between a good and a bad machine learning model often comes down to how well you understand the details of the model, including its hyperparameters and how they can be tuned to obtain the best performance. Using any machine learning model as a black box, without understanding its intricacies, is likely to produce an unreliable model.
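As a rough illustration of hyperparameter tuning, the sketch below picks the number of neighbors `k` for a tiny one-dimensional k-nearest-neighbors classifier by leave-one-out cross-validation. The data, the candidate values of `k`, and the helper names `knn_predict` and `leave_one_out_accuracy` are all assumptions made for this example; a real project would more likely use a library tool such as scikit-learn's `GridSearchCV`.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Predict the majority label among the k training points nearest to query."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, predict it from the rest, and return the accuracy."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) == label:
            correct += 1
    return correct / len(data)


# Hypothetical 1-D dataset: two clusters plus one noisy point at x = 3.5.
data = [
    (1.0, "a"), (2.0, "a"), (3.0, "a"), (3.5, "b"), (4.0, "a"),
    (6.0, "b"), (7.0, "b"), (8.0, "b"), (9.0, "b"),
]

# Odd values of k avoid tied votes when there are only two classes.
candidates = [1, 3, 5]
scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

The point of the exercise is that the "best" setting is chosen by measuring performance on held-out data, not on the data the model was fitted to. Knowing what `k` controls (how local or smooth the decision rule is) tells you which values are worth trying in the first place.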
In summary, data science is one of the hottest fields today. The digital revolution means that companies, industries, organizations, and governments now produce enormous amounts of data on a daily basis, and the demand for highly skilled data scientists will only continue to grow. This is the right time to invest in mastering the fundamentals of data science. In doing so, beware of articles, books, or websites that tell you that you can learn data science in 4 weeks. Do not be in a rush. Take your time to master the fundamentals.
Original. Reposted with permission.