Top Coursera Data Science Specializations: Comparison & Exclusive Insight

There are more MOOC learning options for data scientists today than ever. Take a tour of Coursera's 8 data science specializations, with exclusive insight from program coordinators and course instructors.

By Matthew Mayo.

Coursera has been a favorite learning platform for aspiring and practicing data scientists for a number of years, with quality courses such as Mining Massive Datasets, Introduction to Data Science, and Machine Learning long standing out. In early 2014, Coursera began introducing specializations (tracks of multiple courses in a given area of study), with one data science specialization available from the very beginning. As the number of specializations steadily grows, Fall 2015 brings several additional offerings in the data science realm, giving prospective learners myriad options for pursuing a data science education.


This post will examine the 8 current Coursera data science and data science-related specializations, and provide some additional insight directly from the specializations' instructors and coordinators. While some of these specializations cover more "traditional" data science topics such as the Hadoop, Python, and R ecosystems, there are now options for those interested in more niche topics, such as "executive" data science, bioinformatics, and machine learning. It should also be noted that, as with all Coursera offerings, the course material is freely accessible, but fees do apply if you are interested in course or specialization certificates. Capstone projects, however, are only accessible to paying students who have completed all of the prior coursework in the specialization.

Note: KDnuggets gets absolutely no royalties from Coursera - this list is presented only to help our readers evaluate interesting courses and specializations.

Machine Learning Specialization, University of Washington

The University of Washington's Machine Learning Specialization was developed in conjunction with Dato and got underway with its first session in September. It uses Python in all courses, so an understanding of the language is useful prior to enrolling. A number of common Python machine learning tools are used throughout the specialization, and learners are free to try out Dato's proprietary GraphLab Create in the first course, with academic licenses available to all students interested in expanding their toolset.

The specialization consists of the following courses:

▪  Machine Learning Foundations: A Case Study Approach
▪  Regression
▪  Classification
▪  Clustering & Retrieval
▪  Recommender Systems & Dimensionality Reduction
▪  Machine Learning Capstone: An Intelligent Application with Deep Learning

The course list suggests that the specialization provides a solid foundation in machine learning. A blog post outlining the specialization is a good starting point for anyone looking to better understand the approach taken.

I caught up with specialization instructors Emily Fox and Carlos Guestrin, who answered the following questions for us.

What distinguishes your data science specialization from the others currently available via Coursera?

Most machine learning courses, including the current ML courses on Coursera, take a "bottom up" approach: they start from the foundations of "what are probabilities of events?" and "how do we estimate them?", then cover basic ML models and optimization algorithms, and eventually get to more advanced ML methods. Rarely do these courses cover how these methods are used in real-world problems and the practical issues associated with them.

We take an alternative approach, building on case studies. We start each section by defining an end-to-end case study of how ML has an impact on real-world applications. We then dig into how ML is used in these applications. Finally, we describe the models and algorithms used to make this possible. We call this a "case study approach" that provides hands-on experience in ML. All courses include hands-on exercises involving real-world applications.

Our goal is to make the specialization accessible to folks with no ML or Stats background. We start from the basics of how ML is used. However, by the time learners get to their capstone project in the 6th course, they will build and deploy a real intelligent application that uses deep learning on image and text data to provide a whole new type of recommender system.

Relative to the Data Science Specialization on Coursera, we focus more on machine learning, which involves building intelligent applications that learn from data to form real-time predictions. Data science, on the other hand, typically focuses on analyzing a single dataset in depth. The ML Specialization also focuses on what it takes to deploy these techniques in production and at scale.

What 2 or 3 concepts or technologies does your specialization focus on the most?

▪  Case-study approach: how ML is used in the real world and what ML techniques make it possible.
▪  Basics to state-of-the-art ML: we cover the foundational topics, but build up to state-of-the-art methods, such as deep learning and boosted trees, using these case studies.
▪  Deployment and scalability: ML doesn't end in a performance curve for a paper. Our learners actually build and deploy applications that use ML.

How does the specialization compare to similar course(s) at your university, if at all?

There are currently no courses at UW that cover the material in this specialization. Most of our UW courses are targeted at graduate students and advanced undergraduates looking for the theoretical foundations of ML, assuming prior background in statistics. This specialization is focused on helping learners go from having only some programming background to having a deep understanding of what it takes to build an intelligent application that uses ML in the real world.

What else would you like people to know about your specialization?

For students who already have ML background, there are probably other courses that will be more appropriate to start with. However, for those who want to take a hands-on approach and really learn ML in practice, this is the specialization for you. :)

Big Data Specialization, UCSD

The University of California, San Diego's Big Data Specialization was developed alongside Splunk. It is another new specialization that got underway this fall, and it focuses mainly on what first comes to mind when you think of Big Data: the Hadoop/Spark ecosystem. It does, however, include other hot topics as well, such as graph analytics and machine learning.

The specialization contains the following courses:

▪  Introduction to Big Data
▪  Hadoop Platform and Application Framework
▪  Introduction to Big Data Analytics
▪  Machine Learning With Big Data
▪  Introduction to Graph Analytics
▪  Big Data - Capstone Project

Specialization coordinator Natasha Balac was kind enough to provide some further insight for us, answering the following questions.

What distinguishes your data science specialization from the others currently available via Coursera?

This is the only Big Data-focused specialization on the platform.

What 2 or 3 concepts or technologies does your specialization focus on the most?

We teach Hadoop-based frameworks and related technologies. We cover the Hadoop "Zoo" from MapReduce to Spark, from data wrangling to predictive analytics on very large data.

How does the specialization compare to similar course(s) at your university, if at all?

There are very few Big Data courses at the university level in general, though more are slowly emerging. The Big Data Specialization is a set of 5 courses covering basic and advanced Big Data topics. The technology is changing so rapidly that it is almost a full-time job just to keep up!!! ;-)

What else would you like people to know about your specialization?

We take pride in presenting difficult or technically dense material in simple, easy-to-understand ways. Come check it out and learn with us!