Top Coursera Data Science Specializations: Comparison & Exclusive Insight
There are more MOOC learning options for Data Scientists today than ever. Take a tour of Coursera's 8 Data Science specializations, with exclusive insight from program coordinators and course instructors.
Bioinformatics Specialization, UCSD
Bioinformatics is an interdisciplinary field that uses tools and techniques from mathematics, computer science, statistics, engineering, and other fields to analyze biological data. From our perspective, bioinformatics is the intersection of data science and biology. UCSD's Bioinformatics Specialization is the first of its kind in the field, and looks like it could benefit not only those coming to data science from biology, but also those moving in the opposite direction.
This specialization is made up of the following courses:
▪ Finding Hidden Messages in DNA (Bioinformatics I)
▪ Genome Sequencing (Bioinformatics II)
▪ Comparing Genes, Proteins, and Genomes (Bioinformatics III)
▪ Deciphering Molecular Evolution (Bioinformatics IV)
▪ Genomic Data Science and Clustering (Bioinformatics V)
▪ Finding Mutations in DNA and Proteins (Bioinformatics VI)
▪ Bioinformatics Capstone: Big Data in Biology
Instructor Phillip Compeau provided us with the following detailed feedback.
What distinguishes your data science specialization from the others currently available via Coursera?
Our specialization is unique, not just as a data science specialization, but as a series of STEM MOOCs in general, in a few different ways. First, it has an enormous amount of production for a series of MOOCs, and is the result of a development team working for the past two years. For example, although our courses have lecture videos with very high production quality, the production of these lecture videos represents a very small component of our overall investment of time and resources (unlike most MOOCs, for which this is essentially the sole focus). Instead, our MOOCs are built upon the creation of an interactive textbook applying the principles of active learning. As soon as learners encounter a tricky concept, we ask them to stop and think about it before transitioning. We peppered the text with hundreds of exercises; some of these build learning, others are opportunities for learners to implement the bioinformatics algorithms that they encounter, and others allow them to apply these algorithms to real biological datasets. Each page of the interactive text is linked to its own discussion forum, and students have made thousands of posts over the last two years. Furthermore, an important part of the process of developing this interactive text was responding to student concerns. To do this, we mined through 8500 discussion forum posts and have made widescale changes to every single page of the interactive text, as well as creating FAQs and additional remedial learning modules to help address the most common errors encountered by learners. Pavel and I outlined our vision for what 21st century textbooks in STEM fields should look like in a recent Communications of the ACM Viewpoints article: http://m.cacm.acm.org/magazines/2015/10/192385-life-after-moocs/fulltext
Second, bioinformatics is inherently interdisciplinary, being at the intersection of computer science, biology, mathematics, and data science. As such, it attracts learners who arrive with varying strengths. It means that we have had to think about how to adapt the content for learners with these strengths. For example, our courses are currently divided into two main tracks: a "biologist track" and a "hacker track". All learners read the course interactive text, watch the course videos, and take quizzes. However, learners on the hacker track implement the algorithms that they encounter in the text; learners following the biologist track do not need to program but do need to learn how to apply existing software resources in bioinformatics. Accordingly, we also have a series of "Bioinformatics Application Challenges" for these learners in which they can learn how to apply some of this existing software while following a narrative that is tangential to what they have learned in the main text. For example, in the main text, learners see how researchers sequence genomes by solving a 300-year-old mathematical puzzle. The hacker track learners write their own algorithms (in the language of their choosing) to assemble genomes on their own; the biologist track learners have an Application Challenge walking them through how to use the popular SPAdes assembler to analyze the quality of an assembly for Staphylococcus.
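To give a flavor of what the hacker track's genome assembly exercise might involve, here is a minimal sketch. The interview does not name the puzzle or the method; this assumes the puzzle is Euler's Bridges of Königsberg problem, and that assembly is framed as finding an Eulerian path through a de Bruijn graph built from short reads (k-mers). The function names are hypothetical, the reads are a made-up toy example, and real assemblers such as SPAdes must also handle sequencing errors, repeats, and much larger inputs.

```python
from collections import defaultdict


def de_bruijn(kmers):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes it leads to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return a walk that uses every edge exactly once (Hierholzer's algorithm).

    Assumes the graph is non-empty and that such a walk exists;
    no validation is performed.
    """
    # Copy the adjacency lists so the caller's graph is not consumed.
    adj = {node: list(targets) for node, targets in graph.items()}
    balance = defaultdict(int)  # out-degree minus in-degree per node
    for node, targets in adj.items():
        balance[node] += len(targets)
        for target in targets:
            balance[target] -= 1
    # Start at the node with one surplus outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(adj)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if adj.get(node):
            stack.append(adj[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())       # dead end: node is finished
    return path[::-1]


def assemble(kmers):
    """Spell out the sequence along an Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    # Toy reads invented for illustration; not real sequencing data.
    reads = ["ATG", "TGG", "GGC", "GCG", "CGT", "GTG", "TGC", "GCA"]
    print(assemble(reads))
```

Note that more than one Eulerian path can exist for the same set of reads, so the assembled string is only guaranteed to contain exactly the input k-mers, not to match one particular original sequence.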
What 2 or 3 concepts or technologies does your specialization focus on the most?
The greatest central theme of our Bioinformatics Specialization is the importance of being able to formulate a biological challenge as a precise computational problem. This is a skill that is often lacking in life science education. For example, when looking for the location in a bacterium's DNA where the bacterium starts replicating this DNA, we are essentially looking for a "hidden message" saying "start replication here!" But this problem makes no sense to a computer scientist, who needs to be told exactly what to look for. The interplay between learning new biological facts and using these facts to formulate increasingly robust computational problems is constant throughout our courses.
In terms of specific course content, each chapter addresses a central question in modern biology, such as "How Do We Assemble Genomes?", "Why Do We Still Not Have an HIV Vaccine?", and "How Do We Find Disease-Causing Mutations?" We see how approaches from a variety of technical topics such as graph theory, machine learning, and standard data science methods such as clustering algorithms can be applied to solve each central question. From a biological perspective, we have a heavy focus on biological sequence analysis and the methods needed to address it.
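The "hidden message" example in the answer above can be made concrete with a short sketch. One possible way to turn "find the hidden message" into a precise computational problem (an assumption here, not something the interview spells out) is to ask: which substrings of length k occur most often in a stretch of DNA? The function name and the input sequence below are illustrative only.

```python
from collections import Counter


def most_frequent_kmers(text: str, k: int) -> list[str]:
    """Return every length-k substring that occurs most often in text.

    Overlapping occurrences are counted. Returns an empty list when
    k is not between 1 and len(text).
    """
    if k <= 0 or k > len(text):
        return []
    # Slide a window of width k across the text and count each substring.
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best)


if __name__ == "__main__":
    # Toy sequence made up for illustration; not real genomic data.
    print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
```

For this toy input the call returns ['CATG', 'GCAT'], each of which appears three times. A frequent substring is only a candidate signal; deciding whether it means anything biologically is where the interplay described above comes in.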
How does the specialization compare to similar course(s) at your university, if at all?
The course content covered in the Bioinformatics Specialization is identical to some of the coursework taken by students in the renowned Bioinformatics Ph.D. program at UC San Diego. Furthermore, the print companion of the course (Bioinformatics Algorithms: An Active Learning Approach) has already been adopted in about twenty universities, some of which offer flipped classes based on the book. This is another way in which we feel that our courses are unique, as the majority of online courses do not currently have the rigor of a course that one would take at a leading offline institution.
What else would you like people to know about your specialization?
We are very proud of partnering with Illumina (the leader in Genome Sequencing) to design a really interesting Capstone project (launching in the spring) based on their BaseSpace cloud platform, and Illumina is interested in interviewing students who excel in our Specialization. More generally, we think that our Specialization has excellent potential to be adopted by many university programs in bioinformatics around the world, and that it will be a great resource for biologists, computer scientists, and data scientists alike to add an important set of knowledge to their skillset in the rapidly growing biotech market. Learners in the latter two groups may have never even realized how relevant many classic approaches in CS and data science really are to modern biology, and the enormous demand for people who can bring these skillsets to biology, and we hope that our Specialization can help bridge this divide.
Data Mining Specialization, University of Illinois, Urbana-Champaign
The University of Illinois, Urbana-Champaign's Data Mining Specialization is foundational and theoretical in nature, covering the fundamentals of data mining without regard to specific tools or languages. The specialization is made up of the following courses:
▪ Pattern Discovery in Data Mining
▪ Text Retrieval and Search Engines
▪ Cluster Analysis in Data Mining
▪ Text Mining and Analytics
▪ Data Visualization
▪ Data Mining Capstone
Data Science at Scale Specialization, University of Washington
The University of Washington's Data Science at Scale Specialization grew out of their original Introduction to Data Science course, which has been offered a number of times over the past 3 years. The specialization covers the paradigms, the practice, and the professional aspects of performing data science. The courses included in this track are:
▪ Data Manipulation at Scale: Systems and Algorithms
▪ Practical Predictive Analytics: Models and Methods
▪ Communicating Results: Visualization, Ethics, Reproducibility
▪ Data Science at Scale - Capstone Project
With all of the data science MOOC options available today, it's difficult to know where to begin looking. We hope that this summary, at the very least, gives you some direction in narrowing down your numerous choices.
Bio: Matthew Mayo is a computer science graduate student currently working on his thesis on parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.