The Best Python Courses: An Analysis Summary
What does the data reveal if we ask, "What are the 10 best Python courses?" Collecting almost every course from the top platforms shows there are plenty to choose from: over 3,000 offerings.
This article summarizes my analysis and presents the top three courses. For the full article, which includes all the code required to reproduce my results, please see the original: 10 Best Python Courses According to Data Analysis.
TL;DR: The Winners
Out of all courses collected, the analysis showed these were the top three:
- Learn Python by Codecademy
- Introduction to Python Programming by Udacity
- Programming for Everybody (Getting Started with Python) by Coursera
So, if you are just here for recommendations, check out the top three. However, if you are interested in the data and methods used to generate the top results, then continue reading for a full summary.
Compromises & Assumptions
Data analysis requires taking a reductionist approach to the world. Often, data must be selected because it is likely to correlate well with a desired property. Such compromises and assumptions are essential, but so is stating them explicitly, so that others can critique the approach and understand its limitations. This analysis rests on the following assumptions and limitations:
- Google's search engine ranking is a fair reflection of both the quantity and quality of backlinks
- The popularity of a course page is positively correlated with its number of unique links
- The popularity of a course page is positively correlated with its traffic
- The selection of platforms used in the analysis is comprehensive
- Filtering the data for "Python" in each page's URL or top keyword neither excluded relevant courses nor included irrelevant ones
- The basic Ahrefs plan used provides the top 50 search results on Google, not the entire possible set
- Ahrefs' Domain Rating is only an estimate of Google's secretive algorithm
Preparing the data for analysis is often the single most demanding step. In this instance, I:
1. Exported the data for each course platform into a separate CSV.
We already know that we will need to filter the data further to exclude certain irrelevant pages, but that is for later.
2. Imported the libraries that assist with exploring and visualizing the data.
3. Created an empty DataFrame and concatenated each file onto it.
Since each CSV has its own index, calling reset_index() creates a new index for the combined DataFrame.
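Steps 1 to 3 can be sketched in pandas roughly as follows. This is a minimal illustration under stated assumptions, not the original code: the column names (URL, Traffic, Referring Domains, Top Keyword), the platform labels and the URLs are hypothetical, and in-memory strings stand in for the exported CSV files so the example runs on its own. Rather than concatenating onto an empty DataFrame, it gathers the per-platform frames in a list and calls pd.concat once, which produces the same combined table.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two per-platform CSV exports (step 1).
# Column names and URLs are illustrative assumptions, not real Ahrefs output.
CSV_HEADER = "URL,Traffic,Referring Domains,Top Keyword\n"
exports = {
    "Platform A": io.StringIO(
        CSV_HEADER
        + "https://www.example.com/a/python-basics,900,120,learn python\n"
        + "https://www.example.com/a/sql-basics,50,10,learn sql\n"
    ),
    "Platform B": io.StringIO(
        CSV_HEADER + "https://www.example.com/b/intro-python,400,80,python course\n"
    ),
}

frames = []
for platform, csv_file in exports.items():
    frame = pd.read_csv(csv_file)  # in practice: pd.read_csv("<platform>.csv")
    frame["Platform"] = platform  # remember where each row came from
    frames.append(frame)

# Each CSV arrives with its own 0..n-1 index, so the stacked frame has
# repeated labels (0, 1, 0); reset_index(drop=True) renumbers them 0..n-1.
courses = pd.concat(frames).reset_index(drop=True)
print(courses.index.tolist())  # [0, 1, 2]
```

Without the reset_index call, the combined index would be [0, 1, 0], which makes label-based lookups ambiguous.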
4. Examined the data to construct meaningful filters.
This addresses the irrelevant pages mentioned in step 1.
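A possible shape for the step 4 filters is sketched below, assuming hypothetical URL and Top Keyword columns. The "python" test mirrors the filter described in the assumptions list; the pattern for non-course pages (pricing, blog) is an invented example of the kind of rule that exploring the data might suggest, not the author's actual filter.

```python
import pandas as pd

# Hypothetical combined data; column names and URLs are assumptions.
courses = pd.DataFrame({
    "URL": [
        "https://www.example.com/learn/python-basics",
        "https://www.example.com/learn/intro-to-sql",
        "https://www.example.com/course/data-analysis",
        "https://www.example.com/pricing",
    ],
    "Top Keyword": [
        "learn python",
        "sql tutorial",
        "python for data analysis",
        "python course price",
    ],
})

# Keep a row if "python" appears in its URL or its top keyword (case-insensitive).
has_python = (
    courses["URL"].str.contains("python", case=False)
    | courses["Top Keyword"].str.contains("python", case=False)
)

# Hypothetical patterns for pages that are not courses at all.
not_course = courses["URL"].str.contains(r"/(?:pricing|blog)", regex=True)

filtered = courses[has_python & ~not_course].reset_index(drop=True)
print(filtered["URL"].tolist())
```

The pricing page passes the keyword test, which illustrates why a keyword filter alone can let irrelevant pages through.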
5. Removed duplicates.
The same page can appear under several URL variants; for example, the URL could use "http" or "https" and may or may not include "www".
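One way to implement step 5 is to normalize each URL before dropping duplicates. The sketch below strips the scheme and a leading "www." as the text describes; removing a trailing slash and keeping the highest-traffic copy of each page are extra assumptions, not part of the described method. The column names and URLs are hypothetical.

```python
import re

import pandas as pd

# Hypothetical rows: the first two are the same page under different URLs.
courses = pd.DataFrame({
    "URL": [
        "https://www.example.com/learn/python",
        "http://example.com/learn/python/",
        "https://example.com/learn/python-3",
    ],
    "Traffic": [500, 120, 300],
})


def normalize_url(url: str) -> str:
    """Drop the http/https scheme, a leading 'www.', and any trailing slash."""
    url = re.sub(r"^https?://", "", url.strip(), flags=re.IGNORECASE)
    url = re.sub(r"^www\.", "", url, flags=re.IGNORECASE)
    return url.rstrip("/")


courses["Key"] = courses["URL"].map(normalize_url)

# Sort so the highest-traffic copy comes first, keep it, then restore order.
deduped = (
    courses.sort_values("Traffic", ascending=False)
    .drop_duplicates(subset="Key", keep="first")
    .sort_index()
    .reset_index(drop=True)
)
print(deduped[["URL", "Traffic"]])
```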
6. Applied feature engineering to create a combined score from referring domains and traffic.
Using the scipy library, I computed the z-scores of Traffic and Referring Domains and created a new property: the average of the two z-scores.
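The combined score from step 6 can be computed with scipy.stats.zscore, which standardizes a column by subtracting its mean and dividing by its (population) standard deviation, so that traffic and referring domains contribute on a comparable scale. A minimal sketch; the toy numbers and column names are hypothetical.

```python
import pandas as pd
from scipy.stats import zscore

# Hypothetical metrics for four course pages.
courses = pd.DataFrame({
    "URL": ["course-a", "course-b", "course-c", "course-d"],
    "Traffic": [1000, 200, 50, 400],
    "Referring Domains": [30, 90, 10, 40],
})

# Standardize each metric, then average the two z-scores into one score.
courses["Traffic z"] = zscore(courses["Traffic"])
courses["RD z"] = zscore(courses["Referring Domains"])
courses["Score"] = (courses["Traffic z"] + courses["RD z"]) / 2

ranking = courses.sort_values("Score", ascending=False)
print(ranking[["URL", "Score"]])
```

In this toy data, course-a leads on traffic and course-b on referring domains; averaging the z-scores places both ahead of course-d, which is middling on each.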
7. Performed another round of cleaning.
For example, Codecademy's Python 2 course ranked at the top, but Python 2 has been superseded by Python 3. The solution: we retained Codecademy's position and switched the recommendation to its Python 3 course. For now, human common sense is a crucial input to this kind of analysis.
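Manual fixes like the Codecademy swap can be kept as explicit data rather than ad hoc edits, which makes the human judgment easy to review and re-apply. A minimal sketch, in which the exclusion set, the replacement mapping and the course names are all hypothetical:

```python
import pandas as pd

# Hypothetical ranked output from the scoring step.
ranking = pd.DataFrame({
    "Course": [
        "Platform A: Python 2 course",
        "Platform B: Python course",
        "Platform C: careers page",
    ],
    "Score": [1.4, 0.9, 0.7],
})

# Manual decisions recorded as data (illustrative, not the author's list).
EXCLUDE = {"Platform C: careers page"}  # irrelevant page spotted while plotting
RECOMMEND_INSTEAD = {"Platform A: Python 2 course": "Platform A: Python 3 course"}

cleaned = ranking[~ranking["Course"].isin(EXCLUDE)].copy()
# Keep each row's score and position, but point readers at the replacement course.
cleaned["Recommended"] = cleaned["Course"].replace(RECOMMEND_INSTEAD)
print(cleaned["Recommended"].tolist())
```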
As is typical, the data visualization started during the data preparation stages. It helps to get a feel for what is muddying the waters.
For example, once courses were plotted by their traffic and referring domains, it was clear that we still had some major cleaning to do (step 7 above) to remove irrelevant courses. The following plot shows some non-relevant courses that made it into the data:
Visualization typically requires creating a new DataFrame that groups the data into the format you need. For example, plotting the number of Python courses and the traffic by platform as a grouped bar plot:
The above plot leads to an interesting observation: more Python-related courses on a platform does not necessarily mean more traffic. This seems to confirm the adage that quality is more important than quantity.
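A sketch of the grouping behind a plot like this, using hypothetical toy data and column names. Because course counts and traffic sit on very different scales, this version plots each platform's share of the total for both measures; that normalization is a presentation choice for the example, not necessarily how the original figure was drawn.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; unnecessary in a notebook
import pandas as pd

# Hypothetical toy data: one row per Python course page.
courses = pd.DataFrame({
    "Platform": ["A", "A", "A", "B", "C", "C"],
    "Traffic": [100, 50, 30, 900, 200, 120],
})

# One row per platform: number of course pages and their summed traffic.
by_platform = courses.groupby("Platform").agg(
    Courses=("Traffic", "count"),
    Traffic=("Traffic", "sum"),
)

# Convert each column to a share of its total so both fit on one axis,
# then draw the two columns side by side as grouped bars.
shares = by_platform / by_platform.sum()
ax = shares.plot.bar(rot=0)
ax.set_ylabel("Share of total")
```

In the toy data, platform B has the fewest courses but most of the traffic, the kind of pattern the observation above describes.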
The quantitative analysis gave us our rankings. For some qualitative analysis, I enrolled in the free version of each top course and tried it out; my impressions follow.
Learn Python by Codecademy
Certificate: ✔ Quizzes: ✔ Projects: ✔ Interactive: ✔
A solid beginner-level course for solving problems with Python, covering fundamental topics that are often completely ignored by other courses.
Introduction to Python Programming by Udacity
Video: ✔ Quizzes: ✔
This beginner-level course did not impress me as much as the others, but it still offers good value. For example, unlike Codecademy's course, each lesson includes a recorded lecture.
Programming for Everybody (Getting Started with Python) by Coursera
Certificate: ✔ (with payment) Video: ✔ Quizzes: ✔
This course's videos are much more organized and entertaining than those of the other courses I trialed. The pace and difficulty level start low, but the course does (eventually) bring you up to an intermediate level of Python knowledge. The Python learning environment is particularly well-engineered.
Codecademy's course looks like a safe bet to start your Python journey. But, no matter which course you choose, tackle projects as often as possible. Find a problem you're interested in and keep programming until you have something that solves it. Show off your solution, use it to help others, and even make a career out of it. Using Python — and programming in general — to create value for yourself and others is a rewarding feedback loop that will keep you working, moving forward, and getting better.
We hope that this article helps you on your Python journey. Check out the original article for more detail, complete code, and the rest of the top 10.
Brendan Martin is Founder and Editor-in-Chief of LearnDataSci, which is making online data science learning accessible to everyone. LearnDataSci publishes monthly articles geared towards helping online learners gain an intuitive understanding of the data science and machine learning topics that are essential for data science careers.