WekaMOOC: Data Mining with Weka, complete online course
The course features video lectures by Professor Ian H. Witten, with English and Chinese subtitles, and uses the open-source Weka data mining platform. Which lectures were the most interesting?
By Gregory Piatetsky, Dec 17, 2013.
Here are the videos from WekaMOOC: Data Mining with Weka, a free online course.
The course introduced data mining concepts through practical experience with the free Weka tool.
Watch all the videos at Weka MOOC Channel
www.youtube.com/user/WekaMOOC/videos?flow=grid&view=0&sort=da
We did a little "data mining" of "Data Mining with Weka" and analyzed the number of views per lecture. The chart below shows the steady decline in views (numbers as of Dec 17, 2013) from earlier lectures to later ones.
We note that there is an increase in views as new parts start (indicated by the break in the line).
What is more interesting is that some lectures rise above the declining trend and have more views than the lectures preceding them: 2.5: Cross-validation, 3.4: Decision trees, 4.5: Support vector machines, and 5.4: Summary.
They are highlighted in bold below. An increase in viewership despite the declining trend is a simple but potentially useful heuristic for selecting the more interesting lectures, rather than relying on the raw number of views alone.
Lecture | Views |
---|---|
Part 1: Trailer | 9,497 |
1.1: Introduction | 7,310 |
1.2: Exploring the Explorer | 5,502 |
1.3: Exploring datasets | 4,693 |
1.4: Building a classifier | 4,229 |
1.5: Using a filter | 3,592 |
1.6: Visualizing your data | 3,454 |
Part 2 | |
2.1: Be a classifier! | 4,058 |
2.2: Training and testing | 3,107 |
2.3: Repeated training and testing | 2,617 |
2.4: Baseline accuracy | 2,391 |
**2.5: Cross-validation** | **2,534** |
2.6: Cross-validation results | 2,155 |
Part 3 | |
3.1: Simplicity first! | 2,456 |
3.2: Overfitting | 2,115 |
3.3: Using probabilities | 2,046 |
**3.4: Decision trees** | **2,258** |
3.5: Pruning decision trees | 1,864 |
3.6: Nearest neighbor | 1,721 |
Part 4 | |
4.1: Classification boundaries | 1,934 |
4.2: Linear regression | 1,784 |
4.3: Classification by regression | 1,658 |
4.4: Logistic regression | 1,619 |
**4.5: Support vector machines** | **1,945** |
4.6: Ensemble learning | 1,458 |
Part 5 | |
5.1: The data mining process | 1,367 |
5.2: Pitfalls and pratfalls | 1,264 |
5.3: Data mining and ethics | 1,118 |
**5.4: Summary** | **1,366** |
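
The heuristic described above can be sketched in a few lines of Python. This is only an illustration, not the method used to produce the original chart: the `VIEWS` dictionary and the `rises_above_trend` function are hypothetical names, the counts are copied from the table, and the rule is assumed to be "more views than the preceding lecture in the same part", so the jump at the start of each new part is ignored. The trailer is left out because it does not belong to a numbered lecture sequence.

```python
# View counts copied from the table above (as of Dec 17, 2013).
# Keys are lecture numbers in "part.lecture" form; the trailer is omitted.
VIEWS = {
    "1.1": 7310, "1.2": 5502, "1.3": 4693, "1.4": 4229, "1.5": 3592, "1.6": 3454,
    "2.1": 4058, "2.2": 3107, "2.3": 2617, "2.4": 2391, "2.5": 2534, "2.6": 2155,
    "3.1": 2456, "3.2": 2115, "3.3": 2046, "3.4": 2258, "3.5": 1864, "3.6": 1721,
    "4.1": 1934, "4.2": 1784, "4.3": 1658, "4.4": 1619, "4.5": 1945, "4.6": 1458,
    "5.1": 1367, "5.2": 1264, "5.3": 1118, "5.4": 1366,
}


def rises_above_trend(views):
    """Return lectures with more views than the preceding lecture in the same part.

    Assumes `views` is ordered by lecture (Python dicts keep insertion order).
    """
    flagged = []
    prev_part, prev_count = None, None
    for lecture_id, count in views.items():
        part = lecture_id.split(".")[0]
        # Compare only within a part, so the bump at each part start is skipped.
        if part == prev_part and count > prev_count:
            flagged.append(lecture_id)
        prev_part, prev_count = part, count
    return flagged


print(rises_above_trend(VIEWS))  # → ['2.5', '3.4', '4.5', '5.4']
```

Under these assumptions, the sketch flags the same four lectures named in the text. A stricter variant could require the increase to exceed some margin, since 5.4 trails 5.1 by a single view and only beats its immediate predecessor.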