WekaMOOC: Data Mining with Weka, complete online course

The course features video lectures by Professor Ian H. Witten, with English and Chinese subtitles, and uses the open-source Weka data mining platform. Which lectures were the most interesting?

By Gregory Piatetsky, Dec 17, 2013.

Here are the videos from the complete free online course WekaMOOC: Data Mining with Weka.

This course introduced data mining concepts through practical experience with the free Weka tool.

Watch all the videos at the Weka MOOC Channel.


We did a little "data mining" of the "Data Mining with Weka" course and analyzed the number of views per lecture. The chart below shows the steady decline in views from earlier to later lectures (numbers as of Dec 17, 2013).

[Chart: WekaMOOC views by lecture]

We note that there is an increase in views as new parts start (indicated by the break in the line).

What is more interesting is that some lectures rise above the declining trend and have more views than the lecture before them: 2.5: Cross-validation, 3.4: Decision trees, 4.5: Support vector machines, and 5.4: Summary.

They are highlighted in bold below. An increase in viewership against the declining trend is a simple but potentially useful heuristic for picking out the more interesting lectures, rather than relying on the raw number of views alone.

Lecture Views
Part 1: Trailer 9,497
1.1: Introduction 7,310
1.2: Exploring the Explorer 5,502
1.3: Exploring datasets 4,693
1.4: Building a classifier 4,229
1.5: Using a filter 3,592
1.6: Visualizing your data 3,454
Part 2
2.1: Be a classifier! 4,058
2.2: Training and testing 3,107
2.3: Repeated training and testing 2,617
2.4: Baseline accuracy 2,391
2.5: Cross-validation 2,534
2.6: Cross-validation results 2,155
Part 3
3.1: Simplicity first! 2,456
3.2: Overfitting 2,115
3.3: Using probabilities 2,046
3.4: Decision trees 2,258
3.5: Pruning decision trees 1,864
3.6: Nearest neighbor 1,721
Part 4
4.1: Classification boundaries 1,934
4.2: Linear regression 1,784
4.3: Classification by regression 1,658
4.4: Logistic regression 1,619
4.5: Support vector machines 1,945
4.6: Ensemble learning 1,458
Part 5
5.1: The data mining process 1,367
5.2: Pitfalls and pratfalls 1,264
5.3: Data mining and ethics 1,118
5.4: Summary 1,366
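
The heuristic described above can be sketched in a few lines of Python. This is only an illustration, not how the original analysis was done: the view counts are copied from the table, the name `rising_lectures` is hypothetical, and the sketch assumes that "preceding lecture" means the previous lecture within the same part, so the jump at the start of each new part is ignored. The Part 1 trailer is left out because it is not a numbered lecture.

```python
# View counts as of Dec 17, 2013, copied from the table above.
VIEWS = [
    ("1.1", "Introduction", 7310),
    ("1.2", "Exploring the Explorer", 5502),
    ("1.3", "Exploring datasets", 4693),
    ("1.4", "Building a classifier", 4229),
    ("1.5", "Using a filter", 3592),
    ("1.6", "Visualizing your data", 3454),
    ("2.1", "Be a classifier!", 4058),
    ("2.2", "Training and testing", 3107),
    ("2.3", "Repeated training and testing", 2617),
    ("2.4", "Baseline accuracy", 2391),
    ("2.5", "Cross-validation", 2534),
    ("2.6", "Cross-validation results", 2155),
    ("3.1", "Simplicity first!", 2456),
    ("3.2", "Overfitting", 2115),
    ("3.3", "Using probabilities", 2046),
    ("3.4", "Decision trees", 2258),
    ("3.5", "Pruning decision trees", 1864),
    ("3.6", "Nearest neighbor", 1721),
    ("4.1", "Classification boundaries", 1934),
    ("4.2", "Linear regression", 1784),
    ("4.3", "Classification by regression", 1658),
    ("4.4", "Logistic regression", 1619),
    ("4.5", "Support vector machines", 1945),
    ("4.6", "Ensemble learning", 1458),
    ("5.1", "The data mining process", 1367),
    ("5.2", "Pitfalls and pratfalls", 1264),
    ("5.3", "Data mining and ethics", 1118),
    ("5.4", "Summary", 1366),
]


def rising_lectures(views):
    """Return (number, title, gain) for each lecture that has more views
    than the previous lecture in the same part (hypothetical helper)."""
    rising = []
    prev_part, prev_count = None, None
    for number, title, count in views:
        part = number.split(".")[0]
        # Compare only within a part, so the bump at the start of a
        # new part is not counted as a rise.
        if part == prev_part and count > prev_count:
            rising.append((number, title, count - prev_count))
        prev_part, prev_count = part, count
    return rising


rising = rising_lectures(VIEWS)
for number, title, gain in rising:
    print(f"{number}: {title} (+{gain} views over the previous lecture)")
```

On the numbers in the table, this picks out lectures 2.5, 3.4, 4.5 and 5.4, the same four named in the text.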