KDnuggets Home » News » 2010 » Jul » Publications » Learning data mining in 21 days

teach yourself datamining in 21 days


 
  
A World of Warcraft player learns data mining and shares his insights, including how to cluster paladins or predict a toon's class. Sample data in Weka format is provided.


Date: Warcraft Armoury, July 5, 2010

Things may look quiet here but behind the scenes I've been working on my project to get up to speed with modern datamining algorithms. The first step has been to assemble some sources of information and some tools for the job.

For a textbook, I've chosen Introduction To Data Mining by Tan, Steinbach and Kumar. It provides a good overview of the key algorithms, along with important issues like data quality and consistency. It also introduces the maths in a reasonably gentle way.

Fortunately, while it is important to understand how the algorithms work, it is not necessary to work through the maths by hand. There are some first-class freeware datamining programs available that do all the heavy lifting, so long as you know how to prepare the data and how to set the parameters of the algorithms so they produce valid results.

Three datamining packages in particular are worth noting: Weka, RapidMiner and R.

Weka and RapidMiner are GUI-driven toolsets, whereas R is more command-line oriented. You can download all of these and play around with them at home. They're not toys, so you need to have some confidence about plowing through the user guides and technical manuals, but they are easy enough to get up and running.

The choice between Weka and RapidMiner is a difficult one. At the moment I'm working with Weka but that is mainly because it was the first one I started experimenting with.

The other crucial thing to have at hand is some test data. Of course you may want to remind me that I have several GB of WoW-related data right here. But that is not the place to start. The first step is to learn how the tools work and how to use them. For that we need data where we know the expected results, so that when the tool doesn't produce the right result we know to look again at how we have applied the algorithm.

Over at the Expressive Intelligence Studio blog, Chris Lewis posted an interesting report about using a toon's gear to predict its class. That's exactly the sort of place we want to start since all sorts of classifying and clustering algorithms could be tested on a data set like that. The other idea that occurred to me is to use talent builds to predict the spec of the toon. Of course you can do that in a very simple way by just adding up the points spent in each of the three trees: a paladin that has most points in the holy tree is a holy paladin.
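As a rough sketch of that point-counting rule, here is how it might look in Python. The function name, the decision to return no answer on a tie, and the sample point totals are all my own illustrative assumptions; none of it comes from real character data or from any of the tools mentioned above.

```python
def predict_spec(points_by_tree):
    """Guess a toon's spec as the talent tree holding the most points.

    points_by_tree maps a tree name to the points spent in it.
    Returns None if the input is empty or if the top trees are tied.
    (The tie rule is an assumption made for this sketch.)
    """
    if not points_by_tree:
        return None
    best = max(points_by_tree.values())
    leaders = [tree for tree, pts in points_by_tree.items() if pts == best]
    return leaders[0] if len(leaders) == 1 else None


# Hypothetical paladin builds (point totals invented for illustration).
builds = {
    "healer": {"holy": 51, "protection": 20, "retribution": 0},
    "tank": {"holy": 0, "protection": 53, "retribution": 18},
    "undecided": {"holy": 30, "protection": 30, "retribution": 11},
}

for name, build in builds.items():
    print(name, "->", predict_spec(build))
```

A rule this simple is useful precisely because the answer is known in advance: any classifier trained on the same talent data should agree with it on clear-cut builds, which makes it a convenient check on whether the tool has been set up correctly.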


