IBM, Michael Abernethy, Apr 27, 2010
Summary: Data mining is the talk of the tech industry, as companies are generating millions of data points about their users and looking for a way to turn that information into increased revenue. Data mining is a collective term for dozens of techniques to glean information from data and turn it into something meaningful. This article will introduce you to open source data-mining software and some of the most common techniques to interpret data.
So, how do you get you and your company on board the data-mining bandwagon?
We hope to answer all of your initial questions about data mining. We also will introduce you to Waikato Environment for Knowledge Analysis (WEKA), free and open source software you can use to mine your own data and turn what you know about your users, your clients, and your business into useful information for increasing your revenue. You will see that it is not as difficult as you might think it is to do a "pretty good" job of mining data.
Additionally, this article will discuss the first technique for data mining: regression, which transforms existing data into a numerical prediction for future data. It is likely the easiest method of mining data and even on a simple level something you may have done before in your favorite market-dominant spreadsheet software (though WEKA can do much more complex calculations). Future articles will touch upon other methods of mining data, including clustering, nearest neighbor, and classification trees.
Data mining isn't solely the domain of big companies and expensive software. In fact, there's a piece of software that does almost all the same things as these expensive pieces of software - the software is called WEKA (see Resources). WEKA is the product of the University of Waikato (New Zealand) and was first implemented in its modern form in 1997. It uses the GNU General Public License (GPL). The software is written in the Java™ language and contains a GUI for interacting with data files and producing visual results (think tables and curves). It also has a general API, so you can embed WEKA, like any other library, in your own applications to such things as automated server-side data-mining tasks.