Data Mining and Analysis: Fundamental Concepts and Algorithms, free PDF download (draft)

This new book by Mohammed Zaki and Wagner Meira Jr is a great option for teaching a course in data mining or data science. It covers both fundamental and advanced data mining topics, emphasizes the mathematical foundations and the algorithms, includes exercises for each chapter, and provides data, slides, and other supplementary material on the companion website.

Data Mining and Analysis: Fundamental Concepts and Algorithms, by
Mohammed Zaki and Wagner Meira Jr, to be published by Cambridge University Press in 2014.

This book is an outgrowth of data mining courses at RPI and UFMG; the RPI course has been offered every Fall since 1998, whereas the UFMG course has been offered since 2002. While there are several good books on data mining and related topics, we felt that many of them are either too high-level or too advanced. Our goal was to write an introductory text which focuses on the fundamental algorithms in data mining and analysis. It lays the mathematical foundations for the core data mining methods, with key concepts explained when first encountered; the book also tries to build the intuition behind the formulas to aid understanding.

The main parts of the book include exploratory data analysis, frequent pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and it also covers cutting-edge topics like kernel methods, high-dimensional data analysis, and complex graphs and networks. It integrates concepts from related disciplines like machine learning and statistics, and is also ideal for a course on data analysis. Most of the prerequisite material is covered in the text, especially on linear algebra, and probability and statistics.

The book includes many examples to illustrate the main technical concepts. It also has end-of-chapter exercises, which have been used in class. All of the algorithms in the book have been implemented by the authors. We suggest that readers use their favorite data analysis and mining software to work through our examples, and to implement the algorithms we describe in the text; we recommend the R software, or the Python language with its NumPy package.

The datasets used and other supplementary material like project ideas, slides, and so on, are available online at the book's companion site and its mirrors at RPI and UFMG:

You may download the PDF of the book draft here. Note that the book will be available for purchase from Cambridge University Press and other standard distribution channels, that no unauthorized distribution is allowed, and that the reader may take one copy only, for personal use.