At the highest level of description, this book is about data mining. However, it focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to "train" a machine-learning engine of some sort.
Download Mining of Massive Datasets (PDF, 340 pages, 2 MB)
You can find materials from past offerings of CS345A at:
There, you will find slides, homework assignments, project requirements, and