KDnuggets : Web Mining Course
Gregory Piatetsky-Shapiro
Here are the teaching modules for a unit on Web Mining, with a focus on Web Usage Mining and Web Log Analysis.

Web Mining Course Modules

Assignments

Note: Professors using these modules can get answers by contacting Gregory Piatetsky directly at gregory at kdnuggets dot com.

Data

This course uses KDnuggets web server log data for Nov 16, 2005. The data was anonymized by replacing each IP address with ipNNNN.TLD, where NNNN is a random number and TLD is the top-level domain, e.g. .com. For unresolved IP addresses, .unr was used.
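To make the anonymization scheme concrete, here is a minimal Python sketch of one way such a mapping could be produced. It is not the script actually used for this data. The function name make_anonymizer, the range of the random numbers, the rule that a bare IPv4 address counts as unresolved, and the choice to give a repeated host the same pseudonym are all assumptions made for illustration.

```python
import random


def make_anonymizer(seed=None):
    """Return a function mapping a host string to an 'ipNNNN.TLD' pseudonym.

    Hypothetical helper: the real anonymization procedure is not documented
    beyond the ipNNNN.TLD / .unr description.
    """
    rng = random.Random(seed)
    mapping = {}  # original host -> pseudonym
    used = set()  # random numbers already handed out

    def anonymize(host):
        # Assumption: a host seen before keeps its pseudonym, so that
        # requests from one visitor can still be grouped together.
        if host in mapping:
            return mapping[host]

        parts = host.split(".")
        # Assumption: a bare dotted-quad IPv4 address means the reverse
        # DNS lookup failed, so it gets the .unr suffix.
        is_ipv4 = len(parts) == 4 and all(p.isdigit() for p in parts)
        tld = "unr" if is_ipv4 else parts[-1].lower()

        # Draw a random number that has not been assigned yet.
        n = rng.randrange(1, 10**6)
        while n in used:
            n = rng.randrange(1, 10**6)
        used.add(n)

        mapping[host] = f"ip{n}.{tld}"
        return mapping[host]

    return anonymize
```

Calling anonymize = make_anonymizer() and then anonymize("host.example.com") would return a string of the form ipNNNN.com, while a numeric address such as 192.168.0.1 would map to ipNNNN.unr.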

This data can be downloaded from kdlog.zip (0.6 MB) in the www.kdnuggets.com/web_mining_course/ directory.

The first 100 log lines are in the unzipped file d100.log in the same directory.
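As a first look at the sample file, the following Python sketch tallies requests by the top-level domain of the anonymized host. It assumes that the host is the first whitespace-separated field on each line, as in common web server log formats. The log layout is not specified here, so check it against d100.log before relying on the result. The name tld_counts is hypothetical.

```python
import re
from collections import Counter

# Matches the anonymized host format ipNNNN.TLD and captures the TLD.
HOST_RE = re.compile(r"^ip\d+\.([A-Za-z0-9-]+)$")


def tld_counts(lines):
    """Count log lines per top-level domain of the anonymized host.

    Assumption: the anonymized host is the first field of each line.
    Lines whose first field does not look like ipNNNN.TLD are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        match = HOST_RE.match(fields[0])
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # d100.log is the sample file with the first 100 log lines.
    with open("d100.log", encoding="utf-8", errors="replace") as f:
        for tld, n in tld_counts(f).most_common():
            print(tld, n)
```

Because tld_counts accepts any iterable of lines, it can be tried on a few hand-written lines before running it on the full log.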

Acknowledgments

This project is a continuation of the Data Mining Course project, and was funded by a grant from the W. M. Keck Foundation, Los Angeles, CA and the Howard Hughes Medical Institute, Chevy Chase, MD, as part of the Connecticut College Series of Modules in Emerging Fields.

I am grateful to Gary Parker (Connecticut College) for his encouragement and support and to Anand Rajaraman and Jeffrey Ullman (Stanford) for permission to use part of their "Introduction to Web Mining" lecture.

