Web Mining Course Modules
Assignments
Data
This course uses KDnuggets web server log data for Nov 16, 2005, which was anonymized by replacing each IP address with ipNNNN.TLD, where NNNN is a random number and TLD is the top-level domain, e.g. .com. For unresolved IP addresses, .unr was used.
The data can be downloaded as kdlog.zip (0.6 MB) from the www.kdnuggets.com/web_mining_course/ directory. The first 100 log lines are in the unzipped file d100.log in the same directory.
Acknowledgments
This project is a continuation of the Data Mining Course project and was funded by a grant from the W. M. Keck Foundation, Los Angeles, CA, and the Howard Hughes Medical Institute, Chevy Chase, MD, as part of the Connecticut College Series of Modules in Emerging Fields. I am grateful to Gary Parker (Connecticut College) for his encouragement and support, and to Anand Rajaraman and Jeffrey Ullman (Stanford) for permission to use part of their "Introduction to Web Mining" lecture.
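Working with the anonymized hosts: as an illustration of the ipNNNN.TLD convention described under Data, the following is a minimal Python sketch that splits an anonymized host into its number and top-level domain and tallies requests per TLD. It assumes the host is the first whitespace-separated field of each log line, which is typical of common web server log formats but is not stated above. The function names parse_host and count_tlds are hypothetical and not part of the course materials.

```python
import re
from collections import Counter

# Assumed shape of an anonymized host: "ip" + digits + "." + TLD,
# e.g. "ip1234.com", or "ip42.unr" for an unresolved address.
HOST_RE = re.compile(r"^ip(\d+)\.([A-Za-z]+)$")


def parse_host(host):
    """Return (number, tld) for an anonymized host, or None if it does not match."""
    m = HOST_RE.match(host)
    if m is None:
        return None
    return int(m.group(1)), m.group(2).lower()


def count_tlds(lines):
    """Tally top-level domains over an iterable of log lines.

    Assumption: the anonymized host is the first whitespace-separated field.
    Lines that are empty or whose first field does not match are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        parsed = parse_host(fields[0])
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

To try it on the 100-line sample, one could open d100.log and pass the file object directly, e.g. count_tlds(open("d100.log")); the .unr entry in the result would then be the number of requests from unresolved addresses.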