Web Mining Course Modules

To get the presentations, add www.kdnuggets.com/web_mining_course/ in front of the ppt file names below.
- Module 1: Introduction to Web Mining
- Module 2a: Web Server Log
- Module 2b: Unix tools for web log analysis
- Module 3a: Hit Analysis
- Module 3b: Gawk tools for web log analysis
- Module 4a: Visit Analysis; Bot or Not?
- Module 4b: Perl tools for web log analysis
  - Basic Perl script for web log parsing (web_log_parse.txt)
- Module 5: Behavior modeling
- Assignment 1 - news stories.
- Assignment 2 - global analysis.
- Assignment 3 - hit-level analysis.
- Assignment 4 - Perl for visit-level analysis.
- Assignment 5 - bringing all together - final project.
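Modules 2a, 4a and 4b cover reading the server log and grouping hits into visits. The Perl script web_log_parse.txt is not reproduced here; as a rough Python analogue, the sketch below assumes the log follows the Apache Combined Log Format and separates visits with a 30-minute inactivity timeout. The timeout is a common heuristic, not necessarily the rule used in the course, and `Hit`, `parse_line` and `group_visits` are hypothetical names.

```python
import re
from datetime import datetime, timedelta
from typing import NamedTuple, Optional

# Assumption: lines follow the Apache Combined Log Format, e.g.
# host - - [16/Nov/2005:00:00:01 -0500] "GET / HTTP/1.1" 200 512 "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


class Hit(NamedTuple):
    host: str
    time: datetime
    request: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one log line; return None if it does not match the assumed format."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    # %z accepts offsets such as -0500, giving timezone-aware datetimes.
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return Hit(m.group("host"), ts, m.group("request"),
               int(m.group("status")), m.group("agent") or "")


def group_visits(hits, timeout=timedelta(minutes=30)):
    """Split hits into visits: same host, gaps of at most `timeout` (assumed 30 min).

    Visits are keyed by host alone; a real analysis might also use the agent.
    """
    visits = []
    last_seen = {}  # host -> (time of its latest hit, index of its open visit)
    for hit in sorted(hits, key=lambda h: h.time):
        prev = last_seen.get(hit.host)
        if prev is not None and hit.time - prev[0] <= timeout:
            visits[prev[1]].append(hit)          # continue the open visit
            last_seen[hit.host] = (hit.time, prev[1])
        else:
            visits.append([hit])                 # start a new visit
            last_seen[hit.host] = (hit.time, len(visits) - 1)
    return visits
```

For example, two hits from the same host 50 minutes apart end up in separate visits under the assumed 30-minute timeout, while hits 10 minutes apart share one visit.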
Data

This course uses the KDnuggets web server log for Nov 16, 2005. The log was anonymized by replacing each IP address with ipNNNN.TLD, where NNNN is a random number and TLD is the top-level domain (e.g., .com). For unresolved IP addresses, .unr was used as the TLD.
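The anonymization scheme can be sketched as follows. This is not the script actually used to prepare the KDnuggets data: it assumes the host field holds either a resolved host name or a dotted-quad IPv4 address, takes the TLD from the last label of a resolved name, and uses .unr otherwise. `make_anonymizer` is a hypothetical helper, and the range of random numbers is an arbitrary choice.

```python
import random
import re

# A dotted-quad IPv4 address, i.e. a host the server did not resolve to a name.
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def make_anonymizer(seed=None, max_number=1_000_000):
    """Return a function mapping a host to ipNNNN.TLD (hypothetical helper)."""
    rng = random.Random(seed)
    assigned = {}  # host -> token, so each host keeps the same token in every line
    used = set()   # numbers already handed out, so two hosts never share a token

    def anonymize(host: str) -> str:
        if host not in assigned:
            if len(used) >= max_number:
                raise ValueError("ran out of distinct random numbers")
            n = rng.randrange(max_number)
            while n in used:
                n = rng.randrange(max_number)
            used.add(n)
            # Raw IPs (and names without a dot) have no usable TLD: mark as .unr.
            if IPV4_RE.match(host) or "." not in host:
                tld = "unr"
            else:
                tld = host.rsplit(".", 1)[-1].lower()
            assigned[host] = f"ip{n}.{tld}"
        return assigned[host]

    return anonymize
```

Keeping the host-to-token mapping in one dictionary is what preserves visit-level analysis: all hits from a given host still share one identifier after anonymization.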
The anonymized log can be downloaded as kdlog.zip (0.6 MB) from the www.kdnuggets.com/web_mining_course/ directory.
The first 100 log lines are available, unzipped, as d100.log in the same directory.
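A quick hit-level summary of d100.log can be computed in the spirit of the awk/gawk one-liners from Modules 2b and 3b: split each line on whitespace and read fixed field positions. The sketch assumes the Combined Log Format, where the 9th whitespace-separated field (awk's `$9`) is the status code and the 4th holds the timestamp; requests containing extra spaces would shift the fields. `hit_summary` is a hypothetical name.

```python
from collections import Counter
from itertools import islice


def hit_summary(lines, limit=100):
    """Count hits per HTTP status and per hour over the first `limit` lines.

    Assumes Combined Log Format split on whitespace, as awk would:
    fields[3] looks like '[16/Nov/2005:HH:MM:SS' and fields[8] is the status.
    """
    by_status = Counter()
    by_hour = Counter()
    for line in islice(lines, limit):
        fields = line.split()
        if len(fields) < 9:
            continue  # skip short or malformed lines
        time_parts = fields[3].split(":")
        if len(time_parts) < 2:
            continue  # timestamp not in the assumed position
        by_status[fields[8]] += 1
        by_hour[time_parts[1]] += 1  # hour part of the timestamp
    return by_status, by_hour
```

For example, `with open("d100.log", encoding="latin-1") as f: statuses, hours = hit_summary(f)` returns two `Counter` objects; latin-1 is only a tolerant guess that avoids decode errors on unusual bytes in agent strings.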
This project is a continuation of the Data Mining Course project and was funded by a grant from the W. M. Keck Foundation, Los Angeles, CA, and the Howard Hughes Medical Institute, Chevy Chase, MD, as part of the Connecticut College Series of Modules in Emerging Fields.
I am grateful to Gary Parker (Connecticut College) for his encouragement and support and to Anand Rajaraman and Jeffrey Ullman (Stanford) for permission to use part of their "Introduction to Web Mining" lecture.