Gregory Piatetsky-Shapiro
Here are the teaching modules for a unit on Web Mining, with a focus on Web Usage Mining and Web Log Analysis.
Web Mining Course Modules
To get a presentation, prepend www.kdnuggets.com/web_mining_course/ to the .ppt file name listed below.
- Module 1: Introduction to Web Mining
wm1-web-mining-intro.ppt
- Module 2a: Web Server Log
wm2a-web-server-log.ppt
- Module 2b: Unix tools for web log analysis
wm2b-unix-web-log-analysis.ppt
- Module 3a: Hit Analysis
wm3a-hit-analysis.ppt
- Module 3b: Gawk tools for web log analysis
wm3b-gawk-web-log-analysis.ppt
- Module 4a: Visit Analysis; Bot or Not?
wm4a-visit-analysis.ppt
- Module 4b: Perl tools for web log analysis
wm4b-perl-web-log-analysis.ppt
Basic Perl script for web log parsing (web_log_parse.txt)
- Module 5: Behavior modeling
wm5-behaviour-analysis.ppt
Assignments
Note: Professors using these modules can get answers by contacting Gregory Piatetsky directly at gregory at kdnuggets dot com.
Data
This course uses the KDnuggets web server log for Nov 16, 2005. The log was anonymized by replacing each IP address with ipNNNN.TLD, where NNNN is a random number and TLD is the top-level domain, e.g. .com. For unresolved IP addresses, .unr was used.
The data can be downloaded as kdlog.zip (0.6 MB) from the www.kdnuggets.com/web_mining_course/ directory.
The first 100 log lines are in the uncompressed file d100.log in the same directory.
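As a rough illustration of the anonymized host format described above, the following Python sketch splits an ipNNNN.TLD token into its number and top-level domain and counts hits per TLD. It is not part of the course materials. It assumes the host is the first whitespace-separated field on each log line, as in the Common Log Format; the function names and the sample lines are hypothetical and are not taken from the real log.

```python
import re
from collections import Counter

# Anonymized host token as described above: "ip" + digits + "." + TLD.
HOST_RE = re.compile(r"^ip(\d+)\.([A-Za-z]+)$")


def parse_host(token):
    """Return (number, tld) for an anonymized host token, or None if it does not match."""
    m = HOST_RE.match(token)
    if m is None:
        return None
    return int(m.group(1)), m.group(2).lower()


def count_hits_by_tld(lines):
    """Count log lines per TLD.

    Assumption: the host is the first whitespace-separated field of each line.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        parsed = parse_host(fields[0])
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts


# Hypothetical sample lines, made up for illustration only.
SAMPLE = [
    'ip1234.com - - [16/Nov/2005:00:00:01 -0500] "GET /index.html HTTP/1.1" 200 1024',
    'ip5678.unr - - [16/Nov/2005:00:00:02 -0500] "GET /news/ HTTP/1.1" 200 2048',
    'ip1234.com - - [16/Nov/2005:00:00:03 -0500] "GET /jobs/ HTTP/1.1" 304 0',
]

if __name__ == "__main__":
    for tld, n in count_hits_by_tld(SAMPLE).most_common():
        print(tld, n)
```

To try it on the real data, one could pass the lines of d100.log to count_hits_by_tld instead of SAMPLE, after checking that the file's field layout matches the assumption above.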
Acknowledgments
This project is a continuation of the Data Mining Course project, and was funded by a grant from the W. M. Keck Foundation, Los Angeles, CA and the Howard Hughes Medical Institute, Chevy Chase, MD, as part of the Connecticut College Series of Modules in Emerging Fields.
I am grateful to Gary Parker (Connecticut College) for his encouragement and support and to Anand Rajaraman and Jeffrey Ullman (Stanford) for permission to use part of their "Introduction to Web Mining" lecture.
See also: Data Mining Course modules for a 1-semester undergraduate course.