KDnuggets Home » Web Mining Course

Web Mining Course Unit on Web Log Analysis

Here are the teaching modules created by Gregory Piatetsky-Shapiro in 2005 for a unit on Web Mining, with a focus on Web Usage Mining and Web Log Analysis.

Web Mining Course Modules

To get the presentations, add www.kdnuggets.com/web_mining_course/ in front of the ppt file names below:
  • Module 1: Introduction to Web Mining
  • Module 2a: Web Server Log
  • Module 2b: Unix tools for web log analysis
  • Module 3a: Hit Analysis
  • Module 3b: Gawk tools for web log analysis
  • Module 4a: Visit Analysis; Bot or Not?
  • Module 4b: Perl tools for web log analysis
    Basic Perl script for web log parsing (web_log_parse.txt)
  • Module 5: Behavior modeling


Note: Professors using these modules can get answers by contacting Gregory Piatetsky directly at gregory at kdnuggets dot com.


This course uses KDnuggets web server log data for Nov 16, 2005, which was anonymized by replacing each IP address with ipNNNN.TLD, where NNNN is a random number and TLD is the top-level domain, e.g. .com. For unresolved IP addresses, .unr was used.
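
The anonymization described above can be illustrated with a minimal Python sketch. This is not the script that produced the course data; the class name HostAnonymizer, the range of the random number, and the choice to give each host one stable pseudonym are all assumptions made for illustration.

```python
import ipaddress
import random


def is_ip_address(host: str) -> bool:
    """Return True if host is a literal IP address, i.e. it was never resolved."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


class HostAnonymizer:
    """Hypothetical helper: map each distinct host to a pseudonym ipNNNN.TLD."""

    def __init__(self, seed=None, max_id=999_999):
        self._rng = random.Random(seed)
        self._max_id = max_id      # assumed upper bound for NNNN
        self._pseudonyms = {}      # host -> pseudonym already assigned
        self._used_ids = set()     # numbers already handed out

    def anonymize(self, host: str) -> str:
        host = host.strip().lower()
        if host in self._pseudonyms:
            # Assumption: the same host always gets the same pseudonym,
            # so requests from one visitor can still be grouped.
            return self._pseudonyms[host]
        if len(self._used_ids) >= self._max_id:
            raise RuntimeError("no unused numbers left")
        # Unresolved addresses get the placeholder TLD "unr";
        # resolved names keep only their last label (e.g. "com").
        tld = "unr" if is_ip_address(host) else host.rsplit(".", 1)[-1]
        while True:
            number = self._rng.randint(1, self._max_id)
            if number not in self._used_ids:
                break
        self._used_ids.add(number)
        pseudonym = f"ip{number}.{tld}"
        self._pseudonyms[host] = pseudonym
        return pseudonym


if __name__ == "__main__":
    anonymizer = HostAnonymizer(seed=2005)
    for host in ["crawler.example.com", "192.0.2.7", "crawler.example.com"]:
        print(host, "->", anonymizer.anonymize(host))
```

Keeping the top-level domain while hiding the rest of the host name preserves coarse information about where a visitor came from, and the stable mapping assumed here is what would let hits from the same host be grouped into visits.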

The data can be downloaded as kdlog.zip (0.6 MB) from the www.kdnuggets.com/web_mining_course/ directory.

The first 100 log lines are in the uncompressed file d100.log in the same directory.


This project is a continuation of the Data Mining Course project and was funded by a grant from the W. M. Keck Foundation, Los Angeles, CA, and the Howard Hughes Medical Institute, Chevy Chase, MD, as part of the Connecticut College Series of Modules in Emerging Fields.

I am grateful to Gary Parker (Connecticut College) for his encouragement and support and to Anand Rajaraman and Jeffrey Ullman (Stanford) for permission to use part of their "Introduction to Web Mining" lecture.

Data Mining Course modules for 1-semester undergraduate course