Web Mining Course: Assignment 3 - Hit level analysis


 
  
For the KDnuggets 1-day log file kdlog.zip, use Unix tools such as gawk, or any other tools, to compute:
  1. Breakdown of hits and HTML pages by hour (extra credit for graphing).
  2. Top 20 TLDs (top-level domains) by hits and HTML pages. Extra credit for adding country names.
  3. Top 20 (most requested) HTML pages
  4. Top 10 external referrer sites (not from direct access or www.kdnuggets.com) by hits; also count direct entry (referrer = "-") hits.
  5. Top 10 IP addresses, including their user agent, by hits, and by HTML pages.
  6. Top 10 most frequently not found pages (status code 404).
What interesting observations can you make?
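
As a starting point for the breakdowns above, here is a minimal Python sketch of item 1 (hits and HTML pages by hour). It rests on assumptions not stated in the assignment: that the log uses the Apache Combined Log Format, and that an "HTML page" is a request whose path ends in .html, .htm or "/". The names LOG_RE, parse_line, is_html_page and hits_by_hour, and the SAMPLE lines, are hypothetical and only for illustration; check them against the real kdlog file before relying on them.

```python
import re
from collections import Counter

# Assumption: Apache Combined Log Format. The real kdlog layout may differ.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def is_html_page(path):
    # Assumption: a "page" ends in .html/.htm or is a directory request.
    path = path.split("?", 1)[0]  # ignore any query string
    return path.endswith((".html", ".htm", "/"))

def hits_by_hour(lines):
    """Count all hits and HTML-page hits per hour of day ("00".."23")."""
    hits, pages = Counter(), Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that do not fit the assumed format
        hour = rec["time"].split(":")[1]  # "01/Jan/2000:13:05:09 -0500" -> "13"
        hits[hour] += 1
        if is_html_page(rec["path"]):
            pages[hour] += 1
    return hits, pages

# Made-up sample lines for illustration only; they are not from kdlog.
SAMPLE = [
    '192.0.2.1 - - [01/Jan/2000:13:05:09 -0500] "GET /index.html HTTP/1.0" '
    '200 1234 "-" "ExampleAgent/1.0"',
    '192.0.2.1 - - [01/Jan/2000:13:05:10 -0500] "GET /logo.gif HTTP/1.0" '
    '200 512 "http://www.kdnuggets.com/index.html" "ExampleAgent/1.0"',
    '192.0.2.7 - - [01/Jan/2000:14:00:00 -0500] "GET /missing.html HTTP/1.0" '
    '404 0 "http://example.org/links" "ExampleAgent/1.0"',
]

if __name__ == "__main__":
    hits, pages = hits_by_hour(SAMPLE)
    for hour in sorted(hits):
        print(hour, hits[hour], pages[hour])
```

The same parsed fields (host, path, status, referrer, agent) can feed a Counter for the other items, with Counter.most_common(n) giving the top-n lists.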

Hint: you can pipe the output of zcat kdlog.zip into your tools to process the log without first extracting it to disk.
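
If you work in Python rather than the shell, a rough equivalent of the zcat hint is to stream lines straight out of the archive with the standard zipfile module. This is a sketch under assumptions: the helper name iter_zip_lines is hypothetical, the log is taken to be the first (or only) member of the archive, and latin-1 is chosen only because it never fails to decode.

```python
import io
import zipfile

def iter_zip_lines(source, encoding="latin-1"):
    """Yield text lines from the first member of a zip archive.

    `source` may be a file name or a binary file-like object. The member is
    decompressed on the fly, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(source) as archive:
        first = archive.namelist()[0]  # assumption: the log is the first member
        with archive.open(first) as raw:
            for line in io.TextIOWrapper(raw, encoding=encoding):
                yield line.rstrip("\n")
```

For example, `for line in iter_zip_lines("kdlog.zip"): ...` would visit each log line in turn.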

