


Web Mining Course: Assignment 3 - Hit level analysis



For the KDnuggets 1-day log file kdlog.zip, use Unix tools such as gawk, or other tools of your choice, to compute:
  1. Breakdown of hits and HTML pages by hour (extra credit for graphing).
  2. Top 20 TLDs (top-level domains) by hits and by HTML pages (extra credit for adding country names).
  3. Top 20 most requested HTML pages.
  4. Top 10 external referrer sites (excluding direct access and www.kdnuggets.com) by hits; also count direct-entry hits (referrer = "-").
  5. Top 10 IP addresses, including their user agents, by hits and by HTML pages.
  6. Top 10 most frequently requested pages that were not found (status code 404).
What interesting observations can you make?

Hint: you can use zcat kdlog.zip to read the log in a pipeline without uncompressing it first.
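One possible starting point for the counts listed above is sketched here; it is not a full solution. It assumes the log is in Apache Combined Log Format, which the assignment does not state, so the field positions ($4 for the timestamp, $7 for the path, $9 for the status) may need adjusting. It also assumes that a request counts as an HTML page when its path, with any query string removed, ends in .html, .htm, or /. The function names hits_by_hour, html_paths, not_found_paths, and top_n are illustrative and not part of the assignment.

```shell
# All functions read log lines on stdin, so they can follow zcat in a pipeline.
# Assumed line layout (Apache Combined Log Format):
#   host ident user [dd/Mon/yyyy:HH:MM:SS zone] "METHOD path PROTO" status bytes "referrer" "agent"

# Print one line per hour: hour, total hits, HTML page requests (tab-separated).
hits_by_hour() {
  awk '
    {
      split($4, t, ":")        # $4 looks like [10/Jan/2011:00:05:01
      hour = t[2]
      hits[hour]++
      path = $7
      sub(/\?.*/, "", path)    # drop any query string
      # Assumption: an HTML page is a path ending in .html, .htm, or /
      if (path ~ /\.html?$/ || path ~ /\/$/) pages[hour]++
    }
    END {
      for (h in hits) printf "%s\t%d\t%d\n", h, hits[h], pages[h]
    }
  ' | sort
}

# Print the path of every HTML page request (same page assumption as above).
html_paths() {
  awk '{ p = $7; sub(/\?.*/, "", p); if (p ~ /\.html?$/ || p ~ /\/$/) print p }'
}

# Print the path of every request that returned status 404.
not_found_paths() {
  awk '$9 == 404 { print $7 }'
}

# Count identical input lines and print the N most frequent as: count, value.
top_n() {
  awk '{ c[$0]++ } END { for (k in c) printf "%d\t%s\n", c[k], k }' |
    sort -k1,1nr -k2 | head -n "$1"
}
```

With these definitions, zcat kdlog.zip | hits_by_hour would give the hourly breakdown, zcat kdlog.zip | html_paths | top_n 20 the most requested pages, and zcat kdlog.zip | not_found_paths | top_n 10 the most frequent 404s. The TLD, referrer, and IP address counts could follow the same pattern: extract one key per line, then pipe the keys to top_n.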



Copyright © 2011 KDnuggets.