Web Mining Course: Assignment 3 - Hit level analysis
For the KDnuggets 1-day log file kdlog.zip, use Unix tools such as gawk, or other tools of your choice, to compute:
- Breakdown of hits and HTML pages by hour (extra credit for graphing).
- Top 20 TLDs (top-level domains) by hits and by HTML pages. Extra credit for adding country names.
- Top 20 most requested HTML pages.
- Top 10 external referrer sites (not from direct access or www.kdnuggets.com) by hits; also count direct entry (referrer = "-") hits.
- Top 10 IP addresses, including their user agent, by hits, and by HTML pages.
- Top 10 most frequently requested pages that were not found (status code 404).
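
For those working in Python rather than gawk, the kind of counting asked for above could be sketched roughly as follows. This is only an illustration under assumptions the assignment does not state: that the log is in the Apache combined log format, and that an "HTML page" is any request whose path ends in .html, .htm or "/". The names parse_line, is_html_page, hourly_breakdown and top_counts are hypothetical helpers, not part of any course material.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern if kdlog differs.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def is_html_page(path):
    """Assumed definition of an HTML page: path ends in .html, .htm or '/'."""
    path = path.split("?", 1)[0].lower()  # ignore any query string
    return path.endswith((".html", ".htm", "/"))


def hourly_breakdown(lines):
    """Count hits and HTML pages per hour of day (two-digit strings)."""
    hits, pages = Counter(), Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip malformed lines
        # The time field looks like "02/Jan/2000:13:05:10 -0500".
        hour = rec["time"].split(":")[1]
        hits[hour] += 1
        if is_html_page(rec["path"]):
            pages[hour] += 1
    return hits, pages


def top_counts(records, key, n=10, keep=lambda rec: True):
    """Return the n most common values of key(rec) among records passing keep."""
    counts = Counter(key(rec) for rec in records if keep(rec))
    return counts.most_common(n)
```

With records built as `[r for r in map(parse_line, lines) if r]`, the 404 list would be something like `top_counts(records, key=lambda r: r["path"], keep=lambda r: r["status"] == "404")`; the referrer, IP address and TLD rankings follow the same pattern with a different key and filter.
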
What interesting observations can you make?
Hint: you can use zcat kdlog.zip to process the log without uncompressing it to disk.
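
If a scripting language is used instead of zcat, the same idea of reading the archive without extracting it might look like the sketch below. It assumes nothing about the file names inside kdlog.zip and simply streams every member; iter_log_lines is a hypothetical helper name, and latin-1 is chosen only so that decoding never fails on unexpected bytes.

```python
import io
import zipfile


def iter_log_lines(zip_path):
    """Yield log lines from every member of a zip archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            with archive.open(name) as raw:
                # Wrap the binary stream so lines are decoded lazily.
                for line in io.TextIOWrapper(raw, encoding="latin-1"):
                    yield line.rstrip("\r\n")
```

The generator can be passed directly to any line-based counting code, for example `for line in iter_log_lines("kdlog.zip"): ...`.
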