KDnuggets Data Mining Community's Top Resource since 1997
for Data Mining and Analytics Software, Jobs, Consulting, Courses, and more
 



Web Mining Course: Assignment 3 - Hit-level analysis

For the one-day KDnuggets log file kdlog.zip, use Unix tools such as gawk, or other tools, to compute:
  1. Breakdown of hits and HTML pages by hour (extra credit for graphing).
  2. Top 20 TLDs (top-level domains) by hits and by HTML pages (extra credit for adding country names).
  3. Top 20 most requested HTML pages.
  4. Top 10 external referrer sites by hits, excluding direct access and www.kdnuggets.com; also count direct-entry hits (referrer = "-").
  5. Top 10 IP addresses by hits and by HTML pages, including their user agents.
  6. Top 10 most frequently requested not-found pages (status code 404).
What interesting observations can you make?
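The six computations listed above can be sketched in Python, one of the "other tools" the assignment allows. This is a hedged sketch, not a reference solution, and it rests on several assumptions: the log is in Apache "combined" format; a request counts as an HTML page when its path (query string removed) ends in `.html`, `.htm` or `/`; the TLD is the last label of a resolved host name, with unresolved numeric IPs grouped together; and each top IP is reported with its most frequent user agent. The function names `parse`, `is_html`, `tld` and `analyze` are hypothetical.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Assumed Apache "combined" log format:
# host ident user [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes "referrer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^:\]]+:(?P<hour>\d{2})[^\]]*)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    parts = rec["request"].split()  # e.g. ["GET", "/index.html", "HTTP/1.1"]
    rec["path"] = parts[1] if len(parts) >= 2 else ""
    return rec


def is_html(path):
    """Assumption: an HTML page is a path ending in .html, .htm or '/'."""
    return path.split("?", 1)[0].lower().endswith((".html", ".htm", "/"))


def tld(host):
    """Last label of a resolved host name; unresolved numeric IPs are grouped."""
    if re.fullmatch(r"[\d.]+", host):
        return "(numeric IP)"
    return host.rsplit(".", 1)[-1].lower()


def analyze(lines):
    """Compute the six assignment tables from an iterable of log lines."""
    recs = [r for r in map(parse, lines) if r]  # unparseable lines are skipped
    html = [r for r in recs if is_html(r["path"])]

    ext = Counter()  # external referrer host -> hits
    for r in recs:
        if r["referrer"] == "-":
            continue
        site = urlsplit(r["referrer"]).hostname or ""  # lower-cased, port dropped
        if site and site != "www.kdnuggets.com":
            ext[site] += 1

    agents = defaultdict(Counter)  # host -> user-agent counts
    for r in recs:
        agents[r["host"]][r["agent"]] += 1

    def top_ips(counter):
        # (host, count, that host's most frequent user agent)
        return [(h, n, agents[h].most_common(1)[0][0])
                for h, n in counter.most_common(10)]

    return {
        "hits_by_hour": Counter(r["hour"] for r in recs),
        "html_by_hour": Counter(r["hour"] for r in html),
        "tld_hits": Counter(tld(r["host"]) for r in recs).most_common(20),
        "tld_html": Counter(tld(r["host"]) for r in html).most_common(20),
        "top_pages": Counter(r["path"] for r in html).most_common(20),
        "ext_referrers": ext.most_common(10),
        "direct_hits": sum(r["referrer"] == "-" for r in recs),
        "ip_hits": top_ips(Counter(r["host"] for r in recs)),
        "ip_html": top_ips(Counter(r["host"] for r in html)),
        "not_found": Counter(r["path"] for r in recs
                             if r["status"] == "404").most_common(10),
    }
```

For example, a script containing these functions could call `analyze(sys.stdin)` and be run as `zcat kdlog.zip | python3 report.py` (script name hypothetical). The extra-credit parts, graphing by hour and mapping TLDs to country names, are not covered.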

Hint: you can use zcat kdlog.zip to process the log without extracting it to disk.
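In Python, a rough counterpart to the `zcat` hint is to stream the archive through the standard-library `zipfile` module, decompressing on the fly rather than writing the extracted file to disk. This is a sketch under stated assumptions: the archive members are plain-text logs (their names are not assumed, so every file member is read), and Latin-1 decoding is acceptable because it maps every byte and never raises on odd characters. The function name `iter_log_lines` is hypothetical.

```python
import io
import zipfile


def iter_log_lines(zip_source):
    """Yield lines (without trailing newline) from every file member of a
    .zip archive, decompressing as it reads instead of extracting to disk.
    zip_source may be a path or a binary file-like object."""
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():  # skip directory entries
                continue
            with zf.open(info) as raw:
                # Assumption: Latin-1 is acceptable; it decodes any byte sequence
                for line in io.TextIOWrapper(raw, encoding="latin-1"):
                    yield line.rstrip("\n")
```

The generator can be passed to any line-based analysis, e.g. `for line in iter_log_lines("kdlog.zip"): ...`, so the whole log never has to sit uncompressed on disk or in memory.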

Copyright © 2009 KDnuggets.