KDnuggets Data Mining Community's Top Resource since 1997
for Data Mining and Analytics Software, Jobs, Consulting, Courses, and more
 

Web Mining Course: Assignment 4 - Visit level analysis

Download web_log_parse.txt, rename it with a .pl extension, and get it to run.

Modify it to separate the log file into visits. Assume initially that all hits from the same IP on the same day belong to the same visit. For extra credit, add a parameter that sets the largest interval allowed between primary page hits within one visit, so that a longer gap starts a new visit.
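The initial rule can be sanity-checked with standard Unix tools before changing the Perl script: under the same-IP, same-day assumption, the number of visits equals the number of distinct (IP, date) pairs in the log. The sketch below assumes d100.log is in the Apache combined log format (the assignment does not state the format) and runs on a small made-up sample; the function names are illustrative only.

```shell
#!/bin/sh
# Made-up sample lines in Apache combined log format (an assumption about d100.log).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2009:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 1043 "-" "Mozilla/4.0"
10.0.0.1 - - [01/Jan/2009:10:00:05 -0500] "GET /logo.gif HTTP/1.0" 304 0 "-" "Mozilla/4.0"
10.0.0.2 - - [01/Jan/2009:11:30:00 -0500] "GET /missing.html HTTP/1.0" 404 210 "-" "Googlebot/2.1"
10.0.0.1 - - [02/Jan/2009:09:15:00 -0500] "GET /index.html HTTP/1.0" 200 1043 "-" "Mozilla/4.0"
EOF

# Print one "IP date" key per hit.
# Field 1 is the IP; field 4 looks like "[dd/Mon/yyyy:hh:mm:ss".
visit_keys() {
    awk '{ split($4, t, ":"); print $1, substr(t[1], 2) }' "$1"
}

# Visits under the same-IP, same-day rule = number of distinct keys.
count_visits() {
    visit_keys "$1" | sort -u | wc -l | tr -d ' '
}

count_visits "$log"    # prints 3 for the sample above
```

On the real data, `count_visits d100.log` should match the number of lines the modified script writes, as long as the extra-credit interval parameter is not in use.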

Apply it to the d100.log file and compute, for each visit:

  1. Total number of hits
  2. Number of successful GETs (status code 200 or 304)
  3. Number of requests with a 404 (not found) status code
  4. Visit start (as HHMMSS)
  5. Visit length (in seconds)
  6. Visit agent (assume the user agent is the same throughout the visit and take it from the first request)

Write the visit information to a tab-separated file, d100.log.visits.
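As an independent cross-check of the visit file, the six per-visit values can also be computed with awk. This is a rough sketch, not the intended Perl solution. It assumes the Apache combined log format, a log sorted by time, and the same-IP, same-day rule; the two leading key columns (IP and date) and the function name visits_tsv are additions for illustration, and the sample lines are made up.

```shell
#!/bin/sh
# Made-up sample lines in Apache combined log format (an assumption about d100.log).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2009:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 1043 "-" "Mozilla/4.0"
10.0.0.1 - - [01/Jan/2009:10:00:05 -0500] "GET /logo.gif HTTP/1.0" 304 0 "-" "Mozilla/4.0"
10.0.0.2 - - [01/Jan/2009:11:30:00 -0500] "GET /missing.html HTTP/1.0" 404 210 "-" "Googlebot/2.1"
10.0.0.1 - - [02/Jan/2009:09:15:00 -0500] "GET /index.html HTTP/1.0" 200 1043 "-" "Mozilla/4.0"
EOF

# One tab-separated line per visit:
# IP, date, hits, successful GETs, 404s, start (HHMMSS), length (seconds), agent.
visits_tsv() {
    awk -F'"' '
    {
        # Splitting on double quotes gives: $1 = IP, identity, user, timestamp;
        # $2 = request line; $3 = status and bytes; $6 = user agent.
        split($1, head, " ")
        split(substr(head[4], 2), ts, ":")    # ts[1] = date, ts[2..4] = HH, MM, SS
        split($2, req, " ")                   # req[1] = method
        split($3, res, " ")                   # res[1] = status code
        key  = head[1] SUBSEP ts[1]           # visit key: IP plus date
        secs = ts[2] * 3600 + ts[3] * 60 + ts[4]
        if (!(key in hits)) {                 # first request of this visit
            order[++n] = key
            start[key] = ts[2] ts[3] ts[4]
            first[key] = secs
            agent[key] = $6
        }
        hits[key]++
        last[key] = secs                      # relies on the log being time-ordered
        if (req[1] == "GET" && (res[1] == 200 || res[1] == 304)) ok[key]++
        if (res[1] == 404) nf[key]++
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            split(k, p, SUBSEP)
            printf "%s\t%s\t%d\t%d\t%d\t%s\t%d\t%s\n",
                   p[1], p[2], hits[k], ok[k], nf[k], start[k], last[k] - first[k], agent[k]
        }
    }' "$1"
}

visits_tsv "$log"
```

Running `visits_tsv d100.log > d100.log.visits.check` and comparing the result with d100.log.visits keeps the Perl output untouched. Requests that contain embedded double quotes would break the quote-based splitting, so disagreements on a few lines are worth inspecting by hand.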

For verification, also print to the screen the total counts of:

  • hits,
  • successful GETs,
  • 404 requests.

Verify the total counts obtained from the Perl program using Unix tools.
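One possible way to get the three totals with awk is sketched below. It assumes the Apache combined log format, where whitespace splitting puts the quoted method in field 6 and the status code in field 9; malformed request lines would shift those fields. The function names and the sample lines are made up for illustration.

```shell
#!/bin/sh
# Made-up sample lines in Apache combined log format (an assumption about d100.log).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jan/2009:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 1043 "-" "Mozilla/4.0"
10.0.0.1 - - [01/Jan/2009:10:00:05 -0500] "GET /logo.gif HTTP/1.0" 304 0 "-" "Mozilla/4.0"
10.0.0.2 - - [01/Jan/2009:11:30:00 -0500] "GET /missing.html HTTP/1.0" 404 210 "-" "Googlebot/2.1"
10.0.0.1 - - [02/Jan/2009:09:15:00 -0500] "GET /index.html HTTP/1.0" 200 1043 "-" "Mozilla/4.0"
EOF

# Total hits: one log line per hit.
total_hits() {
    awk 'END { print NR }' "$1"
}

# Successful GETs: field 6 is the opening quote plus the method, field 9 the status.
ok_gets() {
    awk '$6 == "\"GET" && ($9 == 200 || $9 == 304) { n++ } END { print n + 0 }' "$1"
}

# Requests answered with 404 (not found), regardless of method.
not_found() {
    awk '$9 == 404 { n++ } END { print n + 0 }' "$1"
}

# For the sample above this prints: hits=4 ok_gets=3 not_found=1
echo "hits=$(total_hits "$log") ok_gets=$(ok_gets "$log") not_found=$(not_found "$log")"
```

Summing the per-visit columns of d100.log.visits should give the same three numbers, which checks the visit file and the screen output at once.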

What interesting observations can you make?


Copyright © 2009 KDnuggets.