
Data Mining Course: Introduction


Data Mining is one of the hottest fields in Computer Science. Data have been accumulating throughout the computer age in many forms, including database systems, spreadsheets, text files, and, more recently, web pages. These data have been stored on hard drives and temporary storage media. Database programs can answer specific queries such as "how many patients are over age 70," but the data potentially hold much more than the answers to such questions. The real treasure could be interesting new patterns that we do not even know to ask for, for example, "the best predictor of Alzheimer's disease for patients over 70 is the ratio of Tau and Ab42 proteins".

Data mining programs are intended to search through data for hidden relationships and patterns. This is particularly pertinent to marketing companies that want to know what made a specific group of people buy their product. It can also be very important in scientific fields such as medicine, where finding correlations among groups of people affected by a similar disease could be very helpful. Data mining is needed to make sense of, and make use of, rapidly growing data, and it is an essential field of the 21st century.
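The difference between a specific query and a search for patterns can be illustrated with a small sketch. The records below are invented purely for illustration, the field names (`marker_ratio`, `systolic_bp`) are hypothetical, and ranking attributes by absolute Pearson correlation is just one simple stand-in for the many techniques a data mining program might use.

```python
from statistics import mean

# Toy, invented records for illustration only (not real patient data).
# Field names such as "marker_ratio" are hypothetical.
PATIENTS = [
    {"age": 74, "marker_ratio": 0.9, "systolic_bp": 130, "diagnosed": 1},
    {"age": 81, "marker_ratio": 1.1, "systolic_bp": 145, "diagnosed": 1},
    {"age": 77, "marker_ratio": 0.3, "systolic_bp": 150, "diagnosed": 0},
    {"age": 65, "marker_ratio": 0.2, "systolic_bp": 120, "diagnosed": 0},
    {"age": 79, "marker_ratio": 1.0, "systolic_bp": 125, "diagnosed": 1},
    {"age": 72, "marker_ratio": 0.4, "systolic_bp": 135, "diagnosed": 0},
]


def count_over(records, age):
    """A specific query: how many patients are older than `age`?"""
    return sum(1 for r in records if r["age"] > age)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (vx * vy) ** 0.5


def best_predictor(records, target, candidates):
    """A pattern search: which attribute correlates most with `target`?"""
    ys = [r[target] for r in records]
    scores = {
        c: abs(pearson([r[c] for r in records], ys)) for c in candidates
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    print("Patients over 70:", count_over(PATIENTS, 70))
    name, scores = best_predictor(
        PATIENTS, "diagnosed", ["age", "marker_ratio", "systolic_bp"]
    )
    print("Strongest linear association with diagnosis:", name)
```

The query answers only the question it was given, while the search scans every candidate attribute and reports one that nobody asked about in advance. On real data, a correlation found this way would only be a starting point for further validation.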

Made possible through a generous grant from the Howard Hughes Medical Institute and the W. M. Keck Foundation to Connecticut College, this CD and website contain a set of modules for a complete one-semester course in data mining. There are also modules for individual lectures on data mining in the context of courses on Algorithms, Artificial Intelligence, and Introduction to Computer Science.

The grant provided support for Dr. Piatetsky-Shapiro, one of the leading Data Mining researchers in the world, to spend a concentrated period of time at Connecticut College, co-teaching and developing the course modules. This period was followed by adjustments in the original modules and the development of modules to cover one or two sessions for other courses.

Dr. Piatetsky-Shapiro and three computer science faculty members from Connecticut College worked with an instructional designer to create these teaching modules. The modules are presented in PowerPoint so that instructors can easily modify them, and they are distributed on CD and via the website, free of charge, to interested professors and instructors.

Data Mining is a good field for computer science students to study. It is both an active area of research and a great field for employment opportunities. It is our hope that schools that do not have faculty with the expertise to teach data mining will now have the opportunity to offer a data mining course or at least be able to cover some part of it in their curriculum.

The main part of this CD and website is a set of modules for a complete 300-level Machine Learning / Data Mining course, with instructional material for nineteen 75-minute classes.

In addition, there is instructional material for:

  • a 30-minute segment on Data Mining as part of an Introduction to Computer Science class.
  • one or two units on Decision Trees (depending on how much advanced material is covered) as part of a 300-level class on Algorithms, focusing on decision tree algorithms.
  • one or two units on Decision Trees as part of a 300-level class on Artificial Intelligence, focusing on decision tree usage and application.

Happy Discoveries!
