Coursera / Stanford Mining Massive Datasets MOOC, Jan-Mar 2015
Tags: Anand Rajaraman, Coursera, Data Science Education, Jeff Ullman, Jure Leskovec, Mining Massive Datasets, MOOC, Stanford
Don't miss! Top Stanford researchers teach efficient and scalable methods for extracting models and other information from very large amounts of data. Next session of this great course starts Jan 31 on Coursera and is free.
Mining Massive Datasets on Coursera
www.coursera.org/course/mmds
Next session:
Jan 31 - Mar 24, 2015
Instructors
Jure Leskovec, Stanford University
Anand Rajaraman, Stanford University
Jeff Ullman, Stanford University
About the Course
We introduce the student to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets. Students will learn how Google's PageRank algorithm models the importance of Web pages, along with some of the many extensions that have been used for a variety of purposes.
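To give a flavor of the PageRank material, here is a minimal power-iteration sketch (an illustrative example, not the course's code) on a hypothetical three-page link graph, using the standard damping factor of 0.85:

```python
# Toy PageRank by power iteration on a hypothetical 3-page graph.
# links maps each page to the pages it links to.
links = {0: [1, 2], 1: [2], 2: [0]}
n = len(links)
d = 0.85                      # damping factor
rank = [1.0 / n] * n          # start with uniform ranks

for _ in range(50):           # iterate until ranks stabilize
    new = [(1 - d) / n] * n   # "teleport" share for every page
    for page, outs in links.items():
        share = d * rank[page] / len(outs)
        for dest in outs:     # each page splits its rank among out-links
            new[dest] += share
    rank = new
```

Page 2 ends up with the highest rank because it receives links from both other pages, which is exactly the intuition the course formalizes.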
We'll cover locality-sensitive hashing, a bit of magic that allows you to find similar items in a set of items so large you cannot possibly compare each pair. When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model the data, but standard approaches do not scale well; we'll talk about efficient approaches. Many other large-scale algorithms are covered as well, as outlined in the course syllabus.
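To illustrate the idea behind locality-sensitive hashing, here is a toy MinHash sketch in Python (a sketch under simplifying assumptions, not the course's implementation): the fraction of matching rows in two short signatures estimates the Jaccard similarity of the underlying sets, so similar items can be detected without comparing the full sets element by element.

```python
import random

random.seed(0)
NUM_HASHES = 200  # signature length; longer means a tighter estimate

# Hypothetical example sets of "shingles" (here, just words).
a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown fox leaps over a sleepy dog".split())

# One random salt per signature row; each row records the minimum
# hash value over the set under that salted hash function.
salts = [random.getrandbits(32) for _ in range(NUM_HASHES)]

def minhash(s):
    return [min(hash((salt, x)) for x in s) for salt in salts]

sig_a, sig_b = minhash(a), minhash(b)

# Fraction of agreeing rows approximates Jaccard similarity |a & b| / |a | b|.
est = sum(x == y for x, y in zip(sig_a, sig_b)) / NUM_HASHES
true = len(a & b) / len(a | b)
```

With 200 rows the estimate lands close to the true Jaccard similarity of 6/11; the full LSH scheme in the course then bands these signatures into hash buckets so that only likely-similar pairs are ever compared.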
Suggested Readings
There is a free book, "Mining of Massive Datasets," by Leskovec, Rajaraman, and Ullman (who are the instructors for this course). You can download it at www.mmds.org/
Hardcopies can be purchased from Cambridge Univ. Press.
Enroll at www.coursera.org/course/mmds