SFBayACM talk: GraphLab framework for
Machine Learning in the Cloud
Watch this SFBay ACM presentation by Carlos Guestrin to learn about the GraphLab framework, which naturally expresses the graph computations that are key to machine learning on Big Data. On many large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop.
This 2013 SF Bay ACM presentation, "GraphLab: A Distributed Abstraction for Machine Learning in the Cloud," was given by Carlos Guestrin, leader of the GraphLab project and Amazon Professor of Machine Learning at the University of Washington.
Abstract: Today, machine learning (ML) methods play a central role in industry and science. The growth of the Web and improvements in sensor data collection technology have been rapidly increasing the magnitude and complexity of the ML tasks we must solve. This growth is driving the need for scalable, parallel ML algorithms that can handle "BigData." Unfortunately, designing and implementing efficient parallel ML algorithms is challenging. Existing high-level parallel abstractions such as MapReduce and Pregel are insufficiently expressive to achieve the desired performance, while low-level tools such as MPI are difficult to use, leaving ML experts repeatedly solving the same design challenges.
This talk describes the GraphLab framework, which naturally expresses the asynchronous, dynamic graph computations that are key for state-of-the-art ML algorithms. When these algorithms are expressed in this higher-level abstraction, GraphLab effectively addresses many of the underlying parallelism challenges, including data distribution, optimized communication, and guaranteeing sequential consistency, a property that is surprisingly important for many ML algorithms. On a variety of large-scale tasks, GraphLab provides 20-100x performance improvements over Hadoop. In recent months, GraphLab has received thousands of downloads and is being actively used by a number of startups, companies, research labs, and universities.
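To make the idea of asynchronous, dynamically scheduled graph computation concrete, here is a minimal sketch in the style of a GraphLab "update function," using PageRank as the example. This is not the real GraphLab API (GraphLab is a C++ framework); the graph, damping factor, and tolerance below are illustrative choices. Each vertex update gathers values from in-neighbors, applies a new rank, and reschedules out-neighbors only when its own value changed significantly, which is what "dynamic" scheduling means here.

```python
from collections import deque

def async_pagerank(edges, num_vertices, damping=0.85, tol=1e-6):
    """Asynchronous PageRank with dynamic scheduling (illustrative sketch).

    Assumes every vertex has at least one out-edge (dangling vertices
    are not handled in this sketch).
    """
    out_neighbors = [[] for _ in range(num_vertices)]
    in_neighbors = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        out_neighbors[src].append(dst)
        in_neighbors[dst].append(src)

    rank = [1.0 / num_vertices] * num_vertices
    # Dynamic scheduler: start with every vertex, then only reschedule
    # the neighbors of vertices whose rank changed by more than `tol`.
    queue = deque(range(num_vertices))
    in_queue = [True] * num_vertices

    while queue:
        v = queue.popleft()
        in_queue[v] = False
        # "Gather": sum contributions from in-neighbors.
        total = sum(rank[u] / len(out_neighbors[u]) for u in in_neighbors[v])
        # "Apply": recompute this vertex's rank in place (asynchronous:
        # later updates immediately see this new value).
        new_rank = (1 - damping) / num_vertices + damping * total
        change = abs(new_rank - rank[v])
        rank[v] = new_rank
        # "Scatter": if the change is significant, reschedule out-neighbors.
        if change > tol:
            for w in out_neighbors[v]:
                if not in_queue[w]:
                    queue.append(w)
                    in_queue[w] = True
    return rank

# Usage: a 3-vertex cycle converges to equal ranks summing to 1.
ranks = async_pagerank([(0, 1), (1, 2), (2, 0)], 3)
```

The in-place update plus selective rescheduling is the heart of the abstraction: unlike a bulk-synchronous model such as MapReduce or Pregel, vertices that have already converged do no further work, and updates see fresh neighbor values immediately.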