GraphBuilder is a Java library for constructing graphs out of large datasets for data analytics and structured machine learning applications that exploit relationships in data.
Carlos Guestrin and his team at the University of Washington in Seattle have developed a new framework, called GraphLab, specifically designed for graph-based parallel machine learning. In many cases, GraphLab can process such graphs 20-50X faster than Hadoop MapReduce.
This work inspired Intel to develop a demo of a scalable graph construction library for Hadoop. Hadoop is a poor fit for graph-based machine learning but is well suited to graph construction.
GraphBuilder not only constructs large-scale graphs fast but also offloads many of the complexities of graph construction, including graph formation, cleaning, compression, partitioning, and serialization.
GraphBuilder makes it possible for a Java programmer to build an internet-scale graph for PageRank in about 100 lines of code and a Wikipedia-sized graph for LDA in about 130.
Read more in "GraphBuilder: Revealing hidden structure within Big Data", by Ted Willke, Dec 6, 2012.
The GraphBuilder site is 01.org/graphbuilder/.
GraphBuilder offloads many of the complexities of graph construction, such as graph formation, tabulation, compression, transformation, partitioning, output formatting, and serialization. It scales using the MapReduce parallel programming model.
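To make the map-then-reduce shape of graph construction concrete, here is a minimal sketch in plain Java. It is not GraphBuilder's API and does not use Hadoop: the class GraphFormationSketch, its methods, and the tab-separated "page, then comma-separated links" input format are all hypothetical, chosen only for illustration. It runs in a single process and shows three of the steps named above: formation (records to edges), cleaning (dropping duplicate edges and self-loops), and partitioning (assigning vertices to partitions by hash).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from GraphBuilder.
public class GraphFormationSketch {

    // "Map" step: turn one raw record, "page<TAB>link1,link2,...",
    // into a list of (source, target) edges.
    public static List<Map.Entry<String, String>> mapRecord(String line) {
        List<Map.Entry<String, String>> edges = new ArrayList<>();
        String[] fields = line.split("\t", 2);
        if (fields.length < 2 || fields[0].trim().isEmpty()) {
            return edges; // malformed record: emit nothing
        }
        String source = fields[0].trim();
        for (String target : fields[1].split(",")) {
            String trimmed = target.trim();
            if (!trimmed.isEmpty()) {
                edges.add(Map.entry(source, trimmed));
            }
        }
        return edges;
    }

    // "Reduce" step: group edges by source vertex. Using a set per vertex
    // removes duplicate edges; self-loops are dropped explicitly.
    public static Map<String, Set<String>> reduceToAdjacency(
            List<Map.Entry<String, String>> edges) {
        Map<String, Set<String>> adjacency = new TreeMap<>();
        for (Map.Entry<String, String> edge : edges) {
            if (edge.getKey().equals(edge.getValue())) {
                continue; // cleaning: skip self-loops
            }
            adjacency.computeIfAbsent(edge.getKey(), k -> new TreeSet<>())
                     .add(edge.getValue());
        }
        return adjacency;
    }

    // Partitioning: assign a vertex to one of numPartitions buckets by hash.
    // floorMod keeps the result non-negative even for negative hash codes.
    public static int partitionOf(String vertexId, int numPartitions) {
        return Math.floorMod(vertexId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        List<String> records = List.of("A\tB,C,B", "B\tA,B", "C\t");
        List<Map.Entry<String, String>> edges = new ArrayList<>();
        for (String record : records) {
            edges.addAll(mapRecord(record));
        }
        Map<String, Set<String>> graph = reduceToAdjacency(edges);
        for (Map.Entry<String, Set<String>> vertex : graph.entrySet()) {
            System.out.println("partition " + partitionOf(vertex.getKey(), 4)
                    + ": " + vertex.getKey() + " -> " + vertex.getValue());
        }
    }
}
```

In a real MapReduce job the map and reduce steps would run as separate distributed tasks, with the framework handling the grouping by key; the sketch collapses that into one loop so the data flow is easy to follow.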