SiliconAngle, Klint Finley, Nov 2, 2011
Investigative Analytics: Cloudera Founder Launches New Startup Backed by Eric Schmidt
Cloudera founder Christophe Bisciglia unveiled his new startup, Odiago, this morning, giving the business-side scoop to TechCrunch and the technical details to Curt Monash. The company is launching a product called WibiData ("we be data"), specializing in data management and what Monash calls "investigative analytics."
WibiData is built on Apache Hadoop and HBase, which Cloudera specializes in supporting and developing. Monash defines investigative analytics as "seeking (previously unknown) patterns in data". Monash describes how WibiData works:
- ALL data pertaining to a single user (or mobile device) is kept in a single, possibly very long, HBase row.
- There are two primary operators in WibiData, Produce and Gather.
- Produce operates on single rows ... mainly doing two things. One is serving of data out of WibiData into interactive applications. The other is scoring, classifying, recommending, etc. on individual users (i.e. rows), in line with an analytic model.
- Gather typically operates on all your rows at once, and emits suitable input for a MapReduce Reduce step. It is reasonable to think of Gather as being a key cog in the training of analytic models.
- WibiData takes single-table schema flexibility to an extreme. Not only can different rows in the same table have different associated columns - something that relational systems can in effect also do via NULL values - but schemas can even change over the life of a column. If you have an array-valued cell storing the results of a marketing campaign, and you start recording more data partway through the campaign, then different rows in the table will, in the same column, hold different-sized arrays.
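To make the model Monash describes more concrete, here is a minimal Python sketch of the idea: one row per user, a Produce-style operator that works on a single row, and a Gather-style operator that runs over every row and emits key/value pairs for a reduce step. This is not WibiData's API. The in-memory `table` dict stands in for an HBase table, and the names `produce`, `gather`, and `reduce_counts`, as well as the sample columns and values, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for an HBase table: one row per user,
# each row mapping a column name to a cell value. Rows need not share
# columns, and array-valued cells may differ in length from row to row.
table = {
    "user-1": {"clicks": [3, 5], "campaign": ["opened"]},
    "user-2": {"clicks": [1], "campaign": ["opened", "clicked", "purchased"]},
    "user-3": {"clicks": [4, 4, 2]},  # this row has no "campaign" column
}


def produce(row):
    """Produce-style operator: score one user using only that user's row."""
    clicks = row.get("clicks", [])
    return {"score": sum(clicks) / len(clicks) if clicks else 0.0}


def gather(row):
    """Gather-style operator: emit (key, value) pairs from one row for a reduce step."""
    for event in row.get("campaign", []):
        yield event, 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Produce runs row by row; Gather runs over all rows and feeds the reducer.
scores = {user: produce(row)["score"] for user, row in table.items()}
campaign_counts = reduce_counts(
    pair for row in table.values() for pair in gather(row)
)
```

In this sketch, `scores` holds a per-user result computed from a single row at a time, while `campaign_counts` aggregates across all rows, tolerating rows that lack the `campaign` column or hold arrays of different lengths.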
See also
TechCrunch: Cloudera Founder Debuts Big Data Management And Analysis Platform WibiData With Backing From Eric Schmidt
TechCrunch writes that the company already has a number of high-profile customers using WibiData, including Wikipedia, Rich Relevance, and Atlassian.