My report: IE Big Data Innovation Summit, Boston – Day 1

I report on the highlights from the first day of the IE Big Data Innovation Summit in Boston, including a brilliant presentation by Stephen Wolfram on how Wolfram Alpha makes knowledge computable, and what he told me about the Singularity.

By Gregory Piatetsky, Sep 13, 2013.

On Sep 12, I attended the first day of the IE Big Data Innovation Summit in Boston, MA.

There were lots of people (~800 have registered), many exhibitors, good energy and excitement! Judging by the excellent food and hotel, #BigData is doing great!

Most of the exhibitors were on the Data/ETL side of Big Data, with few analytics-oriented companies - SAS was a notable exception.

For me, the highlight of the conference was the presentation by the brilliant Stephen Wolfram, who took the attendees on a tour of Wolfram Alpha, which already can understand and compute a lot of the world's knowledge. See more below.

Unfortunately, the young Innovation Enterprise team did not have session chairs for their morning speakers, so naturally the speakers took longer, messing up the entire schedule. I am sure they will correct this for the next conference.

Here is my report with @kdnuggets tweets and selected additional tweets from the #BigDBN hashtag.

Big Data Platform at the World's Largest Fan-to-Fan Ticket Marketplace, by Sastry Malladi, StubHub.

The speaker presented a good architecture which used many parts of the Hadoop-based stack. He said that the main reason the traditional Enterprise Data Warehouse (EDW) has not been replaced is cultural: replacing it with HBase/Hadoop technology requires a cultural shift.

In-Memory Computing Will Change Everything You Know About Business, GridGain.

  • Memory gets cheaper at ~30%/yr - no economic reason NOT to do in-memory computing
  • Gartner says RAM is the new disk, disk is the new tape
  • Myth #1 - In-memory computing is too expensive. Actually, a 1 TB DRAM cluster cost ~$25K in 2013
  • Myth #2 - In-memory computing is not durable. Actually, durable backups and disk storage are available
  • 99% of operational datasets are less than 10 TB, suitable for in-memory computing
  • Myth #4 - In-memory computing is for NASA only. Actually, speed matters for dealing with #BigData for most businesses
  • In-memory DB is used today. Streaming data will REQUIRE in-memory computing tomorrow
  • Hadoop clusters can get 10x faster with in-memory implementation of HDFS
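The cost argument behind Myth #1 lends itself to a quick back-of-the-envelope projection. A minimal sketch, assuming prices keep falling at the quoted ~30%/yr from the ~$25K price of a 1 TB DRAM cluster in 2013 (the function name and parameters are my own illustration, not from the talk):

```python
def projected_cost(year, base_cost=25_000, base_year=2013, annual_drop=0.30):
    """Project the cost of a 1 TB DRAM cluster, assuming a constant
    annual price decline of `annual_drop` from `base_cost` in `base_year`."""
    return base_cost * (1 - annual_drop) ** (year - base_year)

cost_2016 = projected_cost(2016)  # roughly a third of the 2013 price
```

At a steady 30%/yr decline, the price roughly halves every two years, which is the speaker's point: the economics of in-memory computing only get better.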

Big Data Innovations and Applications at NASA, by Nikunj Oza

  • Current "exceedance" anomaly detection in aviation checks for variables exceeding set thresholds, so it cannot find unknown anomalies
  • Multiple Kernel Anomaly Detection algorithm (MKAD) goes from data space to similarity space to find anomalous flights
  • NASA DASHlink, a web-based collaboration tool for those interested in data mining and systems health
  • @StaciaMisner: Data scientist at NASA says #bigdata is misnomer because volume is least difficult problem to solve. Other V's are real challenge.

Monetizing Big Data with Analytics by Jeff Veis, Vice President, HP Autonomy

  • Augmented Reality on mobile devices will go from Sci-Fi to reality, to reach $600 B by 2016, all #BigData-driven
  • 90% of digital content created by 2015 will be of mixed data types
  • By 2030 there will be 1 Trillion sensors, one sensor every 10 square feet #BigData
  • HAVEn #BigData Platform combines Hadoop, Autonomy, Vertica, Enterprise security, nApps
  • Processing human information is difficult: it is made up of ideas, which don't match as easily as data
  • software can search video as easily as text, while a human fatigues after ~17 minutes of watching surveillance video

Solving the Mysteries of the Universe with Big Data, by Sverre Jarp, Chief Technology Officer, CERN

  • LHC - largest machine & fastest racetrack in the world - protons run at 99.9999991% of speed of light
  • CERN produces more data than any other scientific experiment, with 150 million sensors delivering data 40 million times/sec #BigData
  • CERN LHC has a tiny signal-to-noise ratio: 10^-13 (10^-9 offline), on 30 petabytes of data in 2012 #BigData
  • #BigData management & Analytics require a solid organizational structure at all levels

Are You Ready for the New Era of Computing and Big Data?, by Phil Francisco, VP, Product Management & Product Marketing, IBM Big Data

  • not changing in the era of #BigData is the same as losing
  • Key trends driving #BigData: Cloud computing, Internet of things, Social media, Mobile
  • IBM wants to add 4th "V" (Veracity) to #BigData. Agree with @Doug_Laney "Veracity" not specific to bigness.
    Gartner Analyst Doug Laney, who came up with the original 3Vs of Big Data, commented:
    @Doug_Laney: Gartner's bigdata 3Vs are 12+yo. My orig piece: . Jump in, the water's still warm. #BIGDBN

IBM's Phil Francisco mentioned 5 compelling #BigData use cases:

  1. Big Data Exploration - 99% time reduction needed for interactive analysis
  2. Enhanced 360 degrees view of the Customer - optimize every customer interaction
  3. Operations Analysis on machine data for greater efficiency, new opportunities to monetize data
  4. Data Warehouse augmentation, with Hadoop/etc to deliver more value while reducing cost
  5. Security & Intelligence, face recognition, cyberattack detection, fraud detection
  • IBM InfoSphere BigInsights, a Hadoop distribution: from Quickstart (free), to Enterprise Edition, sold by TB managed
  • IBM BigInsights (Hadoop) is 100% open source compatible, no forks #BigData
  • IBM Big Data University, offers many free courses online on Hadoop, Analytics and more

Computational Knowledge, by Stephen Wolfram, CEO, Wolfram Alpha

Stephen Wolfram gave an awe-inspiring live demo of Wolfram Alpha and talked about their plans to release Wolfram Alpha language and ontology to the world.

  • Wolfram Alpha's objective is to make much of the world's knowledge computable
  • Wolfram Alpha is about 93-94% successful in understanding natural language queries on the web
  • You can upload data to Wolfram Alpha Pro and ask - what can you tell me about this data?
  • We will soon release Wolfram Alpha computable ontology, a standard way for representing and understanding #BigData
  • Wolfram Language, based on Mathematica, includes all the knowledge & algorithms in a coherent, unified way
  • coming soon - universal deployment of Wolfram Language, can be embedded on any website
  • Wolfram Devicepedia collects data, APIs for all devices to connect to Wolfram platform #BigData

Stephen Wolfram stayed around to answer questions, and I asked him whether he thought Wolfram Alpha would eventually develop to reach the Singularity, when computers become much smarter than humans.

He said that "already Wolfram Alpha is much better able than humans to answer many questions, especially when they deal with math or facts". He estimated that a third to a half of US students use Wolfram Alpha to do their math homework, and they see big homework "waves" when school is in session. Based on this experience, Wolfram is considering a "test design app" to help come up with calculus tests, but the situation is backwards - why teach students to do things like integrals, which computers already do much better?

As for singularity, he said,

"Wolfram Alpha just sits on the network, and does not have arms and legs. It all depends on the intent of the computer, and probably won't happen like Ray [Kurtzweil] imagines. "

One Platform for Big Data, by Tomer Shiran, VP, MapR Technologies

  • simple algorithms on #BigData usually outperform complex algorithms on smaller data

Big Data Innovation and Crowdsourcing - Will They Make a Happy Couple?, by Rinat Sergeev, Data Scientist, Harvard & NASA Tournament Lab

  • the biggest bottleneck in solving problems with crowdsourcing is organizational - it requires different thinking

Analytics & the 2012 Obama Victory, by Andrew Claster, Deputy Chief Analytics Officer, Obama for America

  • from the history of analytics & politics: "Romney Backers Using Computers" (a headline from the 1968 campaign!)
  • in the 2012 elections, the Obama for America campaign perfectly predicted the electoral votes, just like Nate Silver's 538 blog
  • Obama Analytics Cave Rule: Test everything, challenge conventional wisdom (applies to all data science!) #BigDBN

Other interesting tweets for #BigDBN:

  • Chris Surdak @CSurdak: Great presentation by David Mariani... Klout active data set has 1.1 TRILLION rows, with 12 BILLION daily updates! #BIGDBN #HPAutnInfoGov
  • kristinevick @kristinevick: High-performance analytics "billion is the new million" #dataviz #BigDBN
  • Scott Golder @redlog: "@netflix is a log-generating company that also streams movies." #BIGDBN
  • Prasun Sinha @psinha: Intelligence is like justice. Delayed=Denied. Real-time #bigdata analytics becoming more important. #bigdbn