IBM Accelerates Big Data

IBM announced several related technologies in a bid to lead the Big Data Market, including a dramatic 8-25x BLU Acceleration for DB2, an easy-to-use Big Data Platform, and a system for Hadoop.

Gregory Piatetsky, Apr 4, 2013.

IBM made a major announcement on April 3, 2013 at Almaden Research Labs, in front of a room full of analysts. The event was broadcast on livestream and accompanied by a lively chat around the #BigDataMgmt tag. Almaden has a long history of innovation, including being the birthplace of SQL and (likely) the first data mining algorithm, "Apriori", developed in the 1990s by Rakesh Agrawal and Ramakrishnan Srikant.

This time IBM announced a set of technologies aimed at significantly speeding up Big Data processing while at the same time simplifying data access and use.

IBM Big Data Technologies

The key components include:

  • DB2 with BLU Acceleration, which is reported to be 8-25x faster on reporting and analytics, and also requires no indexes, tuning, or schema changes.
  • Big Data Platform, which uses standard SQL for access to data and offers much faster streams operations.
  • PureData for Hadoop, which offers 8x faster deployment and has a built-in analytics accelerator.

The major innovations enabling BLU Acceleration are:

  • Dynamic In-Memory columnar processing with dynamic movement of data from storage
  • Patented compression technique that preserves order so that the data can be used/queried without decompressing
  • Parallel Vector Processing with Multi-core and SIMD parallelism
  • Data Skipping, which uses columnar data to avoid processing of irrelevant data
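To make two of these ideas concrete, here is a minimal Python sketch of order-preserving dictionary encoding combined with block-level data skipping. It is an illustration only, not IBM's implementation: the function names, the sorted-dictionary encoding, and the per-block min/max synopsis are all assumptions chosen to show how a predicate can be evaluated on compressed codes and how whole blocks can be skipped.

```python
from bisect import bisect_left


def build_dictionary(values):
    """Sorted distinct values; a value's code is its rank, so order is preserved."""
    return sorted(set(values))


def encode(values, dictionary):
    """Replace each value with its integer code (its position in the dictionary)."""
    return [bisect_left(dictionary, v) for v in values]


def block_synopsis(codes, block_size):
    """Keep the (min, max) code of every block so irrelevant blocks can be skipped."""
    return [
        (min(codes[i:i + block_size]), max(codes[i:i + block_size]))
        for i in range(0, len(codes), block_size)
    ]


def count_equal(codes, synopsis, block_size, dictionary, value):
    """Count rows equal to `value` without decoding; returns (count, blocks_scanned)."""
    target = bisect_left(dictionary, value)
    if target == len(dictionary) or dictionary[target] != value:
        return 0, 0  # value never occurs, so no block needs to be read
    count = 0
    scanned = 0
    for b, (lo, hi) in enumerate(synopsis):
        if target < lo or target > hi:
            continue  # data skipping: this block cannot contain the value
        scanned += 1
        block = codes[b * block_size:(b + 1) * block_size]
        count += sum(1 for c in block if c == target)  # compare codes, not values
    return count, scanned


# Hypothetical column: 10 years of data, 4 rows per year, stored in year order.
years = [y for y in range(2004, 2014) for _ in range(4)]
dictionary = build_dictionary(years)
codes = encode(years, dictionary)
synopsis = block_synopsis(codes, block_size=4)
print(count_equal(codes, synopsis, 4, dictionary, 2010))
```

Because the codes sort the same way as the original values, range predicates such as "YEAR >= 2010" could be answered on the codes in the same manner.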

Here is an example IBM gave:

The System: 32 cores, 1TB memory, 10TB table, 100 columns, 10 years of data.

The Query: How many "sales" did we have in 2010?

SQL: SELECT COUNT(*) from MYTABLE where YEAR = '2010'

The Result: In seconds or less as each CPU core examines the equivalent of just 8MB of data
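One way to see where a figure of roughly 8MB per core could come from is to apply the BLU techniques one after another to the 10TB table. The sketch below is a back-of-the-envelope calculation, not IBM's published derivation: the table size, column count, years of data, and core count come from the example above, while the 10x compression ratio and the 4x SIMD factor are assumptions picked because they reproduce the quoted number.

```python
# Figures stated in the example
TABLE_BYTES = 10 * 10**12   # 10TB table (decimal units)
COLUMNS = 100               # 100 columns, the query touches one
YEARS = 10                  # 10 years of data, the query needs one
CORES = 32                  # 32 cores share the scan

# Assumptions, not stated in the example
COMPRESSION_RATIO = 10      # assumed compression factor
SIMD_FACTOR = 4             # assumed values handled per SIMD instruction


def bytes_per_core(table_bytes=TABLE_BYTES, columns=COLUMNS, years=YEARS,
                   cores=CORES, compression_ratio=COMPRESSION_RATIO,
                   simd_factor=SIMD_FACTOR):
    """Equivalent bytes each core examines after every reduction is applied."""
    compressed = table_bytes / compression_ratio  # compression
    one_column = compressed / columns             # columnar access: only YEAR
    one_year = one_column / years                 # data skipping: only 2010
    per_core = one_year / cores                   # multi-core parallelism
    return per_core / simd_factor                 # SIMD: several values per step


print(f"{bytes_per_core() / 10**6:.1f} MB per core")
```

Under these assumptions the result is about 7.8MB per core, in line with the "equivalent of just 8MB" quoted in the example. The SIMD step does not shrink the data itself; it reduces the work per core, which is why the figure is stated as an equivalent.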

10TB query in seconds or less

"Queries that took two days to execute can be done in three minutes" using the BLU accelerator, said IBM's Bob Picciano.

IBM sees 5 key Big Data use cases

  • Enrich your information base with Big Data exploration
  • Improve customer interaction
  • Help reduce risk and prevent fraud
  • Optimize infrastructure and monetize data
  • Gain IT efficiency and scale

but is seeing applications across many industries, including Aerospace & Defense, Automotive, Banking, Chemical & Petroleum, Consumer Products, Electronics, Energy and Utilities, Government, Healthcare, Insurance, Life Sciences, Media & Entertainment, Retail, Telco, and Travel.

All offerings are available in Q2, except the PureData System for Hadoop, which will start shipping to customers in the second half of 2013.

Many leading analysts were in the audience or listened to the livestream from Almaden and commented on Twitter. Here are some selected tweets from the #bigdatamgmt tag:

  • David Birmingham @enzeevoice, IBM PureData Hadoop appliance simplifies - uses in-stream FPGAs to assist in data filtration in hardware @BrightlightBI #BigDataMgmt
  • Shawn Rogers @shawnrog #bigdatamgmt my #bigdata V's - Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data
  • Claudia Imhoff @Claudia_Imhoff Les: 4 signature sols: Social (social media analytics, Fraud (anti-fraud, waste, abuse), Asset, and Customer (next best offer) #bigdatamgmt
  • prashant @PrashantKJ968 Big data @ IBM : Volume, Velocity, Variety, Veracity AND VISIONARY: Les Rechan #Bigdatamgmt @IBMPureSystems
  • James Taylor @jamet123 65% of businesses not using big data for business advantage? more like 95% IMO #bigdatamgmt
  • Dan Vesset @DanVesset Prespcriptive advice is sorely missing in most analytics solutions. IBM's Next Best Action is one of them. #bigdatamgmt #decisionmgt
  • Barry Devlin @BarryDevlin System Z have really adopted Netezza / FPGAs - IDAA is maybe only a beginning. As I said back at acquisition time! #bigdatamgmt
  • @matteastwood: Wonder if @IBMWatson will be ready to debate Pres candidates in 2016. Or at least fact check in real time #bigdatamgmt

Additional information will be presented in an IBM webcast on Apr 30.
