KDnuggets Interview: Paul Zikopoulos, IBM on Why Big Data needs Polyglots
Tags: Big Data, Hadoop, IBM, Interview, MapReduce, NoSQL, Paul Zikopoulos, Social Analytics, Twitter
We discuss why not to focus on a single technology in Big Data, prevalent myths, what the IBM & Twitter partnership means for the world, and the current state of data governance.

Paul has written more than 350 magazine articles and 19 books, some of which include “Big Data Beyond the Hype”, “Hadoop for Dummies”, “Harness the Power of Big Data” and more.
Here is my interview with him:
Anmol Rajpurohit: Q1. In your keynote at Strata + Hadoop World 2014, you encouraged people to think of the Big Data environment as "polyglot". Can you explain that statement and provide a few examples?
Paul Zikopoulos: A person who is a polyglot can speak multiple languages - so when I talked about a polyglot analytics architecture, I was talking about multiple

AR: Q2. What are the most common pain points that you observe in the Big Data implementations across enterprises? Any myths that you have come across?
PZ: Lots of myths out there. I alluded to one earlier, “Big Data is solely Hadoop”. Others include “Big Data means lots of data” - it really means more data than you are used to, and perhaps different kinds and a different speed of accumulation.
I think “NoSQL is death to SQL” is a great myth that I watch - and ironic considering the biggest thing to hit the NoSQL world these days is SQL.
Big Data doesn't mean death to the RDBMS, that's an unfortunate one that I feel I still have to displace. Finally, Big Data isn't just social sentiment. There are so many use cases, from machine data (what I like to call data exhaust) to image and feature extraction...it's just fun to talk about social.
Pain points? I've alluded to some of them already, but I will toss skills out there. Look, my clients are not all LinkedIns and Facebooks - they don't have the budgets to hire

AR: Q3. What were the key aspirations behind the recently announced partnership between IBM and Twitter? What do you foresee as the most significant value of leveraging social data for business decisions?
PZ: Well, you look at Twitter and it truly is the 'pulse of the planet'. One of the things I said in my keynote is that Big Data without analytics is...well...just

The most significant value is really about attribute fullness in my humble opinion. I mean sure, there are things like reputational risk assessment and monetizable intent, but what about using Twitter posts to classify folks, understand location, and those kinds of things. IBM has a technology called BigMatch - it's the world's (as far as I know) first probabilistic native Hadoop matching engine. Clients ahead of the industry are using it to match system of record with Twitter system of engagement for better attribute fullness.
AR: Q4. What are your thoughts on the current state of data governance in the information management infrastructure at enterprises? What practices would you recommend for data governance amid the increasing diversity and complexity of the information management infrastructure?
PZ: This is a broad topic, so let me narrow it to Hadoop. Enterprises are running with scissors. Look, I don't care if the data resides in an RDBMS, HDFS, or Microsoft Access: Personally Identifiable Information (PII) is PII. So treat your Big Data the same way you treated sensitive data before someone coined this term.
So things like activity monitoring for audit, masking, metadata... these things all matter. And if you look at pending legislation around consumer protection (for example, the pending "Right to be forgotten" legislation in Europe), it's going to get even more important. But most important, at the end of the day, consumers are going to punish you if you mishandle their 'stuff'. We've seen that a number of times.
Second and last part of the interview.