
KDnuggets Interview: Paul Zikopoulos, IBM on Why Big Data needs Polyglots


We discuss why not to focus on a single technology in Big Data, prevalent myths, what the IBM and Twitter partnership means for the world, and the current state of data governance.



Paul C. Zikopoulos, B.A., M.B.A., is the Vice President of Technical Sales for IBM’s Information Management division and additionally leads its World Wide Competitive Database and Big Data teams. Paul is an award-winning writer and speaker with more than 20 years of experience in Information Management and is seen as a global expert in Big Data and Analytic technologies. Independent groups often recognize Paul as a thought leader with nominations to SAP’s “Top 50 Big Data Twitter Influencers”, Big Data Republic’s “Most Influential”, Onalytica’s “Top 100”, and Analytics Week “Thought Leader in Big Data and Analytics” lists. Big Data Made Simple included him in its “Top 200 Big Data Thought Leaders on Twitter” and Technopedia listed him as one of its “Big Data Experts to Follow”.

Paul has written more than 350 magazine articles and 19 books, including “Big Data Beyond the Hype”, “Hadoop for Dummies”, and “Harness the Power of Big Data”.

Here is my interview with him:

Anmol Rajpurohit: Q1. In your keynote at Strata + Hadoop World 2014, you encouraged people to think of the Big Data environment as "polyglot". Can you explain that statement and provide a few examples?

Paul Zikopoulos: A person who is a polyglot can speak multiple languages - so when I talked about a polyglot analytics architecture, I was talking about multiple technologies. Too often people get caught up in the hype and think Hadoop = Big Data. But just take a look at what's going on with Spark - I mean, is anyone even talking about MapReduce anymore? In a polyglot environment, I would have other technologies as well; for example, a graph store such as Titan or perhaps a document store such as Cloudant. Of course, the RDBMS is not going away, and it frustrates me when I see over-zealous IT folks act as if it were. Now add to this things like Swift Object Stores and so on.

AR: Q2. What are the most common pain points that you observe in the Big Data implementations across enterprises? Any myths that you have come across?

PZ: Lots of myths out there. I alluded to one earlier, “Big Data is solely Hadoop”. Others include “Big Data means lots of data” - it really means more data than you are used to, and perhaps different kinds of data and a faster speed of accumulation.

I think “NoSQL is death to SQL” is a great myth that I watch - and ironic considering the biggest thing to hit the NoSQL world these days is SQL.

Big Data doesn't mean death to the RDBMS, that's an unfortunate one that I feel I still have to displace. Finally, Big Data isn't just social sentiment. There are so many use cases, from machine data (what I like to call data exhaust) to image and feature extraction...it's just fun to talk about social.

Pain points? I've alluded to some of them already, but I will toss skills out there. Look, my clients are not all LinkedIns and Facebooks - they don't have the budgets to hire PhDs in Math and Java programming. Even some clients I work with that are well travelled on Big Data lose sight of the fact that the true value comes when you move Big Data from the privileged few to the empowered many. So skills. You're going to need varying Big Data skills, from maths, to visualizations, to business communication around the topic. I find most companies, whether they want to admit it or not, need to train internally as well as add to the bench from external sources.

AR: Q3. What were the key aspirations behind the recently announced partnership between IBM and Twitter? What do you foresee as the most significant value of leveraging social data for business decisions?

PZ: Well, you look at Twitter and it truly is the 'pulse of the planet'. One of the things I said in my keynote is that Big Data without analytics is...well...just a bunch of data. So, I think it's a natural evolution to bring the pulse of the planet together with the most advanced analytics platform on the planet. I mean, step back for a moment and look beyond Big Data today. IBM is doing the hard Big Data work of tomorrow with its Watson cognitive computing. I mean, eWeek named IBM as one of the top 10 technology companies that are proving innovation isn't dead (Nest, Tesla, and Amazon were in that group; no other traditional or new age Hadoop vendor was in there). So it's a natural marriage, if you will. We are going to offer the ability in both PaaS and SaaS styles to interact with this Twitter data, for free (depending on the service), as well as deliver expert consulting around the use of Twitter data to bolster that business.

The most significant value is really about attribute fullness, in my humble opinion. I mean sure, there are things like reputational risk assessment and monetizable intent, but what about using Twitter posts to classify folks, understand location, and those kinds of things? IBM has a technology called BigMatch - it's the world's (as far as I know) first probabilistic native Hadoop matching engine. Clients ahead of the industry are using it to match their systems of record with the Twitter system of engagement for better attribute fullness.

AR: Q4. What are your thoughts on the current state of data governance in the information management infrastructure at enterprises? What practices would you recommend for data governance amid the increasing diversity and complexity of the information management infrastructure?

PZ: This is a broad topic, so let me narrow it to Hadoop. Enterprises are running with scissors. Look, I don't care if the data resides in an RDBMS, HDFS, or Microsoft Access: Personally Identifiable Information (PII) is PII. So treat your Big Data the same way you treated sensitive data before someone coined this term.

So things like activity monitoring for audit, masking, metadata ... these things all matter. And if you look at pending legislation around consumer protection (for example, the "Right to be forgotten" legislation in Europe), it's going to get even more important. But most important, at the end of the day, consumers are going to punish you if you mishandle their 'stuff'. We've seen that a number of times.

Second and last part of the interview.
