Big Data Innovation Summit 2014: Highlights of Keynote Speeches on Day 1

Highlights from keynote speeches by big data technology leaders from industry and academia on the first day of Big Data Innovation Summit 2014 in Santa Clara.

Big Data Innovation Summit 2014 (Apr 9-10, 2014) was organized by Innovation Enterprise at the Santa Clara Convention Center in Santa Clara, CA. The summit brought together experts from industry as well as academia for two days of insightful presentations, workshops, discussions, panels, and networking. It covered areas including Big Data Innovation, Data Analytics, Hadoop & Open-Source Software, Data Science, Algorithms & Machine Learning, Data-Driven Business Decisions, and more.

Anyone who attends such conferences would agree that so much happens quickly, and often simultaneously, that it is almost impossible to catch all the action. KDnuggets helps by summarizing the key insights from all the keynote sessions at the conference. These concise, takeaway-oriented summaries are designed both for people who attended the conference and would like to revisit the key sessions for a deeper understanding, and for people who could not attend. If any of these keynotes interests you, check KDnuggets, as we will soon publish exclusive interviews with some of these speakers.

Here are highlights from keynote sessions on day 1 (Wed Apr 9):

Daniel Austin, Principal Architect at PayPal, gave the opening keynote, explaining why the “Internet of Things” should really be the “individual network of things”. He highlighted that the number of devices and their connectivity, availability, and partitioning will play a key role in the future. During his keynote he shared an interesting prediction: “Every person is predicted to generate over 20 Petabytes of data over the course of a lifetime”.

[Image: Connectivity by 2020]

Vasanth Kumar, Principal Data Scientist at Live Nation, started by introducing the problem his data science team faced: detecting anomalies and optimizing resources to minimize cost. The data they worked on was extremely bursty, with a million requests per minute at peak, and consuming data at this rate made it a big data problem. He explained how batch models had demonstrated limited success on this problem, prompting the need for adaptive online learning, and described the various trade-offs and challenges encountered along the way with such an approach.
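To make the contrast with batch models concrete, here is a minimal sketch of adaptive online anomaly detection using an exponentially weighted moving average (EWMA). This is a generic illustrative technique, not the actual Live Nation system; the class name, parameters, and data are all hypothetical.

```python
# Minimal sketch of online anomaly detection with an exponentially
# weighted moving average (EWMA). Illustrative only -- not the actual
# Live Nation system; names, parameters, and data are hypothetical.

class EwmaAnomalyDetector:
    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the model adapts
        self.threshold = threshold  # flag points this many std-devs away
        self.warmup = warmup        # samples to observe before flagging
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        """Return True if x looks anomalous, then adapt to it."""
        self.n += 1
        if self.n == 1:
            self.mean = x
            return False
        diff = x - self.mean
        std = self.var ** 0.5
        anomalous = (self.n > self.warmup and std > 0
                     and abs(diff) > self.threshold * std)
        # Adapt mean and variance incrementally -- no batch retraining.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous

detector = EwmaAnomalyDetector()
stream = [100, 102, 98, 101, 99, 100, 500, 101]  # a sudden burst at 500
flags = [detector.update(x) for x in stream]
print(flags)  # only the burst at 500 is flagged
```

Because the model updates itself on every data point, it tracks shifting traffic levels without the periodic retraining a batch model would need.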

Juan Miguel Lavista, Principal Data Scientist at Bing, emphasized that most of us know the good parts of Big Data but ignore the bad and the ugly parts. He then explained the bad and the ugly through the following 5 Big Data myths:

  1. What really matters is the size of the data (Fact: keep the focus on the problem, not on the size of the data. You need enough data to solve your problem, and no more than that.)
  2. In order to do Data Science or Big Data, all one needs is Hadoop (Fact: Hadoop is a great technology, but it is just infrastructure, which is of no value without good data scientists and good processes)
  3. Big Data means the end of scientific theories (Fact: the scientific method is still very relevant, since correlation, however strong, does not imply causation)
  4. Data Scientists are always right (Fact: a lot of research based on observational studies is biased; to be right, the research needs to be reproducible)
  5. Big Data can solve world hunger (Fact: not by itself, but Big Data can play a key role in solving world hunger through better demand forecasting, better weather prediction, etc.)
[Image: Data Science and Infrastructure Scenario]

He also emphasized a key data science principle: “Correlation does not imply causation”, sharing a few examples in which strong correlations intuitively but wrongly suggested causation.

Anthony Scriffignano, SVP, Data & Insight at D&B, spoke about unlocking the value of big data and analyzed how businesses are turning to new and exciting ways of leveraging predictive analytics. He advised the audience: “it’s not just about data, it’s not just about the math, it’s the relationships among data which matters.” He demonstrated how data innovations, such as new models for assessing size dimensions, are radically enhancing the power of predictive analytics, and explained that with vast amounts of data come great opportunities as well as new types of risks.

Harriet Fryman, Director, Market Strategy at IBM, noted that the possibilities of analyzing all available data are immense and the opportunity to derive value from it is virtually infinite. She put forward the following 5 key points for successful implementation of big data projects:

  1. Build a culture that infuses analytics everywhere
  2. Apply analytics to improve your core competitiveness
  3. Invest in capabilities driven by software
  4. Be proactive about privacy, security and governance
  5. Understand the levers of differentiation you can influence
[Image: Big Data & Analytics Journey]

Lynn Goldstein, Chief Data Officer at New York University, talked about data governance, covering privacy and security. She stated that the root of data governance is accountability, and that an accountable organization has the following three elements:
  1. Commitment to accountability
  2. Policies linked to laws, generally accepted principles, and best practices
  3. Performance mechanisms to ensure responsible decision making

Arijit Sengupta, CEO, BeyondCORE, said that knowing the right question to ask is the most difficult part of data analysis. He emphasized that if we have a large number of variables from which we need to select a few to chart on a graph, there are billions of possible combinations, making exhaustive manual analysis practically impossible. Referring to the McKinsey Global Institute report, he stated that there is a projected shortage of 140K-190K data analysts, and pointed out that the much bigger problem is the 1.5 million business managers who are not data-savvy. He indicated that the future of analytics is simplicity, ubiquity, and actionability: in other words, Advanced Analytics for All (A3). In his A3 approach, automated algorithms rather than expert data scientists conduct the analysis, the results are delivered in a manner business users can understand without statistics or computer science skills, and users can easily overlay human intuition on top of the automated analysis.

[Image: Problem with n-order statistics]
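The combinatorial blow-up Sengupta described is easy to verify with a quick calculation. Assuming a hypothetical dataset of 200 variables (a count chosen purely for illustration), the number of distinct variable subsets one could chart grows past two billion by the time five variables are combined:

```python
# How many distinct k-variable charts can be drawn from n variables?
# n = 200 is a hypothetical count chosen for illustration.
from math import comb

n_vars = 200
for k in (2, 3, 5):
    print(f"{k}-variable combinations from {n_vars}: {comb(n_vars, k):,}")
# 5-variable combinations alone exceed 2.5 billion.
```

No human analyst can inspect billions of candidate charts, which is the motivation for letting automated algorithms do the first pass, as in the A3 approach.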