Big Data Innovation Summit 2014: Highlights of Keynote Speeches on Day 2

Highlights from keynote speeches by big data experts from Facebook, RedPoint Global, Quintiles, Samsung, GMU, PayPal, and others on Day 2 of Big Data Innovation Summit 2014 in Santa Clara.

Big Data Innovation Summit 2014 (Apr 9-10, 2014) was organized by Innovation Enterprise and held in Santa Clara, CA. The summit brought together experts from industry and academia for two days of insightful presentations, workshops, discussions, panels and networking. It covered areas including Big Data Innovation, Data Analytics, Hadoop & Open-Source Software, Data Science, Algorithms & Machine Learning, Data Driven Business Decisions and more.

There is so much happening at such conferences that it is impossible to catch all the action. KDnuggets helps you by summarizing the key insights from the keynote sessions at the conference. These concise summaries are both for people who attended the conference but would like to revisit the key sessions for a deeper understanding and for people who did not attend. Keep checking KDnuggets over the next few weeks, as we will soon publish exclusive interviews with some of these speakers.

See also: Big Data Innovation Summit 2014 - Highlights of Keynote Speeches on Day 1

Here are highlights from keynote sessions on day 2 (Thu, Apr 10):

Zoe Abrams, Data Scientist at Facebook, opened her talk with her ideas for improving data science infrastructure at Facebook. She described how the data science infrastructure at Facebook has evolved and how important it is to build a culture of sound inference for impactful analysis. She also talked about metrics and the challenges they pose: figuring out the right metric, maintaining consistency, and providing data more quickly.

Wojciech Galuba from the Experimentation Tools team at Facebook then took over and talked about moving metrics. He showed how metrics change with changes in the product. He explained the following challenges in moving metrics: many metric dimensions, many dimensions running in parallel, and many teams involved. He concluded by saying that “Making and moving metrics is not just about providing the data and tools, but about building a culture of sound inference”.

George Corugedo, CTO & Co-founder at RedPoint Global, began his talk by pinpointing the advantages of Hadoop 2.0 over Hadoop 1.0, specifically how it removes the need for MapReduce programmers. He explained the current Big Data challenges: (1) skill gap, (2) maturity and data governance, and (3) converting raw data into useful information. RedPoint Global is the first firm to bring a YARN-compliant ETL/data quality toolset to the market. He discussed the key features of RedPoint Data Management on Hadoop. He concluded the talk by describing which kinds of companies should care about Hadoop 2.0 and how RedPoint solutions meet their needs.

Gary Shorter, Director, Data Science at Quintiles, provided an overview of the current progress in Data Science for Healthcare and what lies ahead. Current benchmarks state that, in order to stay competitive, a firm should already be able to:

  1. collect, clean and integrate unstructured data from multiple sources,
  2. make it scalable and secure,
  3. derive basic insights from data at a reasonable speed, and
  4. solve complex problems through advanced analytics.

Next, in order to cross “the chasm”, firms should be:

  1. integrating data globally,
  2. achieving analytics results within seconds,
  3. obtaining the right insights, and
  4. benefiting through great visualization.

Finally, the ultimate goal should be to deliver personalized health through real-time insights across multiple dimensions.

Arun Jagatheesan, Head of Intelligent Storage at Samsung, talked about wearable computing and speech recognition. Discussing automatic speech recognition (ASR), he mentioned three models: Acoustic, Pronunciation & Language and Machine Learning models (with a focus on neural networks). He discussed cloud-based ASR such as Siri and various concerns such as latency. He introduced Hierarchical Speech Recognition (HSR) and described how Samsung was able to save power and increase efficiency through HSR.

Prof. Kirk Borne, Professor at George Mason University, shared his thoughts on using Big Data for data-driven discovery and decision support. He disagreed with the Wikipedia definition of Big Data and suggested that Big Data should rather be defined as “Everything, Quantified and Tracked!”. After listing all the Big Data characteristics (the V’s: Volume, Variety, Velocity, Veracity, Validity, Value, Variability, Venue, Vocabulary), he focused on Veracity and Variety to explain how Big Data can be used as an experimentation bed for valuable research. He explained the “Decision Science-as-a-Service” model through examples from his work on astrophysics as well as everyday scenarios.

Duru Ahanto, Data Scientist at Yahoo, spoke about how Yahoo manages teams going from Data to Knowledge to Insights, following some organizing principles. From an organizational perspective, data mining engineers are caretakers, data analysts are explorers and insight analysts are connection makers. He discussed driving efficiency in the organization of data science teams by going through permutations of generalists vs. specialists and centralized vs. decentralized structures, and how best to address teams in each model. He stated that at Yahoo, the focus is on flexibility.

Moises Nascimento, Chief Architect at PayPal, gave the closing keynote. He made the point that although many firms are now adopting new data technologies like Hadoop and NoSQL, most of their existing data sources and toolsets still provide value, so there is value in leveraging all data sources. He highlighted that data manipulation is best handled at the system level, while data analysis is better managed at the enterprise level.

The summit also featured the Big Data Innovation Awards Ceremony. The winners were:

Talksum Inc., a leader in high-speed data processing and management solutions, won the Big Data Start Up Award. The award was based on the company’s mission and its flagship product, the Talksum Data Stream Router (TDSR), which helps IT and data centers manage Big Data for applications that require massive amounts of processing.

The Big Data Innovation Summit also presented Cisco with the Big Data Project Award, and YarcData, a Cray company, with the Big Data Tech Provider Award.