INFORMS The Business of Big Data 2014: Day 1 Highlights

Highlights from the presentations by Big Data technology practitioners from Teradata, Booz Allen Hamilton, Databricks and others during INFORMS The Business of Big Data in San Jose.

INFORMS Conference 2014, held on June 22-24, 2014 at the San Jose Convention Center, focused on "The Business of Big Data" and showed attendees how best to prepare for the growing opportunities in the field of Big Data. The conference offered a chance to learn from and network with leaders from industry and academia through an open exchange of information, ideas and perspectives. The line-up of speakers, which included Big Data leaders from various industries, shared best practices through real-world case studies and tutorials on a wide variety of topics, such as getting from data discovery to return on investment and real business value, and bridging the gap between decision makers, IT managers, and analytics professionals. The conference program included keynote sessions, presentations across 3 tracks (Big Data Case Studies, Big Data 101 and Emerging Trends in Big Data), technology workshops, panel discussions and exhibits.

Here is a summary of the key takeaways from selected talks on Day 1:

Bill Franks, Chief Analytics Officer, Teradata Corporation, kicked off the event with a keynote speech titled “Putting Big Data To Work”. He said that today one cannot avoid the discussion around big data and its analysis. The downside of this attention, however, is that there is a lot of hype and misinformation in the marketplace. Many companies are also confused about how to get started, what actions to take and what pitfalls to avoid. Drawing on his popular book “Taming The Big Data Tidal Wave”, he addressed many organizational and cultural points that must be considered when undertaking big data initiatives.

He insisted that traditional analytics must be augmented with new approaches, and that the data science team should constantly expand and evolve. It is essential to build a team that, taken as a whole, covers all the skills the organization needs. On external resources, he argued that they should be used selectively: one can outsource tactical execution, but should not hand off analytic strategy and design. On organizing analytics teams, he said, “Unlike many other common corporate teams, there is no standard model for organizing analytics teams.” For mature organizations, he recommended a hybrid model along with a center of excellence.

He strongly suggested leveraging analytics in diverse ways, performing discovery analysis alongside confirmatory analysis to maximize profits. He said it is high time to move IT from serving to enabling, and illustrated the shift with yogurt shops: the traditional model, in which a server prepares the yogurt cup, versus the model in which customers prepare their own cups. He also gave this mantra:

To succeed with big data, start small.

Brian Keller, Data Scientist at Booz Allen Hamilton, talked about “Getting Started with Hadoop and Big Data”. He began by noting that big data analytics can be a daunting task because of the complexity of the technologies and the fast evolution of the software ecosystem.

He introduced the attendees to established open source big data technologies. He also discussed practical methods to prototype and develop big data analytics without requiring deep knowledge of software development or distributed computing. After giving an overview of Hadoop and MapReduce, he discussed alternatives to writing Java code to develop analytics on Hadoop. He also explained how one can get started with big data technologies today.
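For readers new to the MapReduce model mentioned above, here is a minimal single-process sketch in Python. It is not taken from the talk and does not use Hadoop itself; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only to mirror the three stages that a MapReduce framework would run across many machines.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this across the cluster; here it is a dict.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three stages on an in-memory list of lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data big analytics", "data at scale"]
    print(word_count(sample))
```

The point of the model is that the map and reduce functions contain no distribution logic, which is why higher-level tools can generate them without the analyst writing Java by hand.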

Prof. Ion Stoica, CEO, Databricks and CTO, Conviva, delivered a talk titled “Taming Big Data with Berkeley Data Analytics Stack (BDAS)”. He observed that today’s data analytics tools are slow to answer even simple queries, as they typically have to sift through huge amounts of data stored on disk, and that they are even less suitable for complex computations such as machine learning algorithms.

He discussed today’s state-of-the-art analytics stack and named the need to manage three separate stacks (interactive, batch and streaming) as the biggest challenge. To address this and other challenges, his team is developing BDAS, an open source data analytics stack that provides interactive response times for complex computations on massive data. He briefly discussed Apache Spark and its merits: it is fast, easy to use and general-purpose. Spark also unifies real-time and historical analytics. He also touched on the unification of graph processing and ETL.

Sam L. Savage, Executive Director, Probability talked about SIPmath Modeler Tools. The Stochastic Information Packet, or SIP, allows uncertainties to be used in interactive calculations. The Open SIPmath standard allows uncertainties to be communicated as big data for driving interactive simulation in native Excel and other environments without add-in software. He demonstrated some SIPmath Modeler Tools that facilitate the generation of such models in Excel, and showed how to import and export results from Crystal Ball, Risk Solver and MATLAB to leverage those packages.
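To make the SIP idea concrete, here is a small Python sketch that treats a SIP as a plain array of simulated trials, so that a calculation on uncertain quantities becomes element-by-element arithmetic on those arrays. This is an illustration under stated assumptions, not the SIPmath tools or the Open SIPmath file format: the helper names (`make_sip`, `combine`, `chance_below`), the distributions, and the demand and price figures are all hypothetical.

```python
import random
import statistics


def make_sip(sampler, trials, seed):
    """Generate a SIP: a fixed-length list of simulated trials.

    A fixed seed makes the packet reproducible, so it can be shared and reused.
    """
    rng = random.Random(seed)
    return [sampler(rng) for _ in range(trials)]


def combine(sip_a, sip_b, op):
    """Apply op trial by trial; trial i of the result uses trial i of each input."""
    if len(sip_a) != len(sip_b):
        raise ValueError("SIPs must have the same number of trials")
    return [op(a, b) for a, b in zip(sip_a, sip_b)]


def chance_below(sip, threshold):
    """Estimate the probability that the uncertain value falls below threshold."""
    return sum(1 for value in sip if value < threshold) / len(sip)


TRIALS = 1000  # hypothetical trial count

# Hypothetical uncertain inputs: units sold and unit price.
demand = make_sip(lambda rng: rng.gauss(100, 20), TRIALS, seed=1)
price = make_sip(lambda rng: rng.uniform(8, 12), TRIALS, seed=2)

# Revenue is itself a SIP, computed with ordinary arithmetic per trial.
revenue = combine(demand, price, lambda d, p: d * p)

if __name__ == "__main__":
    print("mean revenue:", statistics.mean(revenue))
    print("chance revenue < 900:", chance_below(revenue, 900))
```

Because every derived quantity is just another array of the same length, the same approach works in a spreadsheet with one column per SIP, which is consistent with the add-in-free Excel use described above.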

Highlights from Day 2.