Big Data for Executives 2014: Day 1 Highlights

Highlights from the presentations by Big Data experts from Sears Holdings, PwC, Oracle, Altamira, and Tesora on Day 1 of Big Data for Executives 2014.

Big Data for Executives 2014, a two-day event, was organized by Global Big Data Conference on May 5-6, 2014, specifically for decision-making executives. It addressed the questions everyone is asking: How can I leverage data to make better decisions across the enterprise? How well-positioned is my organization to take advantage of Big Data? What do I really need to get started? The focus of the event was on the application of Big Data rather than on the clusters, data warehouses, and coding that support it.

To help its readers succeed in their Analytics pursuits, KDnuggets provides concise summaries of selected talks from the event. These takeaway-oriented summaries are designed both for people who attended the event and would like to revisit the key talks for a deeper understanding, and for people who could not attend. For any session you find interesting, check back at KDnuggets, as we will soon publish exclusive interviews with some of these speakers.

Here are highlights from selected talks on Day 1 (May 5):

Ankur Gupta, IT Director – Big Data at Sears Holdings, delivered an interesting talk on "Hadoop-Enabled Business Intelligence Use Cases".

He started by citing the following findings from a recent Wikibon survey:
  • 46% of Big Data practitioners report that they have only realized partial value from their Big Data deployments
  • 2% declared their Big Data deployments total failures, with no value achieved

According to Wikibon, there are three compelling reasons for this struggle to achieve maximum business value from big data:
  1. A lack of skilled Big Data practitioners
  2. “Raw” and relatively immature technology
  3. A lack of compelling business use cases

In order to solve this crucial problem, Ankur suggested the following strategy:
  • Bring IT and Business together
  • Understand how Hadoop will fit into your environment
  • See the end results first before you start your journey
  • Define realistic success criteria and discover your big data use case

Introducing Sears as a cutting-edge integrated retailer, he discussed various use cases such as product perception, brand sentiment analysis, behavioral and predictive analytics on network data, and real-time inventory management. He concluded the talk by stressing the critical need for data governance and a data hub across the enterprise.

Amrith Kumar, CTO, Tesora, delivered a talk on "An Industry in Flux – the Shifting Vendor Landscape". He introduced a typical solution architecture as having the following components (stacked left to right): Platform, Database Management System, Application, End-User Device, and End User. Next, he explained the various types of components that can be used:
  • Platform: Public Cloud, Private Cloud, Dedicated Hardware, etc.
  • Database Management System: Relational, Non-Relational, SQL, NoSQL, etc.
  • Application: Vertical Specific, Custom, In-house, etc.
  • End-User Devices: PC, Tablet, Phone, Hardcopy, etc.

Discussing the platform layer in detail, he noted that storage, memory, networking, computing, and cloud technology are all improving rapidly. He described the end user's dilemma: users expect more than what a specialized solution can provide. He discussed the following business problems:
  • Provisioning of databases takes too long
  • Data is not actionable / monetizable
  • Database servers are not scalable
  • Administration is too cumbersome

Presenting DBaaS (Database-as-a-Service) as a solution, he emphasized that it improves the usability of databases and handles the "administrivia" so that application users can focus on innovation. Talking about DaaS (Data-as-a-Service), he pointed out that such simplification (through self-service provisioning) is desperately needed, and that DBaaS helps simplify the workflow.
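The self-service provisioning idea can be illustrated with a minimal sketch. The `DBaaSCatalog` class and its methods below are hypothetical, invented purely for illustration of the workflow being simplified; real DBaaS platforms expose comparable provision/list operations through their own APIs:

```python
import uuid

class DBaaSCatalog:
    """Hypothetical self-service catalog: the platform handles the
    'administrivia' (naming, placement, credentials) behind one call."""

    def __init__(self):
        self._instances = {}

    def provision(self, engine, size_gb):
        # In a real DBaaS this step would schedule a VM or container,
        # install the engine, and configure backups; here we simply
        # record the request to show the shape of the workflow.
        instance_id = str(uuid.uuid4())
        self._instances[instance_id] = {
            "engine": engine,
            "size_gb": size_gb,
            "status": "AVAILABLE",
        }
        # Return a ready-to-use connection string instead of a ticket queue.
        return instance_id, f"{engine}://{instance_id[:8]}.dbaas.example:5432"

    def list_instances(self):
        return dict(self._instances)

# Usage: one call replaces a multi-step manual provisioning workflow.
catalog = DBaaSCatalog()
inst_id, conn = catalog.provision("postgres", size_gb=100)
print(conn)
```

The point of the sketch is the contrast: the application user asks for a database and gets back a connection string, while everything between those two events belongs to the platform.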

Anand S Rao, Partner, PwC, talked about "Delivering Insights to the right person, at the right time, in the right manner". He started by describing data evolution (exponential growth of data), accelerating technology (more data, reduced storage costs, more apps, more analytics), and the five dimensions of Big Data (data, technology, analytics, decisions, and mindset). Talking about delivering insights, he mentioned that data can be of different types: structured, semi-structured, batch, real-time, near real-time, etc. There are many analytic techniques available today that can be used to generate insights; the key is transforming data from its raw, unstructured form into a structured form. He explained this through some use cases.

Charles Scyphers, Big Data Architect, Oracle, commenced his presentation by talking about the enormous amount of data that we all generate every single second. Despite the exponential increase in the number of smart devices, projected to reach 12.5 billion by 2020, only 12% of executives feel that they really understand the impact data will have on their organizations. To shrink the gap between data production and data usage, organizations must capture and analyze massive data volumes securely, with unification of data platforms. Big Data solutions capturing sensitive information must be protected and audited; the founding members of Apache Sentry are working on fine-grained authorization for Hadoop. Introducing the Oracle Unified Data Platform, he explained features such as querying any data with SQL, metadata integration, and intelligent query optimization.

Charlie Greenbacker, Director of Data Science, Altamira, talked about how to do Big Data without the big bucks. Talking about Open Source Software (OSS), he mentioned that there is no single best tool for everything. There is always a need to adapt, customize, and be agile, as the problems we solve are constantly changing. Data scientists are constantly bouncing from task to task, like a handyman. They should use a lot of lead bullets rather than look for a silver bullet to get the job done. He then displayed the leading OSS tools for Data Science tasks, found through a survey:
  • Statistical Analysis: R
  • Data Mining: Pandas, Impala, Mahout
  • Machine Learning: Scikit-learn
  • Machine Learning + NLP: Mallet
  • Natural Language Processing: NLTK, Stanford CoreNLP
  • NLP + Geospatial Analysis: CLAVIN
  • Social Network Analysis: NetworkX, Gephi
  • Data Visualization: D3.js
  • Fusion, Analysis, Visualization: Lumify

He concluded the talk by emphasizing that organizations should save their dollars for people (salaries, training, etc.), resources (hardware, AWS, etc.), and proprietary software only where no viable OSS alternative exists.

Highlights from Day 2 will be published soon.