Interview: Beth Smith, General Manager of the IBM Analytics Platform business, on Analytics, Hadoop, and Spark

We discuss upcoming analytics surprises, what has changed, open source, Hadoop, Apache Spark, the Open Data Platform, new analytics roles, IBM resources for analytics education, and more.



Q4. IBM was one of the founders of the Open Data Platform. How do you see the evolution of Open Data Platform and its role in the Hadoop marketplace?

The Open Data Platform (ODP) is important because it enables clients and partners to innovate on Apache Hadoop. It is worth mentioning that ODP is not a replacement for Hadoop; rather, it is about simplifying and increasing enterprise adoption of the Apache Hadoop platform by removing the burden of incompatibility and component management.

Today, most businesses are looking for solutions based on open source. But the interesting point is that this interest is not driven by cost: after many years of open source software being in the mainstream, it's clear that "free" software is not truly free. The real benefit of open source software is that it provides a buffer between the business and a single vendor's software. This avoids lock-in and ensures that the vendor can't hold its customers hostage. Without an ODP-compliant Hadoop distribution, clients risk being locked into solutions that limit their growth and innovation.

We are excited about the accelerating number of organizations joining ODP. The pace of that growth, as well as the feedback I have received from many clients around the world, reinforces that ODP addresses an unmet need and serves a broader purpose for Hadoop.

Q5. You mentioned the rise of Apache Spark. Given that Spark is so new, why do you think it has gained so much interest?

The rate of adoption of any technology is always driven by need and ease of use. Apache Spark is gaining so much traction in the open source community primarily because of the growing business opportunity in analytics and Spark's ease of use in realizing that value. Data Scientists, Data Engineers, Application Developers, and Database Analysts can all work with data in the same environment through Spark's many interfaces, including SQL, R, Java, Scala, and Python. These APIs are well documented and extensible, which makes Spark an excellent framework for large-scale data manipulation. Spark is a full-featured analytics environment, with SQL, Machine Learning, Scoring, and Streaming, that is lightning fast and highly scalable.

Q6. IBM has placed a big bet on Analytics. If someone were to ask you what you want IBM to be known for in Analytics, what would it be?

Helping individuals transform their industries and professions through the use of data and analytics, so they can better serve their employees, their organizations, and society. IBM has focused on delivering innovation that matters to our clients and the world for more than 100 years, and our work in analytics is no different.

We intend to deliver leading-edge technologies, including platforms, tools, libraries and solutions. From databases to content management, from integration to sense-making, from data discovery to cognitive computing, we intend our innovations to be central to every individual working in their own way to create and fuel the Insight Economy.

Q7. What new roles do you see emerging as part of the growth of Analytics?

Data science is one of the most fundamental new professions of the 21st century. A partner told me a few weeks ago that there will never be another penniless mathematician, and with the rise in demand for data scientists, he is right. Students and new entrants to the workforce would do well to focus on their data and analytic skills, as this will prepare them for a vibrant, dynamic future of opportunities in the Insight Economy.

Data scientists are the core developers of analytics capabilities deployed in applications to make them smarter based on machine learning and related technologies. At IBM, machine learning, cognitive computing and in-memory distributed computing technologies - such as Apache Spark - are all areas of deep investment. It is our strategic focus to help our customers retool their businesses and to help tomorrow's innovators prepare their careers to realize new data-driven opportunities.

Q8. Which industries are the leaders in adoption of analytics and which ones are lagging and why?

Financial services, telecommunications and other service-intensive industries were the first to truly reimagine the way they can leverage data in this new application of analytics. We have also seen growing adoption by marketing professionals for digital engagement, human resources for employee engagement, health care providers for better patient care, cities for citizen services, and manufacturers for warranty analysis and optimizing operations.

The laggards are those that limit the scope of analytics to monitoring sales, finance, and HR as a way to manage their business. They see the least value and will increasingly lag the companies that are weaving analytics into the fabric of how their business runs.

Q9. Where do you see the most potential for successful adoption of analytics over the next five years?

As I mentioned earlier, the large corpus of data will be the new competitive advantage: those that use it will be able to gain insights that others can't. Therefore, the most potential comes from reimagining how things can be done in ways that simply weren't possible without analytics. New businesses emerge based on analytics, like Uber and Airbnb. New business models emerge based on analytics, like insurance companies partnering with car manufacturers to encourage safer driving. And new conveniences emerge for all of us, like smart thermometers, smart meters, data-driven couponing, and intelligent cars.

We now have the ability to apply insight where none was possible before. Watson is one example of such advances: doctors have teamed up with Watson to delve deep into medical information, quickly and safely derive diagnoses, and help identify the best treatment plans from clinical studies, medical publications and more!

Q10. Broader adoption of Analytics demands more Data Scientists and Analysts than are available. How is IBM helping bridge the gap?

First, we help with technology, whether by having the largest team in the industry focused on Spark or by providing tools like Watson Analytics that put the power of sophisticated visual, predictive and cognitive analytics directly into the hands of any individual. That way, people don't need to be data scientists or analysts to use their domain knowledge to bring analytic insight to their world.

Second, we provide a rich set of analytics-based solutions that combine analytics and domain knowledge to jump-start analytics in organizations. We also provide a broad set of data and analytics services in the IBM Cloud marketplace, a single online destination that serves as the digital front door to cloud innovation, bringing together IBM's capabilities-as-a-service with those of partners and third-party vendors with the resiliency and security enterprises expect.

Third, we recognize that skills are critical to success. We are the largest contributor to bigdatauniversity.com, an organization that provides freely available courses on data science and big data. Just recently, we published "Spark Fundamentals" on bigdatauniversity.com. To date, over 260,000 registrants have taken advantage of bigdatauniversity.com to build their skills. But it's not only about technical skills; we also partner with Chief Data Officers to shape their analytics strategy and data governance.

Q11. What do you like to do when away from computers? What is a recent book that you read and liked?

People who know me know that I like disruption and change, and that I gravitate to people with ideas that make things happen. I recently read "The New Kingmakers" by Stephen O'Grady, which makes you think about the role of the developer and the importance of that role. In my opinion, today's developer is exactly that person: the one who will make things happen. There is a quote in the book that struck me: "The CIO is the last to know" which tools and technologies are being used. And I promise you, I'm not going to be the last to know!