SAP Predictive Analytics Interview with Sven Bauszus
I talk with Sven Bauszus, SAP Predictive Analytics global leader, about SAP's main products in the Business Analytics, Predictive Analytics, and Big Data space, analytics maturity by industry, the automation of Data Science, and "citizen" Data Scientists.
GP, Q3: How does SAP Predictive Analytics (automated analytics) perform relative to Data Scientists? How would it perform in Kaggle competitions or KDD Cups?
SB: Automated analytics brings huge productivity gains to data scientists, letting them build a massive number of models and collaborate with their data science team. Moreover, the models can be managed, retrained and scored in the predictive factory for further collaboration. Automated analytics can be used in conjunction with expert analytics, with a variety of data targets that include both the SAP and non-SAP data ecosystems, by analysts and data scientists. In the past, KXEN participated in Kaggle's Million Song Dataset (MSD) Challenge. More details can be found in Erik Marcade's interview in KDnuggets.
GP, Q4: How would you guard against bad analysis by "citizen" Data Scientists?
SB: Data quality and ethics are the top contributors to "bad analysis". Having said that, as per our CTO, Erik Marcade, model quality measures that prevent "bad analysis" by "citizen" data scientists should comply with some basic requirements, independent of the modeling task:
- The model quality should be expressed as a value between 0 and 1, or as a percent.
- The model quality of a perfect system should be 100%.
- The model quality of a random (or constant) system should be 0%.
- The model quality should be computed without any assumption about the underlying algorithm or the target distribution.
- The model quality should relate to an intuitive
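To make the requirements above concrete, here is a minimal sketch of one measure that meets the first four for a binary target: the area between the model's cumulative gains curve and the random diagonal, divided by the same area for a perfect ranking. This choice is an assumption made for illustration (it equals the Gini coefficient, 2*AUC - 1), not necessarily the measure used in SAP's products, and the function names are hypothetical.

```python
from itertools import groupby
from typing import Sequence


def gains_area(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the cumulative gains curve (x: share of population,
    y: share of positives captured), ranking by descending score.
    Tied scores are treated as one block, so a constant model
    produces the straight random diagonal."""
    n = len(y_true)
    total_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    area = 0.0
    captured = 0.0
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        labels = [label for _, label in group]
        new_captured = captured + sum(labels) / total_pos
        # Trapezoid over the block of tied scores.
        area += (captured + new_captured) / 2 * (len(labels) / n)
        captured = new_captured
    return area


def model_quality(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Illustrative quality measure: 1.0 for a perfect ranking,
    0.0 for a random or constant model. It only looks at the ranking,
    so it makes no assumption about the algorithm behind the scores."""
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    total_pos = sum(y_true)
    if total_pos == 0 or total_pos == len(y_true):
        raise ValueError("y_true must contain both classes")
    random_area = 0.5  # area under the diagonal
    perfect_area = gains_area(y_true, y_true)  # the target ranks itself perfectly
    model_area = gains_area(y_true, scores)
    return (model_area - random_area) / (perfect_area - random_area)


if __name__ == "__main__":
    y = [1, 0, 1, 0, 0, 1]
    print(model_quality(y, [0.9, 0.1, 0.8, 0.2, 0.3, 0.7]))  # perfect ranking
    print(model_quality(y, [0.5] * 6))                        # constant model
```

One limit of this sketch: a model that ranks worse than random yields a negative value, so a production measure would need to decide how to report that case.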
GP, Q5: Based on your experience, how would you rank the analytics maturity of different industries? Which are the leaders and which ones are laggards?
SB: We see Telco, Retail, CPG, Healthcare and Hi-tech clearly leading, followed by Banking, Oil & Gas, Manufacturing, and Aerospace & Defense.
GP, Q6: What do you see as the main obstacles to Analytics Adoption in different industries? What about obstacles to Automation of Analytics / Data Science?
SB: The biggest concern is a lack of in-house subject matter expertise: not seeing the forest for the trees, in the sense of knowing which use case is really important to the business and can be implemented on top of existing data and data collection processes. How to get started and how to implement the modeling process are other big concerns.
Most of the time there is a strong vision but no quantifiable business case to justify the investment in technology, time, people and infrastructure for deploying predictive use cases on top of an existing Business analytics landscape.
Everything starts with the use case: finding, defining, organizing and collecting the right data.
GP, Q7: Should Data Scientists be concerned that they will be replaced by automated tools like SAP Automated Analytics, and when?
SB: Not at all. The purpose of automated analytics is to bring the data science team efficiency through collaboration and automation. Data scientists are an important part of the data science team equation. Many of our customers understand that automated analytics is not a replacement for data scientists; it is meant to improve productivity and extend data science capabilities across the organization.