Predictive Analytics Innovation Summit, San Diego: Day 1 Highlights

Highlights from the presentations by predictive analytics leaders from The Data Incubator, Tamr, Sony and Facebook on day 1 of the Predictive Analytics Innovation Summit 2015 in San Diego.

The Predictive Analytics Innovation Summit was held by Innovation Enterprise in San Diego on Feb 12-13, 2015. It provided a platform for leading executives to share insights into the innovations that are driving success in the world's most successful organizations. Data scientists and decision makers from a number of companies came together to learn practical predictive analytics, data science and business intelligence from top companies such as Google, Sony, Walmart, Facebook and Twitter. Industry-leading experts shared case studies and examples to illustrate how they are using and improving predictive models to innovate in their organizations.

Here are highlights from day 1:

Michael Li, Founder, The Data Incubator, delivered a talk titled "Needle in a Haystack - Hiring and retaining a big data analytics team". He shared his experience of hiring the right people, selecting key enabling technologies and building up business processes, along with common challenges and pitfalls. The Data Incubator is a six-week fellowship that prepares PhDs for roles such as data scientist and quant. The acceptance rate is very low (around 5%). Analytics and data science jobs have grown immensely.

A data scientist should have three skills: data analysis, data engineering and machine learning. He said that organizations should be data-driven in their search for data scientists, and shared his own study on the topic. Although about half of applicants claim to know Python, only one-third can actually write code and about a quarter reach the correct answer. Applicants who use Python do considerably better than those who list R and Matlab on their resume. University is not a good predictor for selecting candidates. On retaining data scientists, he said that bosses should be open to ideas and trust data.

Prof. Ihab Ilyas, Co-Founder, Tamr, began by noting that trillions of dollars have been spent on IT systems to automate and optimize key business processes. Now, with billions being invested in Big Data storage and access and in next-generation analytics platforms, companies are beginning the analytic prosecution of the data stored in these centralized systems. The main technical challenges when large volumes of data arrive regularly are schema mapping and data duplication, redundancy and incompleteness. For example, a global equipment manufacturer with thousands of products across hundreds of databases from multiple suppliers wants to effortlessly identify the same part numbers across the supply chain. Similarly, Thomson Reuters took about 6 months on a single deduplication project covering a subset of its data sources. Tamr provides entity resolution functionality. He shared some realities of data curation efforts, as follows:
  1. Data is owned by people and is not an orphan. Therefore, fully automated cleaning will probably never be adopted in an enterprise setting.
  2. Scale renders most solutions un-deployable. As a result, we need to rethink all cleaning algorithms, including record linkage, to work at scale and avoid quadratic complexity. De-duping one million records naively can take weeks (even on a big machine).
  3. Data variety is worse. Thus, curation requires its own stack including transformations and adaptors.
  4. Iterative by nature, not by design. So, the solution needs to be incremental, agile and have low startup overhead. A curation solution should be a part of the data production line.
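
The quadratic cost mentioned in point 2 can be illustrated with a small sketch. Comparing every record with every other record takes n(n-1)/2 comparisons; one common way to avoid that is blocking, where only records that share a cheap key are compared. This is a generic illustration and not Tamr's method: the record fields, the `block_key` function and the similarity threshold are all assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: first three alphanumeric characters of the
    # part number, upper-cased. Real systems choose keys per data source.
    cleaned = "".join(ch for ch in record["part_no"] if ch.isalnum())
    return cleaned[:3].upper()


def similar(a, b, threshold=0.85):
    # String similarity on the description field; the threshold is arbitrary.
    ratio = SequenceMatcher(None, a["desc"].lower(), b["desc"].lower()).ratio()
    return ratio >= threshold


def candidate_pairs_naive(records):
    # Every pair of record indices: n * (n - 1) / 2 comparisons.
    return list(combinations(range(len(records)), 2))


def candidate_pairs_blocked(records):
    # Group record indices by blocking key, then pair only within a group.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


def find_duplicates(records):
    # Run the (more expensive) similarity check on blocked candidates only.
    return [(i, j) for i, j in candidate_pairs_blocked(records)
            if similar(records[i], records[j])]


# Made-up sample data for illustration.
records = [
    {"part_no": "AB-1001", "desc": "Hex bolt M8 x 40 zinc"},
    {"part_no": "ab1001", "desc": "Hex bolt M8x40 zinc"},
    {"part_no": "CD-2002", "desc": "Ball bearing 6204"},
    {"part_no": "CD-2003", "desc": "Rubber gasket 50mm"},
]

print(len(candidate_pairs_naive(records)), "naive pairs")
print(len(candidate_pairs_blocked(records)), "blocked pairs")
print(find_duplicates(records))
```

On these four records the naive approach produces 6 candidate pairs and the blocked approach 2. The trade-off is that duplicates whose keys differ are never compared, so the choice of key matters.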

Tamr provides an object linkage model, machine learning approaches for handling scale, and an open channel with humans in different capacities.

Eric Daly, VP, Global Reporting & Analytics, Sony Pictures Entertainment, talked about using data to drive business decisions in the home entertainment industry. He made the following points to show the need for data:
  1. Traumatized Economy - Showing a chart of global GDP growth over time from the IMF World Economic Outlook, he mentioned that advanced economies, which account for over 95% of the revenue, were hardest hit by the recession and will see only modest growth moving forward.
  2. Proliferation of disruptive business models such as Blockbuster, RedBox, Netflix, Amazon Instant Video, etc. has driven simultaneous transaction growth and margin decline.
  3. Erosion of 2 key Global Planning drivers: new release "conversion rates" and consumers' purchase of "catalogue" titles.

As a result of these three factors, consumer retail spending on a given title is becoming more front-loaded, which leaves just one chance to get it right, a situation exacerbated by market contraction. Over time, the models have changed drastically, as previous models failed to forecast a quarter of the profit opportunity.

Mario Vinasco, Data Scientist, Marketing Data Science & Analytics, Facebook, gave a talk titled "Marketing Analytics & other uses in creating smart work places". He said that, to him, analytics is the art of counting. In marketing analytics, his team typically drills down on specific areas, performs attribution analysis, runs A/B tests and builds predictive models to identify opportunities. The ecosystem at Facebook is really rich. A/B testing is the best way to optimize the customer experience, select acquisition channels and allocate investments; however, customer segmentation, identification of risk and potential, and close monitoring of testing performance are essential components of the strategy. He shared insights from the "lookback" feature, which Facebook had launched a few months ago.
