Weapons of Math Destruction, Ethical Matrix, Nate Silver and More: Highlights from the Data Science Leaders Summit

Domino Data Lab hosted its first-ever Data Science Leaders Summit at the lovely Yerba Buena Center for the Arts in San Francisco on May 30-31, 2018.  Cathy O'Neil, Nate Silver, Cassie Kozyrkov and Eric Colson were among the speakers at this event.

Keynote: Nate Silver

Nate Silver, the founder and editor in chief of FiveThirtyEight and the author of The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t, began his talk by addressing the hype around data and data science.  Despite having more data, society has not gotten better at prediction.  Medical, psychology and economics journals suffer from a replication crisis.  All of these point to new sets of problems.

What types of problems arise when you have more data?


Problem #1: More Room for Interpretation


The 2016 US election prediction was a complex problem.  Even on election day, November 8, Hillary Clinton was ahead in the popular vote.  There were multiple ways of interpreting the data and making trade-offs.  We are still in the early days of data-driven prediction.


Problem #2: The Signal-to-Noise Ratio gets worse


The number of possible combinations that need to be tested is usually too large.  There is a very large possibility of finding false positives.  We were not thinking of such problems even three years ago when we had less data.
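To make the false-positive point concrete, here is a minimal sketch (not from the talk) of how the chance of at least one spurious finding grows with the number of combinations tested. It assumes every test is independent and is run at the same significance level. The function name `prob_any_false_positive` and the 5 percent default are illustrative choices.

```python
def prob_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests of a true
    null hypothesis looks "significant" at level `alpha` purely by chance."""
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha);
    # independence (an assumption) lets us multiply those probabilities.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(f"{m:>5} tests -> {prob_any_false_positive(m):.3f}")
```

Under these assumptions, a single test carries a 5 percent risk, while 100 tests make at least one false positive very likely.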


Problem #3: Feature or Bug?


Competitiveness drives us to over-optimize software, but in doing so we may miss the big picture.  Nate gave the example of a taxi driver's GPS turn-by-turn directions that tried to route him via a shortcut that was available only at specific times; the end effect was unnecessary excess mileage that defied common sense.  When does one trust the algorithm versus human instinct?


To solve real-world problems, most people tend to use their gut to get about 80 percent of the way to the solution and then use data science to inform the last 20 percent or so.  Nate suggests that this should be reversed.


Nate suggested three best mental practices to help one navigate the world.


Suggestion #1: Think Probabilistically


Nate gave the example of Hurricane Irma: the average error of a 3-day advance prediction used to be a 350-mile radius, but today it is down to about a 100-mile radius.  Why has weather forecasting improved so much?  It’s a combination of 3 things: (a) we understand the chemistry and physics of atmospheric sciences better; (b) computing abilities have improved dramatically; (c) weather forecasters get good practice each day, since they predict for about 420 cities around the USA multiple times each day.  It helps to think probabilistically when making predictions.


Suggestion #2: Know Where You’re Coming From


It is good to know where you are coming from: your biases or prejudices.  This helps in judging the objectivity behind one’s analysis.  Quoting James Surowiecki’s book The Wisdom of Crowds, he summarized that group wisdom succeeds when the following conditions hold and fails when they do not:

  • The diversity of thought and diversity of skillsets in the group are preserved
  • Independence – the freedom to articulate, express views, debate, and question anything and everything is vital. For example, as recounted in the book Bad Blood about Theranos, nobody could question Elizabeth Holmes at Theranos
  • Decentralized processes – collecting data independently brings richer information to the table, gives the ability to dig in and examine it, and ensures that you are not missing vital parts of the picture


Suggestion #3: Try, and Err


Nate then shared that he loves playing poker.  He told a poker story: how Chris Moneymaker entered a poker tournament on a lark but ended up winning it.  Nate was interested in the data behind poker.  Based on his understanding: It’s easy to be really bad at something – the worst player is actually making things worse by just trying to play.  He shared the learning curve for poker, where the 80-20 rule applies in an interesting way: up to about 80 percent accuracy, your rate of learning is steep and fairly fast.  But the rate of learning in the last 20 percent slows exponentially.


Nate also shared the interesting aftermath of Billy Beane’s story with the Oakland A’s: Billy had picked the low-hanging fruit and achieved initial success, but after a year or two, the competitors caught up well and fast.  The Silicon Valley giants are experts at making a lot of incremental progress, and that’s truly where their competitive advantage is.  They iterate fast in the last 20 percent of complex problems, after having mastered the easy parts.


Keynote: Nick Elprin at Domino Data Lab

Nick Elprin, the founder and CEO of Domino Data Lab, welcomed everyone to the conference.

As organizations embark on their AI / Machine Learning journey, there are some real obstacles to solve.  Nick illustrated these with example quotes from team members in such organizations.



Nick then told a story from the history of software development, a time when there was a set of innovative new capabilities and yet creative technical people felt stuck.  Today everyone looks to agile as “the method” and looks at the waterfall model with disdain, but …


The waterfall method worked great for hardware development – the first era of computing.  Software developers who applied it got stuck and could not make it work.  The agile movement recognized that software was fundamentally different from hardware.  Its nature is different, even after it is shipped.  This realization unlocked a revolution.


Today we are in the early days of the next era of computing, and we must recognize the difference between software development and data science.  At the heart of data science lies an innocuous-sounding thing … called a model.   A model is a special type of algorithm, whose instructions are induced from a set of data and used to make predictions, recommendations or more generally to prescribe some action based on a probabilistic assessment.
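As a minimal illustration of "instructions induced from data" (a sketch, not something shown in the talk), the snippet below fits a straight line to a handful of observations by ordinary least squares and then uses the fitted parameters to predict a new value. The function names `fit_line` and `predict` and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Induce a (slope, intercept) pair from data by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: Tuple[float, float], x: float) -> float:
    """Apply the induced 'instructions' to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Toy data (made up for illustration): y grows by 2 for each unit of x.
    model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(model, predict(model, 4))
```

Nobody wrote the rule "multiply by 2 and add 1" by hand here; the parameters came from the data, which is why the model has to be refitted when the data changes.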


If data is like oil (static), then models are like engines that consume this oil.  But data is dynamic and flowing.  That makes models more dynamic.  Once we viewed models as the most important output of data science, but that may be a little too simple.


In 2006, Netflix offered a million dollars to anyone who could improve its recommender model.  That was a bargain: Netflix has since said that its recommender models are worth over a billion dollars to its business.  Recently, Netflix surpassed Disney in market cap.


Coca-Cola uses a model called Black Book to work through 600 different parameters to tune the recipe to manufacture orange juice.  Insurance companies are working on tuning their models to improve the customer experiences and efficiencies.


Model-driven businesses drive advantage in two ways:

  1. Breakthroughs that create new products, killer features, and even new revenue streams. Autopilot in cars is a great example of this.
  2. Operational efficiencies that compound through constant incremental improvement. This is concealed and not easily visible to outsiders, as Jeff Bezos’s 2016 letter to shareholders points out.


A McKinsey research study shows that only 20 percent of companies are using models, and those that are have profit margins 10 percent higher than those that aren’t.



The Model Myth

There is a misconception that because models involve code and data, they should be treated the same way as software or data.


Models are different from software in 3 important ways:

  1. Materials: Models are built from different materials: data, computationally intensive algorithms, specialized hardware, and packages from vibrant ecosystems. So, there needs to be an agile approach.
  2. Process: Models need an iterative, dynamic process that leaves room for research-based workflows and for experimental and emergent phases that encourage exploration and discovery.
  3. Behavior: Models are probabilistic and do not provide concrete answers. Models need constant monitoring and re-training, unlike a software package that is released.


Nick then introduced Domino’s idea of Model Management: models require a new organizational capability.  Domino describes Model Management, in a white paper available for download, as a category of technologies and processes that allow companies to consistently and safely build, validate, deliver and monitor models that create competitive advantage.  It has five parts:

  1. Model Development
  2. Model Production – how to operationalize
  3. Model Technology – compute needs and tooling needs
  4. Model Governance – keeping track of all models and governing them
  5. Model Context – the set of artifacts, knowledge and insights that the organization accumulates as it designs and delivers models.


Cassie Kozyrkov, Google




Cassie posited that an organization’s culture evolves through various stages to become truly data-driven.  With this as her framework, she wove an absorbing tapestry of tips and suggestions on how to evolve that elusive culture.  Ideally, the tips would be presented alongside the stages as the story progresses, but for that, it is best to attend or view the recording.  I have tried to summarize her list into the tables below, admittedly doing no justice to the richness of her talk.



Cassie challenges you to tackle the typical problems organizations face as they embark on their data science journey, all the while sharing critical inputs.



Once an organization decides to use data and moves to Stage 2, its hiring woes begin.  Cassie suggests the following roles for the players in data science, along with guidance on when to hire whom.



Via the following ten tips, Cassie shared ways to deal with expected roadblocks in progressing from one stage to the next.



Training the decision-makers is vital in preparing them to tackle the big conclusions.  At Google, this is called Decision Intelligence Engineering, and the essence is to avoid making a Type III error (from statistics): correctly rejecting the wrong null hypothesis, that is, correctly using all the math to answer the wrong question.


Cassie summed up her talk with the following advice.

Design your decision process in this order:

  1. Decision-making under no new information:
    • Pick a default action
  2. Decision-making under full information:
    • Set decision criteria
  3. Decision-making under partial information:
    • Set statistical requirements

Too difficult?  Then, stick to descriptive analytics.
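One way to read the three-step ordering above is as the setup of a simple hypothesis test. The sketch below is an interpretation under stated assumptions, not Cassie's own formulation: the action names, the 10 percent conversion-rate criterion, and the 5 percent significance level are all hypothetical, and the test used is an exact one-sided binomial test.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def decide(
    successes: int,
    trials: int,
    criterion: float = 0.10,   # step 2: act only if the true rate exceeds this
    alpha: float = 0.05,       # step 3: tolerated risk of wrongly leaving the default
    default_action: str = "keep current design",   # step 1
    alternative_action: str = "launch new design",
) -> str:
    """Leave the default action only if the data make the criterion
    implausible as an upper bound on the true rate."""
    if trials == 0:
        # No new information: fall back to the default action.
        return default_action
    p_value = binomial_tail(successes, trials, criterion)
    return alternative_action if p_value < alpha else default_action


if __name__ == "__main__":
    print(decide(0, 0))
    print(decide(10, 100))
    print(decide(30, 100))
```

The point of fixing the default, the criterion, and the statistical requirement before looking at data is that the data can then only answer the question that was actually asked.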


Eric Colson @StitchFix


Differentiation by Data Science


Eric Colson is the Chief Algorithms Officer at StitchFix, and his talk focused on how businesses can differentiate themselves to achieve competitive advantage.  Nature is a great inspiration for differentiation – speed, size, color, special skills, etc.  Through differentiation, Saks Fifth Avenue caters to the upper end of the department stores’ market, Macy’s to the mid-range and K-Mart to the lower end.  Amazon relies on speed.  Louis Vuitton differentiates by brand.

A relatively new way to differentiate in business is by using Data Science.

StitchFix eliminates shopping for clothes: customers tell StitchFix about themselves so that the algorithms can pick out clothes tailored (no pun intended) to each customer’s preferences.  And yet, it is not just a recommendation engine!  85 developers work on a wide variety of algorithms; barely 5 work on the recommendation system.  StitchFix also has a showcase of its algorithms that is worth checking out.



The role of Data Science is merely supportive in most organizations, which limits it to helping with incremental improvements.  He illustrated this with the design of the laryngeal nerve in the giraffe’s neck, which prevents the animal from making any sound.  Eric called for expanding the role of Data Science, specifically to:

  1. Participate in the framing of problems
  2. Participate in the conception of ideas

Instead of asking a Data Scientist to merely optimize something, involve her/him in the framing of the problem or at the idea conception stage.  That could help Data Science leap past legacy constraints into more significant contributions.


Drawing another inspiration from Nature, Eric explained how the bucket orchid traps bees and gets them to spread its pollen in a completely new and untapped way.  Businesses can also tap into completely new ideas by harnessing Data Science in more effective ways.  For example: using genetic algorithms to design completely new clothes by recombining existing styles.
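The recombination idea can be sketched in a few lines. This is a generic uniform-crossover step from genetic algorithms, not StitchFix's actual method. The attribute names and the `recombine` function are hypothetical, and a full genetic algorithm would also need a fitness measure, selection, and mutation.

```python
import random
from typing import Dict


def recombine(
    parent_a: Dict[str, str], parent_b: Dict[str, str], rng: random.Random
) -> Dict[str, str]:
    """Uniform crossover: each attribute of the child style is inherited
    from one of the two parent styles, chosen at random."""
    if parent_a.keys() != parent_b.keys():
        raise ValueError("both styles must describe the same attributes")
    return {key: rng.choice((parent_a[key], parent_b[key])) for key in parent_a}


if __name__ == "__main__":
    # Two made-up existing styles, each described by the same attributes.
    style_a = {"neckline": "v-neck", "sleeve": "long", "pattern": "floral"}
    style_b = {"neckline": "crew", "sleeve": "short", "pattern": "striped"}
    rng = random.Random(42)  # seeded so that runs are repeatable
    for _ in range(3):
        print(recombine(style_a, style_b, rng))
```

Each child mixes attributes that already exist, so the combination as a whole may be a style that nobody designed directly.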



Provide the right environment for Data Science to thrive, e.g. by making Data Science a top-level department, reporting to the CEO.  This ensures a C-level position that enables participation in strategy and better career opportunities that are not capped at the Director level.


Should an organization hire specialists or generalists?  If the situation and requirements are dynamic and changing, then it is better to hire generalists.  Data products are messy and benefit a lot from iteration and change, which means it helps to have more generalists.  When your production hits stability, then it helps to hire specialists.


Hiring Data Science talent: In a typically non-creative field such as brick-laying, physical constraints mean the distribution of talent follows the typical bell-shaped Gaussian curve.  In such cases, the top talent is barely about 20% more talented than most.  However, for a creative field like Data Science, Eric strongly believes that the curve is shaped more like a Gamma distribution (with, say, k = 7.5 and θ = 1.0) and so the best Data Scientist (hard to find, of course) could be 10 times better than the average.


Finally, Eric concluded his analogy of Nature and Businesses: Nature is constantly changing, evolving and adapting, and businesses also need to do the same.