Why You Should Attend the Data Science Summit 2016 and 9 Talks To Be Excited About

Here is a preview of the Data Science Summit, July 12-13 in San Francisco, where you can meet quality people and hear exciting talks like the 9 described here. Get 20% off with the code SFDATASCIENCE.

By Jared Polivka, Galvanize.

In 2015, Turi (formerly called Dato) made a splash at its annual Data Science Summit by open-sourcing SFrame and SGraph. My colleague, Bo Moore, wrote about the big announcement in a blog post called ‘Eight Tools for the Python Data Ecosystem.’ With the conference less than a week away, I’m excited to discover what Turi will unveil this year.

The Data Science Summit isn’t my first rodeo. Over the past year, I’ve personally attended over a dozen data science conferences, or sent Galvanize data science students to fill in for me at the ones I wasn’t able to attend. I write about most of them here: Data Science Conferences: One List to Rule Them All (I say ‘most’ because my list is in need of an update).

As a conference veteran, I can point to the not-so-secret ingredients that make Turi’s Data Science Summit a special, must-attend conference:

  • Quality People:
    First, the speakers… they are world class. The Turi team hand-picks speakers who are practitioners and thought leaders from industry (you’re going to learn a ton).

    Second, the attendees… they are technical, typically consisting of analysts, data scientists and data engineers (you’ll make friends or find someone to hire… if you’re hiring).

  • The Content Mix:
    At the Data Science Summit, you’ll encounter speakers who work on the tools that we use as data scientists (example: Andreas Mueller), and you’ll hear talks from professors who are doing cutting-edge research (example: Emily Fox). At this data science conference, you will gain practical knowledge from seasoned practitioners that you can apply, and you’ll get insight into the future of our industry.

Psst, I have two special discounts for you, dear reader. Tell… everyone. Yes, tell everyone.
The first discount: get 20% off the Data Science Summit with the code “SFDATASCIENCE”. Register here.
The second discount: get a combined 25% off the Data Science Summit and Open Data Science Summit West.

OK, now you know what the Data Science Summit is and why it’s awesome, and you have a discount code (you’re welcome). Without further ado, here are the nine talks that I’m excited to attend at the Data Science Summit 2016:

Talk 1: The Five Tribes of Machine Learning, and What You Can Take from Each

By Pedro Domingos, Professor of Computer Science at the University of Washington and author of “The Master Algorithm”

About This Talk:

There are five main schools of thought in machine learning, and each has its own “master algorithm” (a general-purpose learner that can in principle be applied to any domain). Here are the 5 schools and their algorithms:

  1. Symbolists - they have inverse deduction
  2. Connectionists - they have backpropagation
  3. Evolutionaries - they have genetic programming
  4. Bayesians - they have probabilistic inference
  5. Analogizers - they have support vector machines

What we really need is a single algorithm combining the key features of all of the above (a true universal “Master Algorithm”).

You Will Learn:
- About Pedro Domingos’ work toward the goal of creating a master algorithm (including Markov logic networks)

- The new applications that a universal learner (i.e. master algorithm) will enable

- How society will change as a result of a universal learner

Meet Your Speaker:

Computer science professor, author, winner of the SIGKDD Innovation Award (the highest honor in data science), Fellow of the Association for the Advancement of Artificial Intelligence, co-founder of the International Machine Learning Society… trust me, you don’t want to miss a talk by Pedro Domingos.

Pedro’s research spans a wide variety of topics in machine learning, artificial intelligence, and data science, including scaling learning algorithms to big data, maximizing word of mouth in social networks, unifying logic and probability, and deep learning. Connect with Pedro

Talk 2: Exploratory Data Analysis 2.0

By Jock Mackinlay, VP of Research & Design at Tableau, Information Visualization Expert

About This Talk:

In his famous 1977 book, Exploratory Data Analysis (EDA), John Tukey demonstrated to the statistics community the value of using visual methods to suggest hypotheses and to validate statistical models. Nearly forty years later, we can use computer graphics and interactive visual representations of data to implement EDA 2.0.

You Will Learn:

How to use modern technology to maintain cognitive flow when working with statistical and machine-learning algorithms.

Meet Your Speaker:
At Stanford University, Jock Mackinlay pioneered the automatic design of graphical presentations of relational information. The fruits of Jock’s research are in his book, “Readings in Information Visualization: Using Vision to Think.” Jock has a Ph.D. in Computer Science from Stanford University. Connect with Jock

Talk 3: Product Reviews and NLP: Analysis and Elasticsearch

By Lynn Cherny, Data Analysis Consultant

About This Talk:

This talk will be more hands-on. Lynn Cherny will walk you through the exploratory analysis of a product review corpus using Python tools, with a primary focus on natural language processing (NLP).

You Will Learn:
After some data exploration and crunching, you will learn how you might use the results in an Elasticsearch-based search engine and hook it up to a web page. (The focus in Elasticsearch will be on using NLP features, not on deployment at scale.)

Meet the Speaker:

Lynn is a data analysis consultant who specializes in NLP/text data and interactive data visualization. She holds a Ph.D. in Linguistics from Stanford and spent 18 years as a UX designer, manager, and researcher at top companies (AT&T, Autodesk, Adobe, and TiVo). For the past year, Lynn has been a visiting Knight Chair in Journalism at the University of Miami, where she taught journalists how to build interactive data visualizations with d3.js. Lynn moderates the data-vis-jobs list on Google Groups and is co-chair of the annual OpenVis Conference in Boston, which focuses on web visualization tools and methods. Connect with Lynn

Talk 4: Scalable Bayesian Models of Interacting Time Series

By Emily Fox, Amazon Professor of Machine Learning in the Statistics Department at the University of Washington


About this Talk:

Modeling the intricate and possibly evolving relationships within a large collection of time series can lead to better predictive performance and domain-interpretable structures. For scalability, it is crucial to discover and exploit sparse dependencies between the data streams.


In this talk, we will cover a series of Bayesian models for capturing such sparse dependencies via clustering, graphical models, and low-dimensional embeddings of time series. We will explore these methods in a variety of applications, including house price modeling and inferring networks in the brain.

Meet Your Speaker:

Emily Fox received an S.B. in 2004 and a Ph.D. in 2009 from the Department of Electrical Engineering and Computer Science at MIT. Her research interests are in large-scale Bayesian dynamic modeling and computation.

She has received numerous awards:

  • Sloan Research Fellowship (2015)

  • ONR Young Investigator award (2015)

  • NSF CAREER award (2014)

  • Leonard J. Savage Thesis Award in Applied Methodology (2009)

  • MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Prize (2009)

Talk 5: The Exploit-Explore Dilemma in Music Recommendation

By Òscar Celma, Director of Research at Pandora

About This Talk:

Pandora Radio is best known for the Music Genome Project: a unique and richly labeled music catalog of 1.5 million+ tracks. Pandora has also collected more than a decade of contextual listener feedback in the form of more than 65 billion thumbs from 79M+ monthly active users who have created more than 8 billion stations.


This talk will show how the interdisciplinary team at Pandora goes about making sense of these massive data sets to successfully make large scale music recommendations.


Meet the Speaker:

Dr. Òscar Celma leads a team of 60 scientists and musicologists at Pandora to provide the best personalized online radio experience.


Òscar is the author of “Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space” (Springer, 2010). He holds a few patents from his work on music recommendation and discovery, as well as on Vocaloid, a singing voice synthesizer bought by Yamaha in 2004. Connect with Òscar

Talk 6: Engineering Open Machine Learning Software

By Andreas Mueller, Research Engineer at the Center for Data Science (NYU)


About This Talk:

Broadly, this talk will discuss the challenges and trade-offs of creating an open source machine learning library.

Specifically, scikit-learn has become a popular machine learning library, used across applications in academia and industry. This talk will discuss some of the design decisions that led to scikit-learn’s success while highlighting continuing challenges.


You Will Learn:
- The trade-offs of following current trends and developments in machine learning

- The growing demand for easy-to-use data science solutions

- The maintainability and stability of scikit-learn


Meet Your Speaker:

Andreas Mueller is a core developer of scikit-learn (he has been on the project for over 5 years). Andreas is passionate about democratizing access to high-quality machine learning algorithms.

Talk 7: Advancing the Python Data Stack with Apache Arrow

By Wes McKinney, Lead Software Engineer at Cloudera

About This Talk:

This talk is about Apache Arrow and how it will enable Python developers to work better with big data systems (historically... Python stack + big data problems = headache).

You Will Learn:
- How Apache Arrow will enable Python programmers to work on big data problems in a more natural and performant way
- How to get started using Apache Arrow
- The state of new tools being created to help Python work better with Spark and Hadoop.


Meet Your Speaker:
Wes McKinney is best known for creating the pandas project and for writing the book Python for Data Analysis. I've seen him speak at many conferences, and he consistently delivers high-quality talks that are often practical (sometimes including hands-on exercises).

Talk 8: Synthesizing Human and Machine Capabilities

By Eric Colson, Chief Algorithms Officer at Stitch Fix

About This Talk:

Machine learning and artificial intelligence have made tremendous advances in the last several years. Machines are now better than humans at many tasks – facial recognition, medical diagnosis, parole decisions, driving cars – and the list is rapidly growing. Yet skills like the ability to empathize, the ability to leverage ambient information, and the ability to grasp broad context remain tremendously valuable and unique to humans.

With connected devices, it is now possible to harness the unique abilities of human experts and infuse them into services. New software systems are emerging that distribute their workloads across varied processors – be they machine or human. This synthesis enables new capabilities that go far beyond what is possible using either resource alone.


Meet Your Speaker:

Eric Colson specializes in social algorithms at Stitch Fix. He is an advisor to Earnest (consumer lending), Data Elite (Big Data incubator), and Mortar Data (Big Data platform). Previously, he was VP of Data Science. He holds a B.A. in Economics (SFSU), an M.S. in Information Systems (GGU), and an M.S. in Management Science & Engineering (Stanford). Connect with Eric

Talk 9: Machine Learning at Pinterest

By Jure Leskovec, Chief Scientist at Pinterest and Assistant Professor in Computer Science at Stanford University

About This Talk:

Machine learning is at the core of Pinterest. Pinterest personalizes and ranks 1B+ pins and 700M+ boards for 100M+ users all over the world, using data gathered from collaborative filtering, user curation, web crawling, and more. At Pinterest, we model relationships between pins, handle cold-start problems, and deal with real-time recommendations. In this talk, Jure will give an overview of these problems and the effective solutions developed at Pinterest. The talk will focus on the systems and engineering choices that enable productive machine learning development and allow multiple engineers to effectively develop, test, and deploy machine-learned models.


Meet Your Speaker:

Professor Jure Leskovec is a member of the InfoLab and the AI Lab at Stanford. Jure completed his Ph.D. in 2008 in the Machine Learning Department, School of Computer Science at Carnegie Mellon University, under the supervision of Christos Faloutsos. In addition to his work at Pinterest and Stanford, Jure works with the Artificial Intelligence Laboratory, Jozef Stefan Institute, Ljubljana, Slovenia. Connect with Jure

Reader Bonus

Wow… you’re still here. You read about all 9 talks. I’m impressed. To reward your stamina, here’s a chance to win a ticket to the Data Science Summit: enter the summer data science giveaway.


“You're still here? It's Over. Go Home. Go.” - Ferris Bueller

About the Author:

Jared Polivka serves as Director of Product Marketing and as Chief Evangelist (Data Science) at Galvanize - the nation’s top data science bootcamp. Jared co-organizes the SF Data Science Meetup and volunteers on the organizing/advisory committees for PyData SF, the Data Engineering Conference and the Kaizen Data Science Conference. When he’s not working on product, marketing or building community, he’s usually hiking or playing a game of chess while sipping tea. If you have a good story to share (he loves stories), reach out on Twitter or LinkedIn.