Key Takeaways from Open Data Science Conference (ODSC) West 2017
This year, ODSC West was held at the Hyatt Regency San Francisco Airport from November 2 to 4. Here, I attempt to give you a snapshot tour of what I experienced.
Day 2: Friday, November 3, 2017
Keynote: Watson Data Platform
- Developing and deploying Machine Learning and data processing pipelines at scale is possible only if Data Science is managed and played as a team sport
- Data stewards, data engineers, data scientists, business analysts and app developers need to work together.
- On the IBM Watson Data Platform, DSx Notebooks and the DSx Flow UI allow the learning and development of Machine Learning models.
- He shared what the Enterprise Machine Learning Workflow would look like on the Watson platform.
Keynote: The People’s Data
DJ Patil, Former US Chief Data Scientist
- Shorn of technical jargon and details, his talk held the packed hall spell-bound by highlighting the people behind the data – the impact of data analytics on human beings.
- He started off by saying that the mission of the US Chief Data Scientist is to responsibly unleash the power of data to benefit all Americans
- Starting with the Cook County Jail, he highlighted three strong human stories of how the right data and the right data analysis can bring about profound changes in human lives. This is a must-watch video that ODSC organizers will probably provide on their YouTube channel for free.
- He left the audience with three core principles:
  - People are way more important than Data
  - Data is a force multiplier
  - The time to engage is now
Note: DJ Patil is joining Venrock as an Advisor to the Firm.
Keynote: The Deontology of Data Science
Igor Perisic, Chief Data Officer and VP of Engineering, LinkedIn
- He traced the origins of Data Science: from the origins of data (around 1640) and the origins of science (1400 AD), through various milestones, to today.
- The consequences of Data Science models and applications can range from small and inconsequential to extremely costly (e.g. this headline in the New York Times, where a person was sent to prison by mistake because of software).
- Right now, we are missing 3 things:
  - Ethics of Data: Privacy, collection, sharing and use of data
  - Ethics of Algorithms: Safeguards, Biases, Transparency, Accountability, etc.
  - Ethics of the Practice: Professional Code, Responsible Innovation, Practitioner Accountability
- So, what we need is an approach to Ethics that focuses on the rightness or wrongness of the actions themselves, rather than on judging the consequences of those actions: a Deontology of Data Science. A very compelling argument!
The ODSC Award for 2017 was given to Pandas, the open source library. Wes McKinney, creator of Pandas, received the award on behalf of the contributors.
The first Meet the Experts session offered an opportunity to meet Andreas Mueller, core contributor to scikit-learn. The second session was hosted by Igor Perisic on The Ethics of Data Science.
In the afternoon, the first session of Meet the Experts gave attendees access to Avneesh Saluja, ML Scientist at Airbnb, to ask questions about Deep Learning. In the second session, Wes McKinney, the creator of Python’s Pandas library, was available for an interactive session.
Hyperparameter Tuning in Cloud Machine Learning Engine using Bayesian Optimization
Puneith Kaul, Senior Developer Programs Engineer on Cloud ML at Google
- Covered: Google Cloud ML Engine, Hypertune and how Google does Hyperparameter Tuning, Bayesian Optimization, Exploration-Exploitation, followed by a Hypertune exercise.
- Hyperparameters are the parameters not learned directly from the training process. Examples include the learning rate, the loss function, the mini-batch size and the number of training iterations. An optimal set of hyperparameters depends mostly on the model and the dataset.
- Bayesian Optimization uses a Gaussian Process as the surrogate function that models the black box, which makes it very useful when the underlying mathematical function is unknown or expensive to compute. A well-known implementation of Bayesian Optimization is Spearmint. Cloud ML Engine provides out-of-the-box support for hyperparameter tuning using a simple YAML configuration.
- Trivia: Gaussian Process was used to find gold.
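To make the Bayesian Optimization loop described above a little more concrete, here is a minimal sketch in plain Python. It is not how Hypertune or Spearmint is implemented. The objective `toy_validation_loss` is a hypothetical stand-in for an expensive training run, and the RBF kernel, its length scale, the candidate grid and the lower-confidence-bound acquisition rule (with `kappa` trading off exploration against exploitation) are all assumptions chosen for illustration.

```python
import math

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L * L^T == A, for symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, ys))
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 0.0)
    return mean, math.sqrt(var)

def toy_validation_loss(log_lr):
    """Hypothetical 'expensive' objective; its minimum is at log_lr = -3."""
    return 0.1 * (log_lr + 3.0) ** 2

def bayesian_optimize(objective, candidates, init_indices, n_iter=10, kappa=2.0):
    """Minimize objective over a grid using a GP surrogate and an LCB rule."""
    xs = [candidates[i] for i in init_indices]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        y_mean = sum(ys) / len(ys)
        centered = [y - y_mean for y in ys]  # match the zero-mean GP prior
        next_x, next_score = None, float("inf")
        for x in candidates:
            if x in xs:
                continue  # never re-evaluate a point we already tried
            mean, std = gp_posterior(xs, centered, x)
            score = mean - kappa * std  # low mean exploits, high std explores
            if score < next_score:
                next_x, next_score = x, score
        xs.append(next_x)
        ys.append(objective(next_x))
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], xs

if __name__ == "__main__":
    grid = [-5.0 + i / 10 for i in range(51)]  # log10(learning rate) candidates
    best_x, best_loss, tried = bayesian_optimize(toy_validation_loss, grid, (0, 25, 50))
    print("best log10(lr):", best_x, "loss:", best_loss, "evaluations:", len(tried))
```

Each iteration refits the surrogate on every result seen so far, then spends the next "expensive" evaluation where the surrogate is either promising or uncertain. That is the exploration-exploitation trade-off mentioned in the talk.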
Day 3: Saturday, November 4, 2017
Data Science for Blockchain and Cryptocurrencies Applications
Bhairav Mehta, Senior Data Scientist, Apple
- Started with the legend of Satoshi Nakamoto and some of the recent headlines around Bitcoin.
- He walked through what blockchain is (technology that enables the peer-to-peer transmission of electronic value, without any intermediary) and how it works. I suggest going through this list that Chris Dixon has put together.
- Why isn’t blockchain ready for business? There are two options for building private networks for businesses: (a) Reconfigure a public network fabric for private use, or, (b) Build on top of an untested private network that is available. In either case, there aren’t enough critical features that enterprises need.
- You can watch Bhairav Mehta’s recent ACM talk on the same topic here.
- The following was an interesting slide from Bhairav’s deck:
Tutorial on Adversarial Machine Learning with CleverHans
Nicholas Carlini and Nicolas Papernot
Image credit: Karl Krall, Denkende Tiere, Leipzig 1912, Tafel 2, Public Domain, https://commons.wikimedia.org/w/index.php?curid=240187
- A fascinating session that drew on the Goodfellow et al. paper “Explaining and Harnessing Adversarial Examples” and was based on CleverHans, the open source library for benchmarking vulnerability to adversarial examples.
- During training, the classifier uses a loss function to minimize model prediction error. After training, the attacker uses the same loss function to maximize model prediction error.
- Adversarial examples are not limited to Neural Nets and are a tangible instance of hypothetical AI safety problems
- CleverHans is named after the “Orlov Trotter horse that was claimed to have been able to perform arithmetic and other intellectual tasks.” Investigations revealed that the horse was merely responding to the involuntary signals and cues of its human trainer, who could solve the tasks.
- Representation Learning is “learning representations of the data that make it easier to extract useful information when building classifiers or other predictors.” – in the words of Y. Bengio et al. It’s become a field of its own in Machine Learning now.
- Why are deep Neural Networks so effective? Because of their representation capabilities. Nathaniel walked us through some illustrations of how a Neural Network represents data.
- This became even more engrossing when he walked us through word embeddings
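Returning to the adversarial examples from the CleverHans tutorial above: the idea that an attacker reuses the training loss to maximize prediction error can be sketched with the fast gradient sign method from the Goodfellow et al. paper. This is a minimal illustration in plain Python and does not use the CleverHans API. The logistic-regression weights `w`, bias `b`, input `x` and step size `eps` are hypothetical values chosen only for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression classifier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def cross_entropy(w, b, x, y):
    """Binary cross-entropy loss for a single example with label y in {0, 1}."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """Move each input feature by eps in the direction that increases the loss."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical, already-trained model and a correctly classified input.
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, -0.2, 0.3], 1
x_adv = fgsm(w, b, x, y, eps=0.4)

if __name__ == "__main__":
    print("clean:       p(class 1) =", predict(w, b, x), "loss =", cross_entropy(w, b, x, y))
    print("adversarial: p(class 1) =", predict(w, b, x_adv), "loss =", cross_entropy(w, b, x_adv, y))
```

The defender would take a gradient step on the weights to lower the loss. The attacker keeps the weights fixed and takes a bounded step on the input to raise it, which here is enough to push the prediction across the decision boundary.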
The Networking Reception and Dinner with Data Scientists gave a chance to socialize, mingle and get to know the other attendees in an informal setting.
On this last day, the Career Fair was open to anyone for a tiny fee. So, aspirants who did not attend the Conference could also get a chance to push their resumes to hiring companies.
To conclude, ODSC has become a must-go event in Data Science and Machine Learning, as it continuously strives to deliver a smooth event while improving the content of its programs and the quality of its speakers to deliver value to a diverse set of attendees. Its next event is ODSC East, in Boston from May 1-4, 2018. Look forward to it!
Bio: Jitendra Mudhol and his team members at CollaMeta are passionate about designing and developing Machine Learning applications in Manufacturing and Utilities. You may reach him at jsmudhol at collameta dot com.