- Creating Simple Data Visualizations as an Act of Kindness - Dec 12, 2017.
The field of data visualization is still quite young and evolving rapidly, and tools like the web and VR continue to expand the possibilities. There is a lot of room for exploring new approaches and creating new formats, and there are already many examples of novel and amazing visualizations.
- Evaluating Data Science Projects: A Case Study Critique - Sep 19, 2017.
It’s not necessary to understand the inner workings of a machine learning project, but you should understand whether the right things have been measured and whether the results are suited to the business problem. You need to know whether to believe what data scientists are telling you.
- A Guide to Understanding AI Toolkits - Aug 16, 2017.
This post surveys today’s foremost options for AI in the form of deep learning, examining each toolkit’s primary advantages as well as its industry supporters.
- Mind Reading: Using Artificial Neural Nets to Predict Viewed Image Categories From EEG Readings - Aug 9, 2017.
This post outlines the approach taken at a recent deep learning hackathon hosted by Y Combinator-backed startup Deepgram. The dataset: EEG readings from a Stanford research project that used linear discriminant analysis to predict which category of images its test subjects were viewing.
- Exploratory Data Analysis in Python - Jul 7, 2017.
We view EDA very much like a tree: there is a basic series of steps you perform every time you do EDA (the main trunk of the tree), but at each step, observations will lead you down other avenues (branches) of exploration by raising questions you want to answer or hypotheses you want to test.
- Simplifying Data Pipelines in Hadoop: Overcoming the Growing Pains - May 18, 2017.
Moving to Hadoop is not without its challenges: there are many options, from tools to approaches, that can have a significant impact on the future success of a business’s strategy. Data management and data pipelining can be particularly difficult.
- Building, Training, and Improving on Existing Recurrent Neural Networks - May 8, 2017.
In this post, we’ll provide a short tutorial for training an RNN for speech recognition, including code snippets throughout.
- Models: From the Lab to the Factory - Apr 27, 2017.
In this post, we’ll go over techniques to avoid these scenarios through the process of model management and deployment.
- The Value of Exploratory Data Analysis - Apr 20, 2017.
In this post, we will give a high-level overview of what exploratory data analysis (EDA) typically entails and then describe three of the major ways EDA is critical to successfully building a model and interpreting its results.
- A Short Guide to Navigating the Jupyter Ecosystem - Mar 31, 2017.
This post presents a no-nonsense overview of the Jupyter ecosystem, and a few tips, tricks and concepts you may find useful for navigating it.
- Getting Started with Deep Learning - Mar 24, 2017.
This post approaches getting started with deep learning from a framework perspective. Gain a quick overview and comparison of available tools for implementing neural networks to help choose what's right for you.
- Open Source Toolkits for Speech Recognition - Mar 14, 2017.
This article reviews the main options for free, open source speech recognition toolkits that use traditional Hidden Markov Models and n-gram language models.
- Getting Real World Results From Agile Data Science Teams - Feb 10, 2017.
In this post, I’ll look at the practical ingredients of managing agile data science. By using agile data science methods, we help data teams do fast and directed work, and manage the inherent uncertainty of data science and application development.
- Introduction to Trainspotting: Computer Vision, Caltrain, and Predictive Analytics - Nov 1, 2016.
We previously analyzed delays using Caltrain’s real-time API to improve arrival predictions, and we have modeled the sounds of passing trains to tell them apart. In this post we’ll start looking at the nuts and bolts of making our Caltrain work possible.
- Understanding the Chief Data Officer - Nov 1, 2016.
In this report you will find a concise look at how CDOs view their nascent role in high-profile organizations, focusing on guidelines and best practices for organizations looking to add their own CDO.
- Jupyter Notebook Best Practices for Data Science - Oct 20, 2016.
Check out this overview of Jupyter notebook best practices as they pertain to data science. Novice or expert, you may find something of use here.
- Strata 2014 Santa Clara: Highlights of Day 2 (Feb 12) - Feb 27, 2014.
Strata 2014 was a great conference, and here are key insights from some of the best sessions on day 2: Big Data Vendor Landscape, Machine Learning for Social Change, Secrets of Gertrude Stein, and Facebook Exascale Analytics.