-
Causation or Correlation: Explaining Hill Criteria using xkcd
This is an attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun and to encourage causal inference instructors to vary the xkcd comics they include in lectures.
-
Web Scraping for Dataset Curation, Part 2: Tidying Craft Beer Data
This is the second part of a two-part series on curating data from the web. The first part focused on web scraping, while this post details the process of tidying the scraped data after the fact.
-
Web Scraping for Dataset Curation, Part 1: Collecting Craft Beer Data
This post is the first in a two-part series on scraping and cleaning data from the web using Python. This first part is concerned with the scraping aspect, while the second part will focus on the cleaning. A concrete example is presented.
-
The Data Science of NYC Taxi Trips: An Analysis & Visualization
This post outlines using Google BigQuery for an analysis of NYC Taxi Trips in the cloud, presenting the analysis and visualization in Tableau Public for readers to interact with.
-
Getting Real World Results From Agile Data Science Teams
In this post, I’ll look at the practical ingredients of managing agile data science. By using agile data science methods, we help data teams do fast and directed work, and manage the inherent uncertainty of data science and application development.
-
Automatically Segmenting Data With Clustering
In this post, we’ll walk through one such algorithm called K-Means Clustering, how to measure its efficacy, and how to choose the sets of segments you generate.
-
Making Python Speak SQL with pandasql
Want to wrangle Pandas data like you would SQL using Python? This post serves as an introduction to pandasql, and details how to get it up and running inside of Rodeo.
-
Top R Packages for Machine Learning
What are the most popular ML packages? Let's look at a ranking based on package downloads and social website activity.
-
Is Deep Learning the Silver Bullet?
With nearly every smart young computer scientist planning to work on deep learning, are there really still artificial intelligence researchers working on other techniques? Is deep learning the AI silver bullet?
-
Pandas Cheat Sheet: Data Science and Data Wrangling in Python
The Pandas library can seem very elaborate, and it can be hard to find a single point of entry to the material. With other learning materials focusing on different aspects of the library, a reference sheet can help you get the hang of it.