
Best Data Science, Machine Learning Blogs from Companies and Startups


A collection of company data science blogs to follow and read. Top blogs have links to, and excerpts from, recent quality posts of particular interest.



The following is a select list of data science-related blogs from a number of top companies and startups working in the field.

The list is clearly not exhaustive; in fact, it is selective. It has been compiled from a number of sources, scatter-brained browsing, and personal preferences. While there are an awful lot of data science, big data, analytics, and machine learning blogs being run by companies and startups, this is a selection of some of the (subjectively) better ones.

I'm sure many fine blogs have been left out, so don't take any particular exclusion as anything more than an oversight.


Airbnb

Airbnb shares data-related posts, code, tech talks, and more on their Nerds engineering blog.

Select post: Building for Trust: Insights from our efforts to distill the fuel for the sharing economy

Designing for trust is a well understood topic across the hospitality industry, but our efforts to democratize hospitality mean we have to rely on trust in an even more dramatic way. Not long ago our friends and families thought we were crazy for believing that someone would let a complete stranger stay in their home. That feeling stemmed from the fact that most of us were raised to fear strangers.

Big Cloud

Big Cloud's What We Think blog shares posts on big data, data science, and more.

Select post: The Evolution of the Data Scientist

Evolution might be considered to be an unusual word to describe the advancement of the Data Scientist. After all, evolution is defined as: “The way in which living things change and develop over millions of years”. I’m certainly not claiming that the Homo erectus could code. However, what we can clearly see, is that there is evolution in the methods, process and technology used by a Data Scientist.

Bolt.io

Bolt.io is a team of analysts and engineers from Optimizely and Google who work with distributed systems, data warehouses and analytics. These are their stories.

Select post: 7 Simple Rules to Ensure Data Quality in Your Data Warehouse

When importing data into your data warehouse, you will almost certainly encounter data quality errors at many steps of the ETL pipeline.

Databricks

The makers of Spark use their blog to share all sorts of Databricks- and Spark-related information, tutorials, talks, and more.

Select post: Introducing our new eBook: Apache Spark Analytics Made Simple

This e-book, the first of a series, offers a collection of the most popular technical blog posts written by leading Spark contributors and members of the Spark PMC including Matei Zaharia, the creator of Spark.

DataCamp

DataCamp's blog aims to show people how to do data analysis like a pro. These posts are mostly on the "lighter" side, but still worth a look.

Select post: Create your own R tutorials with Github & DataCamp

This blog tutorial walks you through the different steps to create your own interactive online course on www.DataCamp.com. Next to the post you can also have a look at the complementary screencast below. If you want to take a deep dive into DataCamp's documentation, head over to www.datacamp.com/teach/documentation. Teaching Data Science has never been easier!


Dataquest

The Dataquest Blog offers musings about data science from the makers of Dataquest.io.

Select post: Getting started with data science in Python

In this post, we’ll walk through getting started with data science using Python. If you want to dive more deeply into the topics we cover, visit Dataquest, where we teach every component of the Python data science lifecycle in depth.

Domino

The Domino Blog boasts that it is "at the intersection of data science and engineering."

Select post: Building a High-Throughput Data Science Machine

Scaling is hard. Scaling data science is extra hard. What does it take to run a sophisticated data science organization? What are some of the things that need to be on your mind as you scale to a repeatable, high-throughput data science machine?

Facebook

Facebook Research's blog contains topics relevant to data science, engineering, and more.

Select post: What I learned from interning at Facebook as a PhD student

Building a system from scratch until it runs on thousands of servers and serves billions of users is highly rewarding, but also much more complicated than, say, creating a prototype in a research lab. So what are some important guidelines?

Google (Unofficial)

The Unofficial Google Data Science Blog is "the work of some data scientists at Google who wish to bring out stories of interest to data scientists outside of Google." Tends toward the technical and academic. Maybe not "official," but...

Select post: Variance and significance in large-scale online services

There are many reasons for the recent explosion of data and the resulting rise of data science. One big factor in putting data science on the map has been what we might call Large Scale Online Services (LSOS). These are sites and services which rely both on ubiquitous user access to the internet as well as advances in technology to scale to millions of simultaneous users.