Best Data Science, Machine Learning Blogs from Companies and Startups

A collection of company data science blogs to follow and read. Top blogs have links to, and excerpts from, recent quality posts of particular interest.



HPE Haven OnDemand

The HPE Haven OnDemand Developer Community Blog is written by the wider community of developers who use the HPE Haven OnDemand machine learning APIs.

Select post: Using Haven to Index and Search documents (with live example and code base)

In this article, we’ll cover how you can index files into Haven OnDemand and retrieve those documents using Haven OnDemand’s full text search APIs. Using the FindSimilar API, you can rank documents in an index by how closely they match your input text.

IBM

The IBM Big Data & Analytics Hub Blogs is an aggregator of relevant blogs from across the IBM big data and analytics spectrum, covering numerous topics of interest.

Select post: AI in government: Can computers really be good at decision making?

Are you skeptical about machines’ ability to effectively aid social science decision making? Machines are becoming ever more intelligent, increasingly able to help humans make decisions across the social science spectrum, but cognitive computing is still in its infancy, with much unexplored ground ahead.

Lab41

Gab41 is Lab41's blog, sharing experimental results and thoughts on their big data challenges. It will get technical.

Select post: Generative Methods are a Good Idea for Handwriting

It has been interesting to watch deep learning evolve over the past four years. Deep learning has made some significant advances, but the progress in unsupervised learning has caught my eye recently. I was academically birthed from the womb of a Frequentist, but the impact of deep Bayesian models cannot be ignored.

Netflix

Tech, data, and engineering from Frank Underwood & Co. at The Netflix Tech Blog (and Claire did break the fourth wall!).

Select post: Saving 13 Million Computational Minutes per Day with Flame Graphs

An on-going focus for the Netflix performance team is to proactively look for optimizations in our service and infrastructure tiers. It’s possible to save hundreds of thousands of dollars with just a few percentage points of improvement. Given that one of our largest workloads is primarily CPU-bound, we focused on collecting and analyzing CPU profiles in that tier.

Oracle

The Oracle Data Mining Blog covers topics of interest mainly to users of Oracle Data Mining, a component of the Oracle Advanced Analytics Option, including news, opinions, and more.

Select post: My Favorite Oracle Data Miner Demo Workflows - Part 1

Part 1 (of a planned series of blog posts): Here are a few of my favorite Oracle Data Miner demo workflows. They are all simple, easy-to-create examples of data mining and predictive analytics using Oracle Advanced Analytics and SQL Developer's Oracle Data Miner extension.

Quora

Engineering at Quora covers all sorts of engineering topics, and given Quora's model and obvious reliance on data science, it clearly has posts of interest to data scientists.

Select post: A Machine Learning Approach to Ranking Answers on Quora

Millions of people use Quora every day to find answers to their questions and make smarter decisions, find their dream jobs, raise their families better, and much more. It's really important for us to provide an excellent reading experience on our question pages. An important part of that is ordering answers to the question by their relevance and helpfulness such that the most helpful answers show at the top.

Stitch Fix

The Stitch Fix blog Multithreaded includes entries on engineering, algorithms, mild philosophical rants, and data science.

Select post: Data Science at Stitch Fix

In an environment without project management tools, humans will communicate naturally. This creates a “survival of the fittest” environment for projects where only the important projects will make it into the queue to begin with.

Uber

The Uber Engineering Blog covers topics of interest to data science and the sharing economy.

Select post: Streamific, the Ingestible Service for Hadoop Big Data at Uber Engineering

While Uber moves people and packages around the world, data moves Uber. Systems like Hadoop and Spark power data decisions both large and small in the company. The Uber data engineering team builds big data solutions on top of these systems to support Uber’s growth.

Yhat

The Yhat Blog covers machine learning, data science, and engineering, and, unlike a number of other company blogs, publishes new posts regularly.

Select post: Interview with a Data Scientist Tool Developer

I interviewed one of the core members of the pandas Python library, Masaaki Horikoshi (sinhrks). I was really happy to interview him, and glad to show that data science and software development are really global things.

An Honorable Mention: OkCupid

OkTrends is practically legendary in the data science blog realm. The problem is, it hasn't been updated in over 18 months. Don't let the domain fool you: it really is a wealth of interesting data science case studies stretching back many years.

Blogs with Less Activity, Uncertain Futures:
