Doing Data Science at Twitter
Data scientist career exciting, fulfilling and multidimensional career path. Understand through the journey of a data scientist of twitter about data scientists roles, responsibilities and skills required to perform them.
By Robert Chang, @_rchang.
(All opinions below are Robert Chang’s, and not of Twitter).
On June 17, 2015, I celebrated my two year #Twitterversary @Twitter. Looking back, the Data Science (short for DS) landscape at Twitter has shifted quite a bit:
- Machine Learning has played an increasingly prominent role across many core Twitter products that were previously not ML driven (e.g. “While you are away”)
- Tool wise, we’ve moved away from Pig and all new data pipelines are now written in Scalding, a Scala DSL built on top of cascading that makes it easy to specify Hadoop MapReduce jobs
- Organizationally, we switched to an embedded model where DS are now working closer than ever with the product/engineering teams
And these are only a handful of changes among many others! On a personal note, I’ve recently branched out from Growth to PIE (Product, Instrumentation, and Experimentation) to work on the statistical methodologies of our home grown A/B Testing platform.
Being at Twitter is truly exciting, because it allows me to observe and learn, first hand, how a major technology company leverages data and DS to create competitive edges.
Meanwhile, demands and desires to do data science continued to skyrocket.
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it” — Dan Ariely
There are many, and I mean many, discussions around how to become a data scientist. While these discussions are extremely informative (I am one of the beneficiaries), they tend to over-emphasize on techniques, tools, and skill-sets. In my opinion, it is equally important for aspiring Data Scientists to know what it is really like to work as a DS in practice.
As a result, as I hit my two year mark at Twitter, I want to use this reflection as an opportunity to share my personal experience, in the hope that others in the field would do the same!
Type A Data Scientist v.s. Type B Data Scientist
Before Twitter, I got the impression that all DS need to be unicorns — from Math/Stat, CS/ML/Algorithms, to data viz. In addition to technical skills, writing and communication skills are crucial, and one needs to know how to prioritize, lead, and manage projects. Oh, you also need to the voice of reason and a data evangelizer! Good luck?
A few months in into my job, I learned that while unicorns do exist, for the majority of us who are still trying to get there, it is unrealistic/infeasible to do all these things at once. That said, almost everything data related is tied to the term DS, and it was a bit daunting to find my place as a newbie.
Overtime, I realized that there is a overly simplified but sufficiently accurate dichotomy of the different types of Data Scientists. I wasn’t able to articulate this well until I came across a Quora answer from Michael Hochster, who elegantly summarized this point. In his words:
Type A Data Scientist: The A is for Analysis. This type is primarily concerned with making sense of data or working with it in a fairly static way. The Type A Data Scientist is very similar to a statistician (and may be one) but knows all the practical details of working with data that aren’t taught in the statistics curriculum: data cleaning, methods for dealing with very large data sets, visualization, deep knowledge of a particular domain, writing well about data, and so on.
Type B Data Scientist: The B is for Building. Type B Data Scientists share some statistical background with Type A, but they are also very strong coders and may be trained software engineers. The Type B Data Scientist is mainly interested in using data “in production.” They build models which interact with users, often serving recommendations (products, people you may know, ads, movies, search results).
I wish I had known this earlier. In fact, as an aspiring DS, it is very useful to keep this distinction in mind as you make career decisions and choices.
Personally, my background is in Math, Operations Research, and Statistics. I identified myself mainly as a Type A Data Scientist, but I also really enjoy Type B projects that involved more engineering!