Data Analytics Handbook – Interviews with Data Scientists and Tech Leaders, free download

A young team of UC Berkeley students has produced a free Data Analytics Handbook featuring interviews with data scientists and tech leaders from leading companies, including LinkedIn, Cloudera, Facebook, Yelp, and Flurry.

By Gregory Piatetsky, Apr 12, 2014.

A young team of three UC Berkeley students (Brian Liou, Tristan Tao, and Elizabeth Lin) has produced a Data Analytics Handbook, which includes interviews with data scientists and tech leaders. It is available at

Data Analytics Handbook - free download.

Part 1 includes interviews with Data Scientists from LinkedIn, Cloudera, Facebook, Yelp, HG Data, and Flurry.

Top takeaways include:

1. Communication skills are underrated
If you can't present your analysis as digestible concepts that your CEO can understand, your analysis is useful only to you.

2. The biggest challenge for a data analyst isn't modeling, it's cleaning and collecting
Data analysts spend most of their time collecting and cleaning the data required for analysis. Answering questions like "Where do you collect the data?", "How do you collect the data?", and "How should you clean the data?" requires much more time than the analysis itself.

3. A Data Scientist is better at statistics than a software engineer and better at software engineering than a statistician
The greatest difference between a data scientist and a data analyst is the data scientist's understanding of computer science and ability to conduct analysis on data at scale. Data scientists need only basic competency in statistics and computer science, and not all of them have Ph.D.s. New tools are empowering more people to do data science.

Part 2 includes interviews with CEOs and Managers from Mode Analytics, Smarter Remarketer, Cloudera, Stylitics, Flurry, Yhat, Persontyle, and BigML.

Top takeaways include:

1. Do your own projects to break into the industry.
The truth is, even in a quantitative major you are not taught what you need to know to work in data analytics. There is a learning gap between academia and industry that is best filled by doing projects. Find some sports statistics and do your own analysis. Learn R so that you can complete this analysis, not just for the sake of learning R. Kaggle competitions are also worth trying.

2. Statistics > Programming.
The development of new tools and the popularity of programming have led to statistical analysis being used as a black box. Understanding selection bias vs. sampling bias, and the underlying assumptions on which statistical functions are built, will make your opinions matter and your work invaluable.

3. The most important skill is being able to ask the right questions.
The power of data analytics lies in taking open-ended questions and framing them as multiple-choice ones. If you can filter a million questions down to options A through D, you are a data scientist worth hiring.

Here is Part 3 of the handbook, which includes interviews with academics and research leaders, including Hal Varian (Chief Economist, Google), Tom Davenport (Professor, Babson College), and me, Gregory Piatetsky (Editor, KDnuggets).