by Edd Dumbill, 2 September 2010
The community Q&A site Quora is rich with information about data science, analytics and computing. An especially illuminating answer was given this week to the question "How do I become a data scientist - how does someone with a computer science background get the math and statistics knowledge required for data science?"
Providing an extensive reply, Alex Kamil gives eight points from his perspective as an undergraduate student. Many of these reference statistics and math, and Kamil provides an excellent list of papers, websites and technologies to tinker with.
Several of Kamil's suggested starting points struck me as common themes among those who define themselves as data scientists:
- Start learning statistics by coding with R: whatever the size of the data you're working with, many data analysts prototype and run their investigations in the R language. Some will later translate these into larger MapReduce jobs to be run on Hadoop, for instance. R gives developers a hands-on way to teach themselves statistics in practice.
- Linear algebra: many data scientists have a grounding in linear algebra, which matters because matrix math underpins many data mining applications, such as the famous PageRank algorithm.
- Machine learning: allowing computers to alter their behavior based on input data is fundamental to many innovative data-based products and services. Many developers start out ad hoc, but a substantial body of literature is available. Kamil references Bradford Cross' extensive list of machine learning resources.
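To make the linear algebra point concrete, here is a minimal sketch of PageRank computed by power iteration, written in plain Python. It is not taken from Kamil's answer: the function name `pagerank`, the four-page `toy_web` graph and the damping factor of 0.85 (a commonly quoted default) are all assumptions chosen for illustration. Each pass of the loop is equivalent to multiplying the current rank vector by the link matrix, which is the matrix math the bullet above refers to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    Every linked-to page must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a baseline share from the random-jump term.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links[page]
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web, invented for this example.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

scores = pagerank(toy_web)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, page "c" collects links from three other pages and ends up with the highest score, while "d", which nothing links to, keeps only the baseline share. A production implementation would use sparse matrices and a convergence test rather than a fixed iteration count.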
The field of data science is a place where book learning meets code and produces results. In the words of Kurt Lewin: "There's nothing so practical as a good theory."