7 Essential Resources & Tips To Get Started With Data Science

This instructional post takes you through connecting the various pieces when studying the data science pipeline. From analysis, to datasets, to MOOCs, to visualizing data, this informative post has some fresh insight.



5. Visualization

I've already mentioned the descriptive power of statistics. Let me illustrate the importance of visualization with an example where simple statistics are not enough. Anscombe's quartet is a collection of four data sets, each with two variables, x and y. Although these data sets look very different when plotted, they appear nearly the same through the lens of statistics. They share almost identical values for the following properties: mean of x, sample variance of x, mean of y, correlation between x and y, and linear regression line. Yet in fact they are very dissimilar.

Anscombe's Quartet.
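
To see this for yourself, here is a minimal sketch that computes the properties listed above for each of the four data sets, using only the Python standard library. The values are the commonly published ones from Anscombe's 1973 paper; the names `QUARTET` and `summarize` are just illustrative choices of mine, not part of any library.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet as commonly published; sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics named in the text for one data set."""
    mx, my = mean(x), mean(y)
    # Sums of squared deviations and cross-products around the means.
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "var_x": variance(x),  # sample variance (n - 1 in the denominator)
        "mean_y": my,
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation coefficient
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows come out practically the same, which is exactly why the numbers alone would never tell you how different the data sets are. Plotting each (x, y) pair as a scatter plot makes the difference obvious at a glance.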

Data visualization is important both when analyzing data and when conveying your findings. The human eye and brain work well together when it comes to recognizing patterns. They make it easy for us to immediately spot relationships, trends, outliers, or anomalies in visualizations, especially for low-dimensional data. Whenever possible, you should try to leverage the enormous bandwidth of the human visual system and explain your data in graphical form. I'd recommend first getting some inspiration from this amazing overview of visualizations based on the D3.js library.


An animated visualization of cultural mobility in the world between 600 BC and the present, revealing migration patterns of people. The animation is based on a publication by M. Schich et al., with data extracted from the publicly available Freebase knowledge base.

6. MOOC

Data science in various forms is being introduced as a new program at many universities around the world. Massive open online courses (MOOCs) go hand in hand with this trend, and you can already find a plethora of free or very affordable courses that will guide you from Introduction to Data Science, through Data Analysis and Statistical Inference, Data Mining, or Data Visualization, to Machine Learning taught by Andrew Ng.

7. Challenges

Now that you have all the pieces together, it's time to put your knowledge into practice. And what could be more fun than participating in a competition? Data science challenges, such as those hosted on Kaggle, are a great opportunity to test your own abilities and to learn from others (you'll also get nice data for free). On top of that, if you manage to win, you may be offered a dream job or at least a lot of money. If that doesn't tickle your fancy, some competitions (e.g., those on DrivenData.org) offer another, more noble reward: saving the world!



I hope you have found these tips and resources useful, especially if you're starting your first data-related project. The field is evolving incredibly fast and new resources are popping up every day. Keep in mind that it's good to keep up with the latest trends, but it's essential to learn the basics.

Bio: Jan R. Benetka is curator of the ZEEF Data Science page and a PhD candidate at the Norwegian University of Science and Technology.
