Top /r/DataScience Posts, November: Open source, Pokemon (?), Social analysis with R

November on /r/DataScience: a JavaScript charting library is open sourced, Pokemon and Big Data games, a new social network analysis package for R, insider information on landing a Google Data Scientist job, and a free data science curriculum.

In November on /r/DataScience, we find out that a JavaScript charting library is now free, distinguish between Pokemon and Big Data terminology, look at SocialMediaLab for R, learn how to get a job as a Google Data Scientist, and have a look at a free data science curriculum.

1. A JavaScript Charting Library is Free and Open Source +70

The JavaScript library for scientific interactive charts has been open-sourced and made freely available under the MIT License. The project was initiated by founder Dr. Alex Johnson and has been developed over the past 3 years. The source code can be found in this GitHub repo.

2. Is it Pokemon or Big Data? +54

Next time you're bored on a Saturday night, consider giving America's favorite new game craze a try: Is it Pokemon or Big Data? This simple game presents the player with a term, and the player must decide whether the term comes from the world of Big Data or the world of Pokemon. Arbok? Pokemon! Arvados? Big Data! Gorebyss? Wait... what? Gorebyss? Good thing the game gives a quick explanation after you guess.

So why not brush up on Pokemon and data science while having fun with friends, family, and coworkers? This is sure to be a hit at holiday parties everywhere!

3. Introducing SocialMediaLab for R +56

Directly from the developers:

SocialMediaLab is an R package that provides a suite of tools for collecting and constructing networks from social media data. It provides easy-to-use functions for collecting data across popular platforms (Instagram, Facebook, Twitter, and YouTube) and generating different types of networks for analysis.

SocialMediaLab includes several helper scripts and tutorials for each of the 4 social media platforms it supports. There is also a handy beginner's guide, with an overview of the system, which is even accessible to non-programmers.
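For readers wondering what "constructing a network" from social media data looks like in practice, here is a minimal Python sketch of the general idea. It is not the SocialMediaLab API (which is an R package); the sample posts, the `build_actor_network` helper, and the rule that an @-mention creates a directed edge are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical sample data: (author, text) pairs standing in for collected posts.
posts = [
    ("alice", "Great talk by @bob and @carol"),
    ("bob", "Thanks @alice!"),
    ("carol", "Agreed with @bob"),
    ("alice", "More from @bob soon"),
]


def build_actor_network(posts):
    """Return a Counter mapping (author, mentioned_user) edges to mention counts."""
    edges = Counter()
    for author, text in posts:
        for word in text.split():
            if word.startswith("@"):
                # Drop the "@" and any trailing punctuation, then normalize case.
                target = word[1:].strip("!.,:;?").lower()
                # Ignore empty handles and self-mentions.
                if target and target != author:
                    edges[(author, target)] += 1
    return edges


edges = build_actor_network(posts)
for (source, target), weight in sorted(edges.items()):
    print(f"{source} -> {target} (weight {weight})")
```

The resulting weighted edge list is the sort of structure that could then be handed to a graph library for analysis; a real collection step would also have to deal with authentication and platform rate limits, which this sketch leaves out entirely.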

4. How to Get a Job at Google - As a data scientist +55

Next, The Unofficial Google Data Science Blog brings us insight on how to get a job at Google... as a data scientist. The advice is arranged under 5 categories, with the TL;DR being: know your stats, get real-world experience, spend time coding, be passionate, and note that you have multiple options (as in, data scientists play many roles at Google).

My main takeaway is that in-depth statistical and related mathematical knowledge may be the deciding factor with Google (all else being equal), given the emphasis the article places on it.

5. Free Full Data Science Curriculum +48

Someone has gone to a lot of trouble to put together a coherent data science curriculum for interested parties. The program of study is organized into 5 units and a capstone project spread over 12 weeks, with breakdowns by approximate time required. The material is made up of freely available online resources, making use of Udacity, edX, Khan Academy, blog posts, and other sources. The units are: Probability and Statistics, Exploratory Data Analysis, Intro to R and Data Visualization, Data Wrangling, and Introduction to Analytics.

Someone's well-spent time may prove useful to others.

Bio: Matthew Mayo is a computer science graduate student currently working on his thesis parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.