Top /r/DataScience Posts, October: Plagiarism, Reddit AMAs, Deep Learning Summer School

Plagiarism, a data science author's upcoming AMA, Deep Learning Summer School, essential tools for us all, and data scientist interview questions.

By Matthew Mayo.

October on /r/DataScience brings us posts on plagiarism, data science books and authors, Deep Learning Summer School, essential tools for us all, and data scientist interview questions.

1. Please Do Not Steal My Code, Mock My Analysis, and Present My Ideas as Your Own +54

This was originally a link to a blog post by software engineer @minimaxir, detailing a data visualization project which he undertook and posted to GitHub under the MIT license, and which another party then plagiarized for a school project. The post was definitely a rant, but it also made great points. A lively discussion ensued here, with folks taking differing sides on the issue. The original blog post was later removed. I was able to read it while it was still up, and saw the third party in question post in the comments section, so one can only speculate as to why it was removed. The episode raises questions about, and sheds some light on, code plagiarism, which is a legitimate concern.

2. "Data Science for Business" Author Tom Fawcett AMA Interest? +45

This is a post from a third party gauging interest in an AMA with Tom Fawcett, co-author of the wildly successful book Data Science for Business. Interest seems plentiful, so the AMA appears to be in the works. If you're interested in other top books from Amazon, check out our Top 20 Data Mining and Top 20 AI & Machine Learning books lists.

3. 26 Things I Learned in the Deep Learning Summer School +40

A blog post from Marek Rei outlines a number of things he picked up during the Deep Learning Summer School in Montreal this past August. Among his covered points are motion tracking, Theano, and training on adversarial examples. His write-up is a nice overview of some of the important take-aways from DLSS. Incidentally, all of the seminar videos are online here.

4. 7 Tools in Every Data Scientist's Toolbox +36

This DataDive blog article focuses on some of the time-tested mainstays of the data scientist toolbox. Sure, there are a lot of newer techniques that are useful in specific applications, but this article covers topics we should all have a handle on, such as tree-based methods, finding hidden groups with clustering, and random sampling methods. The best thing about this article is that it is language- and tool-neutral, with ideas that apply universally. A good read.

5. Data Science Interview Questions +40

The last post we look at is one outlining a number of data science interview questions. The questions are organized by topics such as coding, distributed systems, and meta learning. Far from being useful only for those looking for a new position, these questions are useful jumping-off points for those looking to ensure they are professionally well-rounded, or looking for something new to study. Definitely worth a once-over.

Bio: Matthew Mayo is a computer science graduate student currently working on his thesis parallelizing machine learning algorithms. He is also a student of data mining, a data enthusiast, and an aspiring machine learning scientist.