Top /r/MachineLearning Posts, July: Friendly Suggestions re: Coding Practices; Racist AI How-To Without Really Trying

Why can't you guys comment your f*cking code?; Train Chrome's Trex character to play independently; How to make a racist AI without really trying; Is training a NN to mimic a closed-source library legal?; 37 Reasons why your NN is not working


The top 5 /r/MachineLearning posts of July are:

1. Why can't you guys comment your f*cking code?

This is a salty rant-slash-inquiry into why machine learning engineers, researchers, and enthusiasts seem not to comment their code, write easy-to-follow code, or even use intuitive variable names. It is a generalization, for sure, but the rant has its merits.

However, not everyone agrees; the thread is filled with equally salty replies -- it *is* Reddit -- but many of these alternate viewpoints are well argued, and some commenters dispute the premise outright. A fun read for when the kiddies are asleep.

2. In this project I tried to train Chrome's Trex character to learn to play by looking my gameplay (Supervised)

A demo video in the thread pretty accurately sums it up, and the code is available in the project's GitHub repository.
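The supervised setup the poster describes is essentially behavioral cloning: pair each captured game frame with the key the human pressed, then train a classifier to map frames to actions. A minimal sketch of that idea, with random arrays standing in for real screenshots and keystrokes (the project's actual architecture is in its repository):

```python
import numpy as np

# Stand-ins for recorded gameplay: grayscale frames and the key pressed
# at each frame (0 = do nothing, 1 = jump). Real data would come from
# screen capture and keylogging during play.
rng = np.random.default_rng(0)
frames = rng.random((500, 40, 40))
actions = rng.integers(0, 2, size=500)

# Fit the simplest possible policy: nearest class centroid over
# flattened frames. The project trains a neural network instead;
# the supervised framing is the same.
X = frames.reshape(len(frames), -1)
centroids = np.stack([X[actions == a].mean(axis=0) for a in (0, 1)])

def policy(frame):
    """Predict an action (0 or 1) for a new frame."""
    d = np.linalg.norm(centroids - frame.ravel(), axis=1)
    return int(np.argmin(d))
```

At inference time, `policy` is called on each incoming frame and its output is translated back into a (simulated) key press.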

3. How to make a racist AI without really trying

You probably wouldn't want to... To be fair, it is billed as a cautionary tutorial.

The final step in this tutorial is:

  • Behold the monstrosity that we have created

...which is not all that comforting. Noting the reality of where that leaves us, however, the authors go on:

And at that point we will have shown "how to make a racist AI without really trying". Of course that would be a terrible place to leave it, so afterward, we're going to:

  • Measure the problem statistically, so we can recognize if we're solving it
  • Improve the data to obtain a semantic model that's more accurate and less racist

Here is a direct link to the notebook tutorial. Have fun, and be responsible.
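The mechanism the tutorial warns about can be sketched with a toy example: a sentiment classifier trained on top of word embeddings inherits whatever associations the embeddings encode, including associations attached to personal names. The tiny 2-d vectors below are invented purely for illustration (the tutorial uses real pretrained embeddings and a sentiment lexicon):

```python
import numpy as np

# Toy "embeddings", invented for illustration. Names carry no sentiment,
# but their vectors are placed asymmetrically to mimic the corpus
# co-occurrence biases that real pretrained embeddings absorb.
emb = {
    "excellent": np.array([ 1.0,  0.9]),
    "wonderful": np.array([ 0.9,  1.0]),
    "terrible":  np.array([-1.0, -0.9]),
    "awful":     np.array([-0.9, -1.0]),
    "emily":     np.array([ 0.3,  0.2]),
    "shaniqua":  np.array([-0.3, -0.2]),
}

# "Train" a linear sentiment model on labeled sentiment words only.
X = np.stack([emb["excellent"], emb["wonderful"], emb["terrible"], emb["awful"]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def sentiment(word):
    """Sentiment score of a word under the learned linear model."""
    return float(emb[word] @ w)
```

Even though no name appeared in the training labels, `sentiment("emily")` and `sentiment("shaniqua")` now come out on opposite sides of zero: the model has projected the embeddings' biases onto neutral inputs, which is exactly the monstrosity the tutorial asks you to behold, measure, and then mitigate.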

4. Is training a NN to mimic a closed-source library legal?

Below is the original question which started the thread; read the thread in its entirety for the wide-ranging feedback it received.

Let's imagine the following situation. John has access to the binaries of a closed source library that computes some nice image filtering, which means that he can apply it to any input image. Now he would like to get rid of the dependency on this library and have his own filter but does not have time to do all the research, so he trains a regression CNN (that produces filtered images) on a dataset which he creates by considering a lot of images, and - as a ground truth - the output of the library on such images.

Do you have any insight on where this stands from a legal point of view? Is it considered IP infringement, or maybe reverse-engineering? Does it only depend on the license agreement of the closed-source library?
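Whatever the legal answer, the technical setup John describes is easy to picture: treat the proprietary library as an opaque label generator and build a supervised dataset by querying it. A minimal sketch, with a simple box blur standing in for the closed-source filter (the real library's internals are unknown; only input/output pairs are observed):

```python
import numpy as np

def closed_source_filter(img):
    """Stand-in for the proprietary library: a 3x3 box blur.
    John can only call it, not inspect it."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / 9.0

# Build the training set: many input images, each labeled with the
# black box's output. A regression CNN would then be fit to these pairs.
rng = np.random.default_rng(0)
images = [rng.random((16, 16)) for _ in range(100)]
dataset = [(img, closed_source_filter(img)) for img in images]
```

The point of the sketch is that no internals are ever read: everything the student model learns comes from observed (input, output) pairs, which is precisely what makes the legal status debated in the thread.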

5. 37 Reasons why your NN is not working


Where do you start checking if your model is outputting garbage (for example predicting the mean of all outputs, or it has really poor accuracy)?

A network might not be training for a number of reasons. Over the course of many debugging sessions, I would often find myself doing the same checks. I’ve compiled my experience along with the best ideas around in this handy list. I hope they would be of use to you, too.
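One check that debugging lists like this one commonly recommend is verifying the loss at initialization: before any training, a classifier's cross-entropy should sit near ln(num_classes), the loss of uniform random guessing. A markedly different value points to a bug in the loss, the labels, or the output layer. A small sketch of that check (the near-zero logits simulate an untrained network):

```python
import numpy as np

num_classes = 10
batch_size = 256
rng = np.random.default_rng(0)

# Simulated untrained network: tiny random logits give near-uniform
# softmax probabilities, as a freshly initialized classifier should.
logits = rng.normal(scale=0.01, size=(batch_size, num_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, num_classes, size=batch_size)

# Mean cross-entropy over the batch; should be close to ln(num_classes).
loss = -np.log(probs[np.arange(batch_size), labels]).mean()
print(loss, np.log(num_classes))  # the two should be close
```

If the model truly is "predicting the mean of all outputs", this kind of baseline comparison makes it visible immediately, before any deeper debugging.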