Top /r/MachineLearning Posts, July: Friendly Suggestions re: Coding Practices; Racist AI How-To Without Really Trying
Why can't you guys comment your f*cking code?; Train Chrome's Trex character to play independently; How to make a racist AI without really trying; Is training a NN to mimic a closed-source library legal?; 37 Reasons why your NN is not working
The top 5 /r/MachineLearning posts of July are:
1. Why can't you guys comment your f*cking code?
This is a salty rant-slash-inquiry as to why machine learning engineers and/or researchers and/or enthusiasts so often fail to comment their code, write easy-to-follow code, or even use intuitive variable names. This is a generalization, for sure, but the rant has its merits.
However, not everyone agrees. The thread is filled with equally salty replies (it *is* Reddit), but many of these alternate viewpoints are also well argued, and some commenters reject the premise altogether. A fun read for when the kiddies are asleep.
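To make the complaint concrete, here is an illustration of our own (not taken from the thread): two Python functions that compute the same softmax. The first is written in the terse style the rant objects to; the second uses descriptive names, a docstring, and a comment explaining the one non-obvious trick. Both function names are made up for this example.

```python
import math


def f(x):
    # Terse version: what is x? Why subtract m? What comes back?
    m = max(x)
    e = [2.718281828459045 ** (v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]


def softmax(logits):
    """Convert a list of raw scores (logits) into probabilities that sum to 1."""
    # Subtract the largest logit before exponentiating so math.exp cannot
    # overflow on large inputs; the shift cancels out in the final ratio.
    max_logit = max(logits)
    exp_scores = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exp_scores)
    return [score / total for score in exp_scores]
```

Both return the same numbers. Only the second tells the next reader what the inputs are and why the max is subtracted.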
2. Train Chrome's Trex character to play independently
This video pretty accurately sums up the thread:
3. How to make a racist AI without really trying
You probably wouldn't want to... To be fair, it is billed as a cautionary tutorial.
The final step in this tutorial is:
- Behold the monstrosity that we have created
...which is not all that comforting. Acknowledging where that leaves us, however, the authors go on:
And at that point we will have shown "how to make a racist AI without really trying". Of course that would be a terrible place to leave it, so afterward, we're going to:
- Measure the problem statistically, so we can recognize if we're solving it
- Improve the data to obtain a semantic model that's more accurate and less racist
Here is a direct link to the notebook tutorial. Have fun, and be responsible.
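The tutorial's own measurement code lives in the notebook. As a rough, hypothetical sketch of what "measure the problem statistically" can mean, the Python below takes sentiment scores that some classifier assigned to names from different groups, computes the gap between the highest and lowest group means, and uses a permutation test to estimate how often a gap that large would appear if group membership were irrelevant. The function names and the choice of a permutation test are our assumptions, not necessarily what the tutorial does.

```python
import random
from statistics import mean


def sentiment_gap(scores_by_group):
    """Difference between the highest and lowest mean sentiment across groups."""
    group_means = [mean(scores) for scores in scores_by_group.values()]
    return max(group_means) - min(group_means)


def permutation_p_value(scores_by_group, n_iter=10_000, seed=0):
    """Return (observed_gap, p_value) for a permutation test on group labels.

    scores_by_group is a hypothetical input such as
    {"group_a": [0.8, 1.1, ...], "group_b": [-0.2, 0.1, ...]}, holding the
    sentiment a classifier assigned to each name in each group.
    """
    rng = random.Random(seed)
    observed = sentiment_gap(scores_by_group)
    sizes = {group: len(scores) for group, scores in scores_by_group.items()}
    pooled = [s for scores in scores_by_group.values() for s in scores]

    hits = 0
    for _ in range(n_iter):
        # Shuffle the pooled scores and deal them back out with the
        # original group sizes, i.e. pretend the labels carry no signal.
        rng.shuffle(pooled)
        shuffled, start = {}, 0
        for group, size in sizes.items():
            shuffled[group] = pooled[start:start + size]
            start += size
        if sentiment_gap(shuffled) >= observed:
            hits += 1

    # Add-one smoothing keeps the estimate away from an impossible p = 0.
    return observed, (hits + 1) / (n_iter + 1)
```

A small p-value suggests the gap is unlikely to be chance. Rerunning the same check after improving the data shows whether the gap actually shrinks.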
4. Is training a NN to mimic a closed-source library legal?
Here is the original question that started the thread. Read the thread in its entirety for some wide-ranging feedback.
Let's imagine the following situation. John has access to the binaries of a closed source library that computes some nice image filtering, which means that he can apply it to any input image. Now he would like to get rid of the dependency on this library and have his own filter but does not have time to do all the research, so he trains a regression CNN (that produces filtered images) on a dataset which he creates by considering a lot of images, and - as a ground truth - the output of the library on such images.
Do you have any insight on where this stands from a legal point of view? Is it considered IP infringement, or maybe reverse-engineering? Does it only depend on the license agreement of the closed-source library?
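The setup John describes amounts to fitting a surrogate model to a black box's input/output pairs. The sketch below shrinks it to a toy: a hypothetical `black_box_filter` stands in for the closed-source library (a 1-D three-tap blur rather than an image filter), and a least-squares linear fit stands in for the regression CNN. It says nothing about the legal question; it only shows how the library's outputs become the training targets.

```python
import random


def black_box_filter(signal):
    """Hypothetical stand-in for the closed-source library: a fixed 3-tap blur."""
    kernel = (0.25, 0.5, 0.25)
    padded = [signal[0]] + signal + [signal[-1]]  # replicate the edges
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def make_dataset(n_signals, length, seed=0):
    """Random inputs paired with the library's outputs as 'ground truth'."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n_signals):
        x = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        y = black_box_filter(x)
        padded = [x[0]] + x + [x[-1]]
        for i in range(length):
            rows.append(padded[i:i + 3])  # local neighbourhood as features
            targets.append(y[i])
    return rows, targets


def fit_linear(rows, targets):
    """Least squares: solve (A^T A) w = A^T b by Gaussian elimination."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    m = [ata[i] + [atb[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    weights = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * weights[j] for j in range(i + 1, n))
        weights[i] = (m[i][n] - rest) / m[i][i]
    return weights


rows, targets = make_dataset(n_signals=50, length=32)
weights = fit_linear(rows, targets)
```

Because this toy "library" is exactly linear, the fit recovers its kernel, roughly [0.25, 0.5, 0.25]. A real image filter would need a far more expressive model, which is why John reaches for a CNN.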
5. 37 Reasons why your NN is not working
Where do you start checking if your model is outputting garbage (for example predicting the mean of all outputs, or it has really poor accuracy)?
A network might not be training for a number of reasons. Over the course of many debugging sessions, I would often find myself doing the same checks. I’ve compiled my experience along with the best ideas around in this handy list. I hope they would be of use to you, too.
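One quick check for the symptoms named in the question, sketched under our own assumptions rather than taken from the list: compare the model's mean squared error against a baseline that always predicts the mean target, and flag predictions whose variance has collapsed to nearly zero. `garbage_check` and its tolerance are hypothetical.

```python
from statistics import mean, pvariance


def mse(preds, targets):
    """Mean squared error between two equal-length sequences."""
    return mean((p - t) ** 2 for p, t in zip(preds, targets))


def garbage_check(preds, targets, collapse_tol=1e-6):
    """Flag a model that has collapsed to a constant or fails to beat the mean."""
    baseline = [mean(targets)] * len(targets)  # always predict the mean target
    report = {
        "model_mse": mse(preds, targets),
        "baseline_mse": mse(baseline, targets),
        # Near-zero spread means the model outputs (almost) the same value.
        "collapsed": pvariance(preds) < collapse_tol,
    }
    report["beats_baseline"] = report["model_mse"] < report["baseline_mse"]
    return report
```

A model that does no better than the mean baseline has learned nothing useful from its inputs, regardless of how low its loss looks in absolute terms.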
Related:
- Top /r/MachineLearning Posts, April: Why Momentum Really Works; Machine Learning with Scikit-Learn & TensorFlow
- Top /r/MachineLearning Posts, May: Deep Image Analogy; Stylized Facial Animations; Google Open Sources Sketch-RNN
- Top /r/MachineLearning Posts, June: NumPy Gets Funding; ML Cheat Sheets For All; Hot Dog or Not?!?