Interesting Things Learned as a Student of Machine Learning

Did you ever learn something you didn't really want to? The path to machine learning mastery is paved with such collateral knowledge. Here are a few examples of such information I have gleaned while trekking away.


If you have spent time learning machine learning, formally or informally, you have no doubt been amazed at some point both at what machine learning is capable of and at your ability to learn it. At the same time, I was somewhat upset to learn that machine learning is, in fact, not magic, and is actually based on sound principles. But that is beside the point.

The student of machine learning can also glean all sorts of tangential knowledge when studying and practicing: the "collateral knowledge," if you will. For your consideration, here are a few things I've picked up along the way, which I assuredly would not know if not for hours spent exploring and applying the craft of machine learning. I have a feeling that many others are in the same boat as I am.

  • Apparently, and to my utter surprise, humans have an estimated 2% error rate in identifying street numbers on houses. Visible, legible numbers. Blows my mind.
  • The handwriting of a fair number of humans vis-a-vis digits is terrifyingly poor.
  • The mean sepal length of an iris is 5.843 cm.
  • If it's overcast, 81 degrees Fahrenheit with 75% humidity, playing tennis is deemed acceptable, unless, of course, it is windy.
  • The mean petal width of an iris is 1.199 cm.
  • There is an entire industry predicated on an anecdote of stocking up on beer and diapers as the weekend approaches.
  • Portuguese forest fire areas are best modeled on a logarithmic scale.
  • The Pima People are a group of Native Americans living in an area consisting of what is now central and southern Arizona.
  • You can measure wine quality, or you can measure wine chemical makeup, but just be sure you know which you are observing at any given time (double-tangentially, "wine" alone is not a sufficient name for a dataset).
  • 50K per year is apparently a well-accepted over/under salary split.
  • Determining the age of abalone the "regular" way is "a boring and time-consuming task."

I have not cited anything here; either you already know what I'm talking about, or you might have a laugh or two figuring it out. Feel free to share your own collateral knowledge in the comments below.

For no reason whatsoever, I leave you with this non sequitur gem from Kevin Gray:

Accurately predicting extremely rare events, such as fraud or disease - most diseases are rare - is actually quite easy. All you have to do is predict that the event won't happen, and you'll be 99.99% accurate. :-)
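
For the curious, the arithmetic behind the joke fits in a few lines. This is only a minimal sketch on made-up data: the 1-in-10,000 event rate and the helper names `accuracy` and `recall` are my own assumptions for illustration, not anything from the quote. It shows that a "model" that always predicts "won't happen" scores 99.99% accuracy while catching none of the events.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positive cases that were predicted positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        return 0.0
    return sum(p == 1 for p in predictions_for_positives) / len(predictions_for_positives)


# Hypothetical data: 10,000 cases, exactly one of which is the rare event (label 1).
y_true = [0] * 9_999 + [1]

# The "model": always predict that the event won't happen.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(f"accuracy: {acc:.2%}")  # accuracy: 99.99%
print(f"recall:   {rec:.2%}")  # recall:   0.00%
```

The recall line is the punchline: the accuracy is impressive only because the event is rare, and the one case that mattered was missed.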

Happy machine learning!