Big Data Lessons from Microsoft “how-old” Experiment
Salil Mehta examines Microsoft’s viral “How old do I look?” site, the limits of its age recognition, possible algorithms, and implications for Big Data analysis.
We noticed earlier that the conditional gender probabilities are not equal. Look at the two upper distributions in the foursome further above, and at the blue male distribution in particular: across much of the middle-age range it must compete with the larger female populations (in red and pink). This means that, on the margins, the “advanced” robot can confuse a middle-aged man with a woman. Despite the current cultural focus on make-up and beauty, we can see from the information supplied above that female faces age faster over their lengthier life-spans. And the false positives exist in both directions.
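The competition between a smaller male group and a larger female group can be sketched with Bayes' rule. This is only a minimal illustration, not Microsoft's method: the function name `posterior_female` and every number below are hypothetical assumptions chosen to show the effect.

```python
def posterior_female(prior_female, like_female, like_male):
    """Bayes' rule: P(female | face features) for a two-class problem.

    prior_female: assumed share of women in the age band (hypothetical).
    like_female / like_male: assumed P(features | class) (hypothetical).
    """
    numerator = prior_female * like_female
    denominator = numerator + (1.0 - prior_female) * like_male
    return numerator / denominator


# Hypothetical middle-age band where women are 60% of the population and
# the facial features lean slightly male (0.55 versus 0.45).
p = posterior_female(prior_female=0.60, like_female=0.45, like_male=0.55)
print(f"P(female | features) = {p:.3f}")
```

With equal priors the same features would be scored as male (a posterior of 0.45). Under these assumed numbers, the larger female population alone tips the marginal case the other way.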
Let’s return now to the popular news above, concerning the wrong-doings of James Franco. How did Microsoft’s age-guessing application even work on him?
Actually, do you mean her? That’s right: blame it all on a sloppy trumpeting of “machine learning”, but Microsoft pegs James Franco at 8 years older than reality (which is relatively a lot) and also guesses James to be one hirsute woman. This does not fit neatly into their disclaimer about “not getting the age and gender quite right”.
To see the mathematics of what should be going on, look at the blue image below, a scientific capture of the author’s face. Notice that the statistical characteristics assumed to go into a complicated, high-resolution face (shown in our self-portrait at the top of this article) really boil down to a small number of independent factors, which we summarize below. And for age it might be fewer still, since we are only taking billions of unique people and boiling them down to just over 100 discrete integer ages, not identifying unique people in, say, a photomontage.
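To put rough numbers on that contrast, the sketch below counts how many binary digits are needed to label every person versus every integer age. The population figure and the number of age classes are assumed round values, and `bits_needed` is a hypothetical helper.

```python
def bits_needed(n_outcomes):
    """Smallest number of binary digits that can label n_outcomes values."""
    if n_outcomes < 1:
        raise ValueError("n_outcomes must be at least 1")
    return (n_outcomes - 1).bit_length()


WORLD_POPULATION = 7_000_000_000  # assumed round figure for "billions"
AGE_CLASSES = 120                 # assumed count for "just over 100" ages

identity_bits = bits_needed(WORLD_POPULATION)
age_bits = bits_needed(AGE_CLASSES)
print(identity_bits, age_bits)
```

Under these assumptions, guessing an age is a far smaller labeling problem than telling individuals apart.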
Some may recognize that this is similar to a popular idea created by a 20th-century Harvard statistician and named after him: Chernoff faces. Unlike changing markers or lines on a graph, the complex face can capture and communicate cues that serve as feature variables. A human might see Chernoff-style faces in any physician’s office, where only a single dimension is used to express the range from happy to sad or in pain: the mouth and eyes are altered from one illustration to the next, and the patient points to the facial expression they most relate to. We can see below that adjustments can also be made in a small number of other ways, say to the length and width of the face, as further measures to communicate. Note that touting the transformation of the high-dimensional color imagery that exists today is overboard when the Microsoft age-guessing errors are this bad.
Return now to the above scientific computer capture of the author, in blue, similar to a screen the government may capture behind the scenes in a public venue. The properties here could take a binary indicator value at each point of the image, reducing all of the easy-to-understand face information to a quick security print on the order of ~2^(40*50) combinations. Unfortunately this is still too much information, and too costly to store and retrieve. So instead the science migrates to a lower magnitude, excluding part of the eyeglass portion of the picture above, of roughly ~2^100. And the compacted number of principal eigen-components behind this author’s face (or any other) is lower still, which is why contortions to the self-portrait above don’t change the age guess much: so little weight is placed on most of the dozen or so variables, which are truly minor in the end. One of the weaknesses of the machine learning approach is that it assumes every data set can be addressed meaningfully through its process. Sometimes the products are still far from being an important tool, one for which the underlying science and math have not yet been well recognized, and won’t be unless further technological and mathematical advancements are made.
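The arithmetic behind those magnitudes can be checked directly. The sketch assumes the two figures in the paragraph above are powers of two, 2^(40*50) for a 40 by 50 grid of on/off points and 2^100 for the reduced form. The helper names are hypothetical.

```python
def binary_grid_states(width, height):
    """Number of distinct images on a width x height grid of on/off points."""
    return 2 ** (width * height)


def decimal_digits(n):
    """How many decimal digits a positive integer has."""
    return len(str(n))


full_grid = binary_grid_states(40, 50)  # 2^(40*50) = 2^2000
reduced = 2 ** 100                       # the reduced magnitude

print(decimal_digits(full_grid))         # size of the full count
print(decimal_digits(reduced))           # size of the reduced count
print(full_grid // reduced == 2 ** 1900) # the reduction factor is 2^1900
```

Both counts are vast, but the gap between them is itself a factor of 2^1900, which shows how much information the reduction discards.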
Imagine the whimsical, brute-force nature of mixing together Play-Doh, which comes with the individual colors (standing in for ages) segregated into different containers. Anyone can present a mathematical color model at the time of purchase, where each container separates and explains a single color. Put a few colors side by side, and a traditional linear model still makes sense (similar to a magician using parallel, straight blades to saw through a volunteer lying in a box). Now think about something more complicated and more representative of real-world data, say the elegance of the Chinese philosophical symbol 太極圖 (known as yin and yang in modern culture). Intermediate mathematicians can still come up with a more sophisticated mathematical expression to bisect the two opposing colors there.
But imagine a more complicated, and somewhat spurious, kneading of the different Play-Doh colors. One can then end up with a final product that is simply too complicated for the colors to be modeled from the amalgamation, even as the compact exterior mold looks seductively benign.
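The Play-Doh contrast can be made concrete with a toy experiment. The sketch below is a simplified stand-in, not a general linear model: it tries only straight vertical or horizontal cuts, and it uses a checkerboard as a crude, assumed proxy for kneaded colors. The grid size and all names are hypothetical.

```python
def best_straight_cut_accuracy(points, labels):
    """Best accuracy of any single axis-aligned straight cut.

    Tries every vertical and horizontal cut and both color assignments.
    """
    n = len(points)
    best = 0.0
    for axis in (0, 1):
        cuts = sorted({p[axis] for p in points}) + [float("inf")]
        for cut in cuts:
            hits = sum((p[axis] < cut) == (lab == 1)
                       for p, lab in zip(points, labels))
            best = max(best, hits / n, (n - hits) / n)
    return best


SIZE = 16  # hypothetical 16 x 16 grid of sample points
cells = [(i, j) for i in range(SIZE) for j in range(SIZE)]
points = [((i + 0.5) / SIZE, (j + 0.5) / SIZE) for i, j in cells]

# Colors side by side: left half is color 1, right half is color 0.
side_by_side = [1 if x < 0.5 else 0 for x, _ in points]

# Kneaded colors: a checkerboard of 2 x 2 patches (a crude stand-in).
kneaded = [(i // 2 + j // 2) % 2 for i, j in cells]

print(best_straight_cut_accuracy(points, side_by_side))
print(best_straight_cut_accuracy(points, kneaded))
```

In this toy setup one straight cut explains the side-by-side colors perfectly, while on the checkerboard no such cut does better than a coin flip.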
It took modern artist Jeff Koons two decades to put this together. Twenty years is greater than the standard error of some of the subdistribution life estimates above! What traditional linear mathematical model could pierce through the stunning, multi-dimensional Play-Doh expression above? For an advanced probabilist, it is easy to conclude there is none. And it is improbable that we can use sophisticated machine learning and big data to finely explain the location of each of the like colors, as if they came from a single continuous, non-porous unit. These are real-life limitations, and scientists and business people can run into trouble if they chase a bad problem too hard.
Big data algorithms care less about mathematical accuracy than about their one strength: the fast display of any summary data. They are too quick to glamorize their estimates of what is essentially random art, and too glacial in acknowledging the additional expense and errors in the value they presume to create by following their procedures. They would have you believe that every museum exhibit of something similar with Play-Doh would look alike, that there is a mathematical logic and meaning to almost everything. Put differently, their interpretation of the artistic, multi-dimensional model above is that every color is in a certain place for a reason, similar to the fundamental, ex ante logic we had for the colors in the population distribution constructed earlier in this article. But this is a runaway fantasy that simply doesn’t work.
Even as Microsoft’s model aims (and succeeds, to a partial though insufficient degree) to reduce the variability in age guessing across the universal population, we showed here that the conditional volatility allows for heterogeneous errors in large, pre-defined segments of the population at any point in time. To any reasonable quantitative person, this makes the Microsoft product fail oddly relative to how it is advertised. It also fails completely in different ways, as could generally have been predicted, and it presents a definite, permanent setback and a visible weakness of machine-learning algorithms: their instability when rolled out broadly. The product failed unaccountably (by Microsoft anyway) with this author. And with Andy Warhol and Marilyn Monroe. And with James Franco. Social media is populated with other cases of breakdown.
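How an acceptable-looking overall error can hide a badly served segment is easy to show with a few made-up rows. Everything in this sketch is hypothetical: the segment labels, the ages, and the helper `mean_abs_error_by_segment`. None of it is data from how-old.net.

```python
from collections import defaultdict


def mean_abs_error_by_segment(records):
    """Return (overall MAE, {segment: MAE}) for (segment, true, guess) rows."""
    errors = defaultdict(list)
    for segment, true_age, guess in records:
        errors[segment].append(abs(guess - true_age))
    all_errors = [e for errs in errors.values() for e in errs]
    overall = sum(all_errors) / len(all_errors)
    per_segment = {seg: sum(errs) / len(errs) for seg, errs in errors.items()}
    return overall, per_segment


# Hypothetical records: (segment, true age, guessed age).
records = [
    ("segment A", 30, 31), ("segment A", 25, 24),
    ("segment A", 40, 42), ("segment A", 35, 35),
    ("segment B", 37, 45), ("segment B", 45, 36),
    ("segment B", 50, 60), ("segment B", 42, 49),
]

overall, per_segment = mean_abs_error_by_segment(records)
print(overall, per_segment)
```

The single overall figure sits between the two segments and describes neither: in this invented sample one group is guessed within a year or two while the other is off by most of a decade.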
In the final analysis, it will always be wiser counsel for companies as esteemed as Google, Apple, and Microsoft to operate and promote within the confines of what’s possible, given the technical and resource gaps that still exist. Given the high stakes, gaffes and secret data solicitation should be avoided in order to secure the public’s trust. They should have known this product is merely a tool for amusement, and broadly acknowledged that precision is out of reach in at least this innovation cycle. Instead they expose how sensitive these seemingly advanced products are to wild errors, and leave a sensible public perplexed.
We noticed as well, both here and generally in life, that faces are extraordinarily beautiful and complex. For an artist they are highly difficult even to draw and explain. Clearly probability and statistics have a place in cracking the riddle behind how they work. One day we might cede control of quick, life-or-death decisions anywhere in the world to a robot. These errors will then no longer be a source of amusement, but will instead mean that real lives are sacrificed.
Right now you wouldn’t want a monkey as a TSA agent, on guard to check airline passengers’ IDs and flag suspicion. We can all appreciate the nuisance and aggravation caused by repeated errors and loopholes in code. Statistical false positives (people routinely being inconvenienced) and false negatives (threats that go undetected) are both frequent and costly, and will lead to hazardous vulnerabilities. Unfortunately, we operate in a commercial world where Watson and Deep Blue stay ahead of humans only through brute force (not through something advanced and clever), but are then advertised as proof that technology companies today can easily solve everything important. The result is machines strangely (and quickly) stating illogical things that no human alive would: falsely concluding, with all of Microsoft’s technical muscle, that James Franco is actually an older woman.
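The two kinds of screening error can be computed from simple counts. The sketch below uses invented numbers for a hypothetical checkpoint, and the function name `error_rates` is an assumption for illustration.

```python
def error_rates(tp, fp, tn, fn):
    """False-positive and false-negative rates from confusion-matrix counts."""
    false_positive_rate = fp / (fp + tn)  # harmless people flagged
    false_negative_rate = fn / (fn + tp)  # real threats missed
    return false_positive_rate, false_negative_rate


# Hypothetical screening of 1,000 passengers, 10 of whom are real threats.
fpr, fnr = error_rates(tp=7, fp=50, tn=940, fn=3)
print(f"false-positive rate: {fpr:.3f}, false-negative rate: {fnr:.3f}")
```

Even a small false-positive rate inconveniences many more people than there are threats, while the false-negative rate measures the threats that slip through.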
Bio: Salil Mehta is a top-selling mathematics and statistics book author, academic statistician, C-suite advisor, and risk strategist. He has over 17 years of experience, a dozen of them on Wall Street, performing proprietary trading and economic research for firms such as Salomon/Citigroup and Morgan Stanley.