Big Data Lessons from Microsoft “how-old” Experiment

Salil Mehta examines Microsoft’s viral “How old do I look?” site, the limits of its age recognition, possible algorithms, and implications for Big Data analysis.

In the age of “big data”, technology companies are positioning themselves to help humans more easily answer previously “inextricable” problems with interesting techniques.  And while there have been some silent successes, where advanced techniques have been embedded into organizations without fanfare, there have also been too many recent high-profile failures where companies sloppily lumber forward and trip over their own laces.  Examples come from the academically gifted teams behind Google’s Flu Trends, Facebook’s secret experiment, Apple’s map software, 23andMe’s genetics business, and Kensho’s Santa Claus rally call.  Maybe we should add Microsoft’s HowOldRobot onto this naughty list.  Their playful tool was enjoyed all over the world this month, but does it indeed create a better guess as to your age?  Here we seek to answer mathematically at what point we should reward only truly unique advancements, rather than what any probabilist can discern as a highly profitable harvest of private data for the sake of an immature random number generator.

Somewhat aware of this point, their product claims only to answer “how old do I look?” from one’s face, instead of the more relevant question, “how old am I?”  Clearly they haven’t a clue about the latter, and to be fair it is a nearly impossible task that few of us would rationally claim we can do.  We can also safely assume that most people, collectively, look how old they actually are.  Otherwise, for example, if all babies suddenly looked like senior citizens (think of the fabled Benjamin Button), then people would quickly adjust their perception of what a baby looks like (newborns would simply have the current “senior citizen” look), and that appearance would then become what everyone associates with a baby’s age instead (so babies can’t really look the age of a “senior citizen”).  The idea of not knowing one’s age, but instead guessing at how old one looks, is filled with relevant follies, as we will soon see with actual output.

To begin, let’s see what the application looks like.  I contributed a snapshot of myself for science, and one can see the application’s output in the upper-left of the four images below.  This is the best that all of the “advanced” machine learning at Microsoft’s Bing could throw out there.  Their guess is wrong by more than a decade, or nearly a quarter of my actual age!  It’s as bad as if James Franco used the tool to defend his selection of an underage girl for goofing around: “well officer, she looks 21.”  Guesses like this, in any critical context, are simply well off the mark.

We’ll return to the other three pictures above, after first discussing some mathematical theory.  An important thing to state here is that there are ways to link the work done by HowOldRobot to some quantitative understanding of the uncertainty behind these guesses.  Microsoft would have been better off providing us with the confidence interval behind these guesses, but such crucial data would have been awfully transparent for their own marketing!  Still, we can use probability theory to understand that, at its core, this application is seeking only a small number of relevant clues about the individual (sure, they may claim to analyze dozens of attributes all at once, as if this were some sort of fishing expedition, but most of the explanatory power lies in only a handful of pre-determined dimensions).  Despite all of the information exposed through a face, the science comes down to looking at a straight-on image, in many cases, and working out what might be the gender or race of the individual.  It would also glean what it can from the image properties (the random selection of colors and granularity provided to it) and, combined with facial changes over time, provide an overall guess on age.  Such judgments would recognize, say, that a celebrated centenarian will be more wrinkled than a baby.

The facial bone structure itself also changes over time, and those changes vary as a function of age and gender.  Examining the differences in bone shape, and perhaps some of the more subtle connections within it, provides some information with which to guess gender and age, and in particular where exactly to partition the data between young and old across genders.  See these well-documented science articles for details on these numbers: here, and here.

To understand whether the large errors on the self-portraits shown above are good or bad, we should ask the logical question: what sort of estimate range would a blind monkey, throwing a dart at a conditional age distribution, come up with?  It turns out the monkey would be far closer, and technology companies should appreciate that this is the standard by which we invariably measure the gravity of such a product launch.  We can stress now that these machine-driven products may appear as charming toys today, somehow more advanced than the traditional calculators people have previously relied on.  But these products have the potential to quickly leap into important life-and-death decision engines.  For example, they could be used to pilot automated, drone-launched missile decisions, or driverless vehicles, or to scan a crowded street for a potential criminal.  Accuracy and precision have to matter, and so does understanding the egregiousness of errors when they occur.

Empirically, this sort of probability problem concerning age can’t be solved theoretically in closed form.  The life tables that invariably drive this behind the scenes employ census information and are similar to a summation of actuarial tables.  Hence we must use logical information about the representative population of internet users and their likely photographed cohorts, for whom Microsoft is making age guesses.  See our Tesla warrants article here for information on the difficulty in modeling customized actuarial functions, even though it is certainly possible at times and under proper guidance.
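To make the dart-throwing baseline concrete, here is a minimal sketch of how one might compute the expected error of a guess drawn at random from a conditional age distribution.  Everything specific in it is an assumption for illustration only: the uniform 25-to-55 distribution is a hypothetical placeholder (a real analysis would substitute weights built from life tables and the internet-user population), the subject's age of 40 is made up, and the function names are our own.

```python
import random


def exact_dart_error(ages, weights, true_age):
    """Expected absolute error of one random draw from the age distribution."""
    total = sum(weights)
    return sum(w * abs(a - true_age) for a, w in zip(ages, weights)) / total


def simulated_dart_error(ages, weights, true_age, n_throws=100_000, seed=0):
    """Monte Carlo estimate of the same quantity: average error over many darts."""
    rng = random.Random(seed)
    guesses = rng.choices(ages, weights=weights, k=n_throws)
    return sum(abs(g - true_age) for g in guesses) / n_throws


# HYPOTHETICAL placeholder distribution: every age from 25 to 55 equally likely.
# Replace with weights derived from census or life-table data for real use.
ages = list(range(25, 56))
weights = [1.0] * len(ages)
true_age = 40  # hypothetical subject, not taken from the article

baseline = exact_dart_error(ages, weights, true_age)
estimate = simulated_dart_error(ages, weights, true_age)
print(f"exact baseline error: {baseline:.2f} years")
print(f"simulated baseline error: {estimate:.2f} years")
```

An age-guessing tool would then be judged by whether its typical error on comparable subjects beats this baseline; the narrower and better targeted the conditional distribution, the harder the monkey is to beat.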