Interview: Josh Hemann, Activision on Why the Tolerance for Ambiguity is Vital
We discuss handling bias in data, other data quality concerns, advice, desired qualities, and more.
Josh Hemann is the Director of Analytic Services at Activision where his team builds data tools to support video game studios and embed analytics within the games they create.
Prior to this, his industry experience spanned diverse settings such as air pollution research, aerospace, retail loyalty programs, and recommendation systems for grocers.
Josh has an MS in Applied Mathematics from the University of Colorado at Boulder.
First part of the interview
Here is the second and last part of my interview with him:
Anmol Rajpurohit: Q7. One of the most common issues with data is bias. How do you identify and mitigate bias?
JH: I am reminded of the quote by John Tukey, "The best thing about being a statistician is that you get to play in everyone else's backyard." Identifying bias requires knowing something about the backyard you are in, being connected with the domain experts you are working with.
So you have to understand the data generating process, whether it’s by a machine or by nature, but you also have to understand how data is actually measured and persisted. In my work, it is the measurement and persisting steps that often introduce the most insidious biases.
AR: Q8. Besides bias, what are the other data quality issues you observe very often?
JH: One example is the numerical conversions that happen when persisting data generated by a game into a relational database. A given variable instrumented in the game might take on only positive values and be written to a binary stream as a 10-bit-wide field. But when this variable is persisted in a database, it could accidentally be written as an 8-bit signed integer, leading to rollover. Application logic that assumes only positive values would either miss the negative-valued records or possibly break. We have thousands of such variables tracked in the game code, so inevitably bugs like this exist.
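To make the rollover concrete, here is a minimal Python sketch. It is not Activision's pipeline; the function names `to_signed_8bit` and `fits_signed_8bit` are hypothetical, and the sketch assumes the faulty write keeps only the low 8 bits and reads them back as a two's-complement signed integer.

```python
def to_signed_8bit(value: int) -> int:
    """Keep the low 8 bits and reinterpret them as two's-complement signed.

    This mimics (as an assumption) what a truncating write into an
    8-bit signed column would do to a wider unsigned value.
    """
    low = value & 0xFF                  # drop everything above bit 7
    return low - 256 if low >= 128 else low


def fits_signed_8bit(value: int) -> bool:
    """Guard to run before persisting: does the value survive unchanged?"""
    return -128 <= value <= 127


# A 10-bit unsigned field can hold 0..1023.
for original in (100, 128, 200, 300, 1023):
    stored = to_signed_8bit(original)
    status = "ok" if fits_signed_8bit(original) else "CORRUPTED"
    print(f"{original:>5} -> {stored:>5}  {status}")
```

Note that 300 comes back as 44, a wrong value that is still positive, so a check for negative records alone would not catch every corrupted row. Validating the range before the write is the safer guard.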
AR: Q9. What do you mean by "never show a picture without a statistic, never show a statistic without a picture"?
JH: This is advice I got years ago from Dr. Jeffrey Luftig while taking a statistics course from him. It means that it is easy to be misled by a single statistic or a pretty plot. We need data visualizations to validate the inferences we draw from statistics and likewise, we need statistics to validate the natural inferences we make when looking at data visualizations. I recently wrote about this topic here.
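A small illustration of the point, using made-up numbers rather than anything from the interview: the two samples below share the same mean and the same population variance, yet a crude text histogram (the hypothetical helper `text_histogram`) shows that one is bimodal and the other is concentrated at the center with two outliers.

```python
from collections import Counter
from statistics import mean, pvariance

# Illustrative data only: constructed so the summary statistics match.
bimodal = [1.5, 1.5, 1.5, 1.5, 4.5, 4.5, 4.5, 4.5]
peaked = [0, 3, 3, 3, 3, 3, 3, 6]


def text_histogram(values):
    """Return one 'value | ###' line per distinct value, sorted by value."""
    counts = Counter(values)
    return [f"{v:>4} | {'#' * counts[v]}" for v in sorted(counts)]


for name, sample in (("bimodal", bimodal), ("peaked", peaked)):
    print(f"{name}: mean={mean(sample)}, variance={pvariance(sample)}")
    print("\n".join(text_histogram(sample)))
```

The statistics alone cannot tell these samples apart; the picture alone does not tell you that their spread is identical. Each one checks the other.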
AR: Q10. What is the best advice you have got in your career?
JH: No one has told me this advice directly, but I have learned it by watching how decisions get made across multiple organizations and settings: we are ultimately emotional animals and I try to remember that when confronted with too much pretense about being “data driven” and “objective”. It’s one thing to do analysis and present results, and quite another to get people in complex organizations to actually change what they are doing, or make some discrete decision differently than they would have without the analysis. And what is being analyzed in the first place connotes a lot about what an organization values and chooses to spend time and money on.
There is a lot of building of trust and quite frankly, selling, that is necessary to do analytical work. The results never simply speak for themselves, as I once thought they did.
AR: Q11. What key qualities do you look for when interviewing candidates for positions on your team?
JH: Beyond being skilled in a particular area, I try to assess people’s tolerance for ambiguity and willingness to tackle poorly defined problems.
At least in our current world, that tolerance needs to be high because these games are so complex, take hundreds of people multiple years to develop, and operate at huge scale. Mapping a business goal to statistical code that has to be executed in game operations can be horribly frustrating at times, and you have to have a personality that can be OK with that and continue to make progress.
AR: Q12. What are your favorite books and other resources on visualization? What do you like about them?
JH: For statistical visualization my favorite is William Cleveland’s The Elements of Graphing Data. He covers prescriptions for presenting data that are grounded in studies on how people perceive visual cues.
There is an old book titled Fundamentals of Interactive Computer Graphics by J.D. Foley and A. Van Dam that I don’t turn to much lately but was important when I was first doing this type of work. It covers lots of, well, fundamentals about coordinate transformations, view clipping, rendering 3D objects, etc. I find it helpful that I have some familiarity with these topics when I run across odd behavior in some software tool or an obtuse API in a graphics library, because I have an intuition for what must be going on behind the scenes regardless of how much the particular tool abstracts away the complexities.
I’d be remiss to not mention a couple of web resources. Kaiser Fung’s blog, Junk Charts, covers great examples of what not to do and often includes examples of better alternatives. I can stare at Mike Bostock’s gallery of D3 charts all day. And organizations like the New York Times and the Washington Post are doing amazing work layering in interactive data visualizations with news stories.
AR: Q13. On a personal note, what keeps you busy when you are away from work?
JH: I go through periods of frequent travel to our various studios, so when I am home I try to spend a lot of time with my wife and two daughters, especially playing in the great outdoors surrounding Boulder, Colorado. I like working with my hands, so fixing cars or building things for my kids are ways I decompress.