KDnuggets News, April 2015 (15:n11)

Interview: Ravi Iyer, Ranker on Dealing with Inherent Bias in Crowdsourcing Data

We discuss the challenges of analyzing crowdsourcing data, tools and technologies, competitive landscape, advice, trends, and more.

Ravi Iyer occupies a unique space at the intersection of data science and moral psychology. He is the chief data scientist for Ranker, a consumer internet platform that collects millions of monthly consumer opinions, and the executive director of Civil Politics, a non-profit that uses technology to bridge the divide between practitioners and researchers in moral psychology. He is an applied data science consultant for Zenzi Communications, NewReleaseNow, the Institute for the Bio-Cultural Study of Religion, and Siemer & Associates.

He holds a PhD in Psychology from the University of Southern California and remains an active researcher, having published 20+ articles in leading peer-reviewed psychology journals over the past few years, most of which concern the empirical study of moral and political attitudes. His research has been featured in the Wall Street Journal, Reason Magazine, Good Magazine, the New York Times, and at numerous industry and academic conferences including South by Southwest. He blogs regularly about his research at PoliPsych.com.


Here is the second and final part of my interview with him:

Anmol Rajpurohit: Q6. What are the biggest challenges of working with crowdsourcing data? What tools and technologies do you use to overcome those challenges?

Ravi Iyer: Dealing with agenda pushing and biased samples is always an issue. Fortunately, human beings are really bad at faking organic behavior, so there is almost always a way to discover such anomalous patterns of data.

We use the same data processing tools that a lot of companies use to help scale analyses (MongoDB, Hadoop), but the most valuable tools are those that enable us to add new data to our system. For example, we get data from Oracle/BlueKai that lets us examine patterns in different user groups (e.g. men vs. women) separately to test the robustness of our rankings. We are also increasingly merging our opinion graph data (e.g. people who think this movie is overrated also think these burgers are tasty) with graph data from Facebook in order to create even more robust and powerful insights by triangulating across datasets. There are so many analysis tools that they are somewhat of a commodity, and we are finding that it is the ability to add new data, not the different ways we can analyze that data, that differentiates the value we can bring.
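To make the subgroup robustness check concrete, here is a minimal Python sketch. It is not Ranker's method: the vote data, the net-score ranking rule, and the function names `rank_items` and `spearman` are all assumptions made for illustration. The idea is to rank the same items separately within each user group and then measure how closely the two rankings agree using Spearman rank correlation.

```python
from collections import defaultdict

def rank_items(votes):
    """Rank items by net score (upvotes minus downvotes), best first.

    `votes` is an iterable of (item, vote) pairs with vote in {+1, -1}.
    Ties are broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(int)
    for item, vote in votes:
        scores[item] += vote
    return sorted(scores, key=lambda item: (-scores[item], item))

def spearman(ranking_a, ranking_b):
    """Spearman rank correlation between two orderings of the same items.

    Assumes both rankings contain exactly the same items (at least two)
    and that neither contains ties.
    """
    n = len(ranking_a)
    position_b = {item: i for i, item in enumerate(ranking_b)}
    d_squared = sum((i - position_b[item]) ** 2
                    for i, item in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical votes: (group, item, vote). Not real data.
votes = [
    ("men", "A", 1), ("men", "A", 1), ("men", "B", 1), ("men", "C", -1),
    ("women", "A", 1), ("women", "B", 1), ("women", "B", 1), ("women", "C", -1),
]

# Split the votes by group, then rank within each group.
by_group = defaultdict(list)
for group, item, vote in votes:
    by_group[group].append((item, vote))

men_ranking = rank_items(by_group["men"])      # ['A', 'B', 'C']
women_ranking = rank_items(by_group["women"])  # ['B', 'A', 'C']
rho = spearman(men_ranking, women_ranking)     # 0.5
```

A correlation near 1 would suggest the ranking holds up across groups, while a low or negative value would flag a list whose overall order depends heavily on who is voting.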

AR: Q7. How do you distinguish Ranker from other crowdsourcing portals such as Quora and Wikia?

RI: Our answers are purely quantitative. So while someone on those platforms may give a good answer as to the most important life goals, you won't be able to rate the specific items within their answer. By breaking things down into their constituent parts, we can provide far more granularity as to such answers. It also allows us to collect great data as far as predicting what other items on a list will interest people, given their existing opinions. Qualitative answers have a place, but there are very different things you can do with purely quantitative polling.

AR: Q8. What is the best advice you have received in your career?

RI: Nobody ever succeeds doing something they don't love. So I have constantly moved to answer questions that interest me intrinsically, sacrificing short-term gains for longer-term happiness. Usually that leads to longer-term career success as well.