Interview: Ravi Iyer, Ranker on Why Crowdsourcing Needs Data Science

We discuss the dynamics of the Ranker crowdsourcing platform, key factors for effectiveness, the role of data science in crowdsourcing, and more.

Ravi Iyer occupies a unique space at the intersection of data science and moral psychology. He is the chief data scientist for Ranker, a consumer internet platform that collects millions of monthly consumer opinions, and the executive director of Civil Politics, a non-profit that uses technology to bridge the divide between practitioners and researchers in moral psychology. He is an applied data science consultant for Zenzi Communications, NewReleaseNow, the Institute for the Bio-Cultural Study of Religion, and Siemer & Associates.

He holds a PhD in Psychology from the University of Southern California and remains an active researcher, having published 20+ articles in leading peer-reviewed psychology journals over the past few years, most of which concern the empirical study of moral and political attitudes. His research has been featured in the Wall Street Journal, Reason Magazine, Good Magazine, the New York Times, and at numerous industry and academic conferences, including South by Southwest. He blogs regularly about his research.

Here is my interview with him:

Anmol Rajpurohit: Q1. What does Ranker do? Can you elaborate on Ranker's endeavor to "answer subjective questions objectively"?

Ravi Iyer: Ranker is the world's largest crowdsourcing platform, with over 20 million unique visitors each month voting on and ranking items across domains and questions. Effectively, what Yelp does for questions about restaurants and TripAdvisor does for questions about hotels, we do for the remaining domains. Our goal is to provide data-driven answers: just as the combined opinions of Yelp users are often more valuable than any individual opinion from a food blogger or journalist, crowdsourced answers are better than the individual opinions of the writers who currently dominate discussions of "what is the most anticipated movie of 2015?", "what is the most important life goal?", or "what is the best cure for a hangover?". If you believe in the math behind the wisdom of crowds, our answers are mathematically guaranteed to be of better quality, which is why our audience continues to grow.
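The "math behind the wisdom of crowds" Iyer refers to is essentially variance reduction: if individual judgments have independent errors, averaging many of them cancels much of the error out. A minimal simulation of that intuition (all numbers here are hypothetical, not Ranker data):

```python
import random
import statistics

random.seed(42)

truth = 100.0        # hypothetical true value people are estimating
n_people = 1000

# Each person's guess = truth + independent random error
guesses = [truth + random.gauss(0, 20) for _ in range(n_people)]

# The crowd's answer: the average of all guesses
crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - truth)

# How far off a typical individual is, for comparison
avg_individual_error = statistics.mean(abs(g - truth) for g in guesses)

print(f"crowd error:              {crowd_error:.2f}")
print(f"avg. individual error:    {avg_individual_error:.2f}")
```

With independent errors, the crowd average's error shrinks roughly as 1/sqrt(n), so the aggregated answer beats almost every individual answer. As Iyer notes in the next questions, this guarantee breaks down when errors are correlated.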

AR: Q2. What factors are vital in making crowdsourcing effective i.e. to solve a problem through the wisdom of crowds?

RI: Crowdsourcing requires attention to two issues. First, those who are being polled need to have some expertise in the question. We can't reliably ask people for the best ways to perform a root canal, since most people have no expertise. Perfect expertise is not necessary, but some expertise is essential. Second, diversity is essential: for crowdsourcing to work, the "error" associated with each judgment needs to cancel out in aggregate. That won't happen if answers all tend to come from people with specific biases and characteristics, so effective crowdsourcing needs to take this into account.

AR: Q3. What role does Data Science and Analytics play in crowdsourcing?

RI: All analyses have bias, and so crowdsourcing is core to data science. For example, cross-validation and techniques for weighting bootstrapped samples all take advantage of the wisdom of crowds, with "crowds" referring specifically to groups of models. A broader view of these techniques is that all models have correlated error, as do all methods of data collection and sampling. Understanding how to aggregate intelligently across error sources is useful for answering any question, such that good data science knowledge correlates highly with being able to crowdsource answers effectively.
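The bootstrapped-samples technique Iyer alludes to is commonly known as bagging (bootstrap aggregating): train a "crowd" of models on resampled versions of the data and average their predictions. A toy sketch, using the sample mean as a stand-in for a model (the data and setup are illustrative, not from the interview):

```python
import random
import statistics

random.seed(0)

# Toy dataset: noisy observations of an underlying quantity
data = [10 + random.gauss(0, 5) for _ in range(50)]

def bootstrap_sample(xs):
    """Draw a resample with replacement -- one 'member' of the model crowd."""
    return [random.choice(xs) for _ in xs]

# Each bootstrapped 'model' here just predicts its sample mean;
# bagging aggregates by averaging the crowd of predictions.
n_models = 200
predictions = [statistics.mean(bootstrap_sample(data)) for _ in range(n_models)]
bagged_estimate = statistics.mean(predictions)

print(f"bagged estimate: {bagged_estimate:.2f}")
```

In practice the same averaging idea is applied to high-variance models like decision trees (random forests being the best-known example); it helps exactly to the extent that the individual models' errors are not perfectly correlated, which is the point Iyer makes about error sources.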

AR: Q4. Along with great insights, crowdsourcing involves a lot of bias. What are the common types of bias observed in crowdsourcing and how do you handle them?

RI: Most bias involves how the data is collected and from whom it is collected. For example, a lot of data involves positive signals (e.g. web visits, Facebook likes, Twitter mentions) without a corresponding negative signal. Figuring out how to collect or create those negative signals is often key to correcting that bias. The best way to correct for bias is to collect better data, which may involve sacrificing scale, but better data will almost always outperform bigger data. For example, a poll of 1,000 diverse people as to who will win the presidential election will always outperform a poll of a million people who are all relatively similar.
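Iyer's polling example can be made concrete with a small simulation: a modest sample drawn in proportion to the population beats an enormous sample drawn from one unrepresentative group, because sampling bias does not shrink as n grows. The regions and support levels below are hypothetical:

```python
import random

random.seed(1)

# Hypothetical electorate: overall support for candidate A is 52%,
# but it is 70% in region 1 and 34% in region 2 (equal-sized regions).
TRUE_SUPPORT = 0.52

def poll(n, p_support):
    """Simulate polling n people whose true support rate is p_support."""
    return sum(random.random() < p_support for _ in range(n)) / n

# Small but diverse poll: 1,000 people sampled in proportion to the regions
diverse = 0.5 * poll(500, 0.70) + 0.5 * poll(500, 0.34)

# Huge but homogeneous poll: a million respondents, all from region 1
homogeneous = poll(1_000_000, 0.70)

print(f"diverse poll (n=1,000):   {diverse:.3f}")
print(f"homogeneous poll (n=1M):  {homogeneous:.3f}")
print(f"true support:             {TRUE_SUPPORT:.3f}")
```

The homogeneous poll is extremely precise about the wrong quantity: its sampling error is tiny, but its bias (~0.18 here) never goes away, while the diverse poll lands within a percentage point or two of the truth.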

AR: Q5. What have been some of the most unexpected insights that you derived from crowdsourcing data?

RI: My Ph.D. is in psychology and so I love collecting crowdsourced answers to questions that are generally not thought of as quantitative. So understanding people's simple pleasures or life goals, and the differences between ages, genders, and clusters in those domains interests me.

Second part of the interview