Solon Barocas, a PhD student at NYU and a speaker at the Strata conference, discussed the perceptions of data mining and how companies can address data mining's reputation.
O'Reilly, by Audrey Watters, 1 November 2011
If your data practices were made public, would you be nervous?
The practice of data mining often elicits a knee-jerk reaction from consumers, with some viewing it as a violation of their privacy. In a recent interview, Solon Barocas, a doctoral student at New York University, discussed the perceptions of data mining and how companies can address data mining's reputation.
Highlights from the interview (below) included:
- What do consumers think data mining entails? "Data mining almost intuitively for most consumers implies scavenging through the data, trying to find secrets that you don't necessarily want people to know," Barocas said. "I think of it ... to be a particular form of machine learning. A challenge for people in the industry, regulators, ... is to figure out a way to communicate these technical things to a lay audience."
- Do we need a different phrase in lieu of "data mining"? Barocas argued: "[We should] try to push back against the misuses of the term, re-appropriate the term data mining, and explain it's not 'data-dredging.' It's not this case of running through everyone's data. We need to instead explain data mining is a kind of analysis that lets us discover interesting and important new trends. I think there's an enormous amount of value in data mining and being able to explain precisely what that value is without making it seem like it's just snooping."
- What "ethical red flags" should companies and data scientists be aware of? "There are potential problems all along the line," said Barocas. After all, it can be difficult for companies performing analysis to know what to collect and what not to collect.
"The rule of thumb: If your practice was made public - widely public - would you be nervous?"
Barocas said he realizes that's "not a very sophisticated rule," but it's one that might guide responsibility in the data mining space.