Are Big Data and Privacy at odds? FICO Interview
We discuss privacy, FICO scores, balancing predictive power and non-discrimination, whether technology is bringing big data and privacy closer together, and the most important privacy issues for FICO.
FICO is a leader in credit scoring and applying predictive analytics to business and fraud detection.
However, increasingly accurate predictions of consumer behavior also significantly erode consumer privacy. Witness the uproar when Target was able to predict a young woman's pregnancy before her father knew.
I recently had a chance to discuss Big Data and Privacy with Andrew Jennings, FICO chief analytics officer and head of FICO Labs - here is the interview.
Gregory Piatetsky: People leave so many digital trails that it is becoming harder to maintain privacy. (One example: Facebook's "DeepFace" face recognition has been reported to achieve >97% accuracy!) Will people get used to reduced privacy, or do you expect government regulation or societal backlash that will force companies to provide stronger privacy protections?
Andrew Jennings: Both! The reason privacy is such a hot button today is that we have truly entered a new era of what you might call human digital evolution. There is much greater ability today to use analytics to understand people’s behavior, and those analytics are fed by both traditional commercial data streams, such as the information at credit bureaus, and new data streams, such as social media. Information has been made digital with the Internet. Physical objects have been “digitized” and linked with the Internet of Things. Increasingly, people’s behavior, needs and attributes are also being digitized and linked by complex algorithms.
This represents a big change, and like all big changes it causes friction. People are already used to an increased level of data sharing and availability, particularly Millennials, who have grown up at the center of their own media universes, and share nearly every aspect of their lives with an online “audience.” However, many people will never want to give up any of their privacy, and these people may cause governments to enact protective regulations.
What this means is that the next several years will be a roller-coaster for privacy acceptance and regulation. For every new use of data that seems cool, there’s another use that seems creepy. The recent NSA revelations, for example, sent a shock wave worldwide. But for every reaction and new regulation, there will be additional steps that make data freer. That’s because human beings are social animals who naturally see the benefits of sharing their data. You only need to look at the global growth of social media to realize that.
And, of course, nearly every day you can read headlines that illustrate the potential downside of data availability. That includes identity theft and massive breaches of data that can lead to financial losses and widespread distrust of the systems that are supposed to secure the information that people share. These issues require appropriate regulation and vigorous enforcement – and are part of the privacy-analytics dynamic.
GP: Europe has much stricter privacy rules than the US – what is the effect on global companies? Is there a privacy scale for different countries/regions?
AJ: When it comes to privacy, there really is no such thing as global. Notions of privacy are inherently cultural. So even large organizations that want to think globally must act very, very locally if they are to avoid a backlash. Some countries will always be more restrictive because their people collectively have different attitudes toward privacy.
GP: The FICO score is a key factor in credit risk and mortgage decisions. Such a score should use the best predictive information, but at the same time US government regulations prevent lenders from denying loans based on certain personal information, such as race, religion, or marital status, which may nonetheless be predictive.
How do you balance predictive power and non-discrimination?
What about fields that can be proxies for "prohibited personal information" fields (e.g., certain first or last names can indicate race or religion)?
AJ: It’s a good question, and one that FICO has wrestled with for nearly 50 years. The short answer is that you build the best models you can with the data you can use. And you focus rigorously on the task at hand — for a FICO Score, building a model that will produce the best risk assessment for every consumer. We don’t use proxies for prohibited characteristics because we have no interest in trying to circumvent the law. There is plenty of predictive power in the credit bureau data, if you’re clever enough to decode it. After 50 years of building credit risk models, we know how to get the maximum power from the data.
There are other areas, such as retail marketing, where the use of demographic data – gender, for example – is allowed. It may improve marketing results to target certain products at men or women. Some people may not be happy about that, but it allows people to receive more relevant and useful offers and service. That's the key thing.
GP: Big Data and Privacy are fundamentally at odds. Are there technological /societal solutions that enable better marketing/decisioning while letting people keep (some) privacy?
AJ: Responsible data scientists do use analytic technology in a way that meets society’s needs to protect privacy. For example, data scientists often work with “anonymized” data that can’t be traced back to a specific individual, but which still enables them to build analytics that detect specific patterns. In fact, if you use the data you have well enough, you can sometimes avoid acquiring additional data. For example, lenders have long used credit scoring in order to limit the amount of data they need to collect from a borrower applying for a loan.
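To make the anonymization idea above concrete, here is a minimal, hypothetical sketch (not FICO's actual method) of one common building block – pseudonymization – where direct identifiers are replaced with salted hashes so records from the same person can still be linked for pattern analysis without exposing who they belong to:

```python
import hashlib
import os

# Secret salt, kept separate from the data; without it the tokens
# cannot be recomputed from known identifiers.
SALT = os.urandom(16)

def pseudonymize(identifier: str) -> str:
    """Return a stable, non-reversible token for a personal identifier."""
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()

# Hypothetical transaction records with a direct identifier (email).
records = [
    {"email": "alice@example.com", "purchase": 42.50},
    {"email": "alice@example.com", "purchase": 19.99},
    {"email": "bob@example.com",   "purchase": 7.25},
]

# Same person maps to the same token, so spending patterns survive,
# but the token does not reveal the email address.
anonymized = [
    {"id": pseudonymize(r["email"]), "purchase": r["purchase"]}
    for r in records
]
```

Note that pseudonymization alone is not full anonymization – combinations of remaining attributes (quasi-identifiers) can sometimes re-identify individuals, which is why techniques such as aggregation or differential privacy are often layered on top.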
You can adapt your analytics and decision-management systems to whatever data is available, so of course you can make decisions without scouring all available data. That said, withholding data generally will lead to poorer decisions. So better decision-making will always tend to utilize more data, or at least better analysis of the existing data.
I don’t see Big Data and privacy as completely antithetical. Our society’s notions of privacy aren’t fixed – they will continue to evolve, not just here but around the world. People have generally been willing to exchange information for services, such as when you apply for a loan or a new cell phone contract. The challenge comes when people learn that information they exchanged with one party for one purpose is being used by other parties for other purposes. Society and business will grapple with this for the next few years, because this is the way data is often used in a Big Data world. For example, President Obama has asked White House counsel John Podesta to produce a report on Big Data and privacy.
GP: What is your opinion of personal data marketplaces? (for example Datacoup start-up gives people $8/month in exchange for access to their social media information.)
AJ: I expect to see more of this trend. People realize there’s value in their information, and they want to know how they can get a piece of the action. Similarly, there is an emerging group we call the “data volunteers” who will exchange their own data with businesses in order to receive better service. For example, they will “train” the Amazon search engine by editing their recommendation list, hoping it will truly do their shopping for them.
We are really still at the early stages of people figuring out how this explosion in Big Data analytics can benefit them. There has been a lot of professed benefit, but we need to keep turning professed benefit into real benefit. People today don't always see how they are benefiting from analysis of their data, behavior and attitudes.
GP: What is the most important privacy issue for FICO?
AJ: For us, the most important privacy issue is the need to take a balanced approach when imposing rules on the use of personal data. This requires careful deliberation, and it’s a dialogue FICO has long participated in. It’s often the case that regulators think they are striking a blow for the public by restricting data use, only to find that those restrictions put people and businesses at a disadvantage. Used well, predictive analytics can give people more choices, greater protection from financial crime and other benefits, while still protecting people from discrimination and respecting their privacy.
Because FICO started in the world of credit decisions, we have always worked in a highly regulated environment, with tight controls on what data can be used and how. That’s given us a corporate culture acutely aware of the importance of privacy in all that we do. We also work in areas such as retail marketing, where there are far fewer restrictions and different customer attitudes. A person might not mind getting an offer for a product they don’t want, but people certainly will mind if they don’t get a mortgage or car loan at the best possible terms. As Big Data analytics is used in more and more decisions by more and more businesses, finding the right balance between business benefit, personal benefit and personal privacy will take time to sort out. This is one of the critical issues of our time.
Andrew Jennings is FICO's chief analytics officer and head of FICO Labs. He has held a number of leadership positions at FICO since joining the company in 1994, including a stint as director of FICO's European operations. Before joining FICO, Andrew served as head of unsecured credit risk for Abbey National plc, where he introduced account behavior scoring and mortgage scoring. He served as a lecturer in economics and econometrics at the University of Nottingham and has a Ph.D. in economics.
He blogs at ficolabsblog.fico.com.