3 Key Ethics Principles for Big Data and Data Science

If ethics in general are important, should ethics training be a crucial element of the data science field?

By Jay Taylor.

Big data is everywhere in the contemporary world. Data is collected through almost everything we do. From smartphones harvesting our information, to websites tracking the places we frequent for advertising angles, to social media on all levels, our digital footprints are everywhere.

Within any modern industry, the idea of ethics quickly comes into play. As technology continues to accelerate at an exponential rate, so does the rate at which data is collected. Think about the drastic changes taking place in electronic payment, for example. Thanks to NFC chips in payment cards and phones, buying things keeps getting faster. What used to take minutes now takes only seconds.

So what about ethics? What’s keeping the big business side of analytically driven companies in check? How do we know if this data will be used with our best interests in mind?

Ethics Training in Data Science

So, if ethics in general are important, should ethics training be a crucial element of the data science field?

In a previous KDnuggets post, this question was explored. In a poll of 324 people from all around the world, the answer was very clear. The vast majority of voters (76%) stated that yes, ethics training should be included within the niche of data science. Only 17% disagreed with ethics training and 7% were unsure.

Ethics Training Poll by KDnuggets

Both the Certified Analytics Professional (CAP) program and the United Nations Statistics Division have released official codes and declarations of ethics. The purpose of these guidelines is to clarify crucially important ethical requirements that set standards, help deter deceitful behavior, and keep individuals and organizations accountable for the ways they collect and use data-driven information.

Here we summarize the key ethical guidelines for Data Science and Big Data based on recent expert proposals.

1. Collect Minimal Data, Aggregate What’s There

If companies want to protect their users and data, they need to be sure to collect only what’s truly necessary. An abundance of data doesn’t necessarily mean an abundance of usable data. Keeping data collection concise and deliberate is key: the less irrelevant data a company holds, the less there is to expose, so limiting collection to relevant data is itself a way of protecting privacy.
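As a rough illustration of collecting only what is necessary, the sketch below keeps an explicit allow-list of fields and discards everything else before a record is stored. The field names, the `REQUIRED_FIELDS` set, and the `minimize` helper are all hypothetical and chosen only for this example; a real system would derive its allow-list from its own documented purposes.

```python
# Hypothetical allow-list: the only fields this illustrative service needs.
REQUIRED_FIELDS = {"order_id", "item_sku", "quantity"}


def minimize(record, allowed=REQUIRED_FIELDS):
    """Return a copy of `record` containing only explicitly allowed fields."""
    return {key: value for key, value in record.items() if key in allowed}


# An incoming event that carries more than the service needs (made-up data).
raw_event = {
    "order_id": "A-1001",
    "item_sku": "SKU-42",
    "quantity": 2,
    "device_location": "47.61,-122.33",
    "contact_list": ["alice", "bob"],
}

stored_event = minimize(raw_event)
print(stored_event)
```

An allow-list is used here rather than a block-list so that any new, unexpected field is dropped by default instead of being stored by accident.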

It’s also important to keep data aggregated in order to protect privacy and promote transparency. Algorithms are now used for everything from machine learning and autonomous cars to data science and predictive analytics. When those algorithms work on aggregated data, companies can still see very specific patterns and behavior among consumers while keeping individual identities safe.

An article on Forbes titled The Ethics Of Big Data touches on the subject of aggregating data. Hui Xiong, an associate professor of management science and information systems, states:

“One way companies can harness this power while heeding privacy worries is to aggregate their data...if the data shows 50 people following a particular shopping pattern, stop there and act on that data rather than mining further and potentially exposing individual behavior.

Things are getting very interesting...Google, Facebook, Amazon, and Microsoft take the most private information and also have the most responsibility. Because they understand data so well, companies like Google typically have the strongest parameters in place for analyzing and protecting the data they collect.”
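One way to picture the "stop at the group" idea from the quote is a report that counts distinct customers per pattern and only publishes groups above a minimum size. This is a minimal sketch and not a method described in the Forbes article: the threshold of 50 simply echoes the quote's example, and the function name, pattern labels, and customer IDs are made up.

```python
from collections import Counter

MIN_GROUP_SIZE = 50  # echoes the "50 people" example in the quote above


def aggregate_patterns(events, min_group_size=MIN_GROUP_SIZE):
    """Count distinct customers per pattern and drop small groups.

    `events` is an iterable of (customer_id, pattern) pairs. Only the
    pattern and its count are returned; customer IDs never leave this
    function.
    """
    unique_pairs = set(events)  # count each customer once per pattern
    counts = Counter(pattern for _, pattern in unique_pairs)
    return {p: n for p, n in counts.items() if n >= min_group_size}


# Made-up data: 60 customers share one pattern, 3 share another.
events = [(f"c{i}", "buys_coffee_weekly") for i in range(60)]
events += [(f"c{i}", "buys_rare_item") for i in range(3)]

report = aggregate_patterns(events)
print(report)
```

The small group is suppressed because a pattern shared by only three people could point back to individuals, which is the kind of further mining the quote warns against.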

Collect minimal data

2. Identify and Scrub Sensitive Data

Professionals in the data science field must understand which data is personal and sensitive, and know the appropriate ways to use such information. When information on consumers is collected without consent, it must be scrubbed of any details that could make the records personally identifiable.
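A minimal sketch of what scrubbing might look like in practice: direct identifiers are dropped and the customer ID is replaced with a keyed hash. The field names, the `DIRECT_IDENTIFIERS` set, and the placeholder key are assumptions made for illustration. Note that this is pseudonymization, not full anonymization, since the remaining fields could still identify someone when combined.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

# Placeholder key for illustration; a real key would be stored securely.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(record):
    """Drop direct identifiers and pseudonymize the customer ID."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["customer_id"] = pseudonymize(str(record["customer_id"]))
    return cleaned


# Made-up record for demonstration.
record = {
    "customer_id": 12345,
    "name": "Pat Example",
    "email": "pat@example.com",
    "phone": "555-0100",
    "purchase_total": 42.50,
}

cleaned = scrub(record)
print(sorted(cleaned))
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the mapping simply by hashing a list of known customer IDs.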

An article titled Five Ways to ‘Exploit’ Big Data Without Compromising Privacy highlights the following:

“Running afoul of regulations can lead to fines, reputational repercussions and the loss of customers. But there are ways to minimize the risk while taking advantage of the opportunities Big Data offers...organizations need to implement a data privacy solution that prevents breaches and enforces security, helping enterprises to:

  • Identify all sensitive data.
  • Ensure that sensitive data are identified and secured.
  • Demonstrate compliance with all applicable laws and regulations.
  • Proactively monitor the data and IT environment.
  • React and respond faster to data or privacy breaches with incident management.”

Scrub sensitive data

3. Have a Plan Set in Motion in Case Your Insight Backfires

Whether you realize it or not, every time you step into a store and make a purchase some form of information is collected about your trip to the store. A few years ago, the retail giant Target broke through the typical levels of customer-based tracking.

They developed a method based on 25 items that, when bought together, usually indicated that a customer was pregnant. This type of customer awareness is great for understanding habits of shoppers and for deciding which promotions and coupons to send out. But it lacked an important filter.

This process backfired when a confused Minneapolis man stormed into a Target store clutching coupons that had been sent to his teenage daughter. A Business Insider piece elaborates on the dialogue that ensued:

“My daughter got this in the mail!” the man said to a manager. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again.

On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”

The Target manager’s reaction was appropriate, and the situation was defused. But incidents like the one caused by Target’s analytics show that working with big data demands immense attention to detail.

Have a plan

Images courtesy of Villanova University Online

Data influences pretty much everything. It has practical applications in industries built around anonymized information, such as healthcare. The criminal justice field considers big data to be one of its top tech tools. Even professional coaches and athletes are honing their focus and using data-driven wearable tech to optimize performance and decrease injuries.

We must be careful to keep our information safe, and the organizations behind data science have a duty to adhere to a code of ethics.

Bio: Jay Taylor is a student and aspiring musician from the Northwest. He is most passionate about the environment, technology, music theory, and the well-being of others.