Should Data Science Include Ethics Training?


Yes (246) 75.9%
Not Sure (23) 7.1%
No (55) 17.0%

This poll was based on 324 voters.

The regional distribution shows that large majorities of data scientists support ethics training, with somewhat larger opposition in Europe and in Africa/Middle East.

Region (% of voters)         % Yes   % Not sure   % No
USA/Canada (47%)             82%     5.3%         13%
Europe (31%)                 64%     9.9%         26%
Asia (12%)                   90%     5%           5%
Australia/NZ (3.4%)          73%     9.1%         18%
Africa/Middle East (3.4%)    45%     9.1%         45%
Latin America (3.1%)         80%     10%          10%


Symeon, Data Science and ethics
What is ethical and what is not?
In my opinion, what is ethical depends on time and place. Things that are ethical here and now weren't ethical in another period, and maybe are not ethical in other places of the world. On the other hand, science is the passion for pure knowledge. Science has nothing to do with ethics. Science is the truth, a pure light which doesn't care about what humans call ethical or unethical. Scientists are looking for this light, or better, they are called by it. The real question is: should scientists share this truth with the common people...?

Gregory Piatetsky, Editor, Statistician code of ethics
Interesting UN document, but it dates from 1985 and is very long. I wonder how many statisticians have actually read it. I think a code of ethics should be short and easy to remember to be effective. Think "10 commandments".

Fernando Reis, Official statistics code of conduct
Official statisticians have a code of conduct. See the declaration of professional ethics by the United Nations Statistics Division.

Scott Nestler, Code of Analytics for Analytics Professionals
The Certified Analytics Professional (CAP) certification includes a Code of Ethics. You can read it (and see links to some of the references that were used in developing it) at: www.certifiedanalytics.org/ethics.php

CARLO FANARA, ethics in data science
I voted 'yes' to this poll as a precautionary measure: in essence, general "ethics" in science can be a very slippery territory. In my opinion there are a few points that need debate.
"Ethics" should be a general principle in life and in (any) profession, and as such should be taught in any type of curriculum. So one would be inclined to think that this is a matter for citizens in general, not for data scientists specifically.

While the formulation of the poll question points to some extreme, clear-cut cases, implying answers either in favour (yes) or in opposition (no), the reality might be considerably more blurred.
So yes, highlight clearly to practitioners (and would-be ones) the extreme consequences that the use of data science can lead to, with respect to invasion of individual rights to privacy (it is wrong to offer women special diapers for pregnancy, as a major supermarket chain recently did) and personal safety.
It is not always obvious what the use of the data will be, especially if different bits are collected by different operators and/or subcontractors and then put together by others. Recent enquiries here in France highlighted how even well-respected public institutions and international charities (!) ignored laws and regulations without the control agency being aware of it (basically, agencies were mining data and then selling it to major private corporations). This is unethical. Because this already happens, and such activities are typically and purposely fragmented to avoid discovery, single individuals may not perceive the full implications. Thus, again, yes, teach ethics in the data context. But to make it practically relevant, the legal principles in different countries must be clearly spelled out.
I envisage a new speciality in the data science curriculum, like "Data legal expert", because the matter is complex and individual countries' legal systems lag behind the global nature of the phenomenon.

Where would the borderline be? Perhaps by introducing the notion of an "emergency" (life-threatening) situation as the sole reason to cross those privacy thresholds without the owner's consent.
Finally, "ownership of the data" could be technically tricky. Perhaps an owner should be the initiator node of a "data lead" or dynamic data process, but then one has to find a feasible way of tagging such ownership in a network-like structure and then follow (and legally check!) all the many possible leads (how would you do this in a data-streaming scenario?). So we see that there is quite some technical content in this curriculum. So again, my answer is yes.
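To make the ownership-tagging idea a little more concrete, here is a minimal sketch in Python of one way it might look, assuming each data record is a node and each "derived from" relation is an edge. The class `ProvenanceGraph`, its methods, and the record and owner names are all hypothetical, invented for illustration only. The sketch keeps everything in memory and does not attempt to answer the streaming question raised above.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical sketch: records are nodes, 'derived from' links are edges."""

    def __init__(self):
        self.owner = {}                    # source record id -> owner tag
        self.parents = defaultdict(set)    # record id -> ids it was derived from
        self.children = defaultdict(set)   # record id -> ids derived from it

    def add_source(self, record_id, owner):
        """Register an initiator node and tag it with its owner."""
        self.owner[record_id] = owner

    def derive(self, new_id, parent_ids):
        """Record that new_id was built from the given parent records."""
        for parent in parent_ids:
            self.parents[new_id].add(parent)
            self.children[parent].add(new_id)

    def _reach(self, start, edges):
        """Breadth-first search: every node reachable from start via edges."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def owners_of(self, record_id):
        """All owners whose source data contributed to this record."""
        ancestors = self._reach(record_id, self.parents) | {record_id}
        return {self.owner[r] for r in ancestors if r in self.owner}

    def leads_from(self, record_id):
        """All records derived, directly or indirectly, from this one."""
        return self._reach(record_id, self.children)


# Illustrative data only: two owners, one merged profile, one downstream use.
g = ProvenanceGraph()
g.add_source("purchase_log", "alice")
g.add_source("loyalty_card", "bob")
g.derive("joined_profile", ["purchase_log", "loyalty_card"])
g.derive("marketing_segment", ["joined_profile"])

print(g.owners_of("marketing_segment"))
print(g.leads_from("purchase_log"))
```

Under these assumptions, `owners_of` answers "whose consent would be needed for this derived record?" and `leads_from` answers "where has this person's data ended up?". The legal check on each lead is the hard part and is left out entirely.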

