A Data Science Code of Professional Conduct can protect both consumers of data science and data scientists themselves. But is it useful and possible without a single professional body? Read the pro and con arguments and join the lively debate on this topic.
On April 10, 2013, Gregory Piatetsky-Shapiro (KDnuggets), Eric Siegel
(Predictive Analytics World) and Michael Walker (Rose Business Technologies)
discussed on a Google Hangout whether data science should be an independent
profession with a code of professional conduct and self-regulation.
Regulation of data science is under consideration (read here and here),
and Michael Walker argued that either data science becomes a profession and
regulates itself or Congress will impose draconian regulations that defeat the
purpose of data science: to make life, business and government better. He has
drafted a "Data Science Code of Professional Conduct".
Michael made these arguments in support of data science as a profession:
1) Data science is at an early stage and needs to develop a "Canon" (a body of principles, rules, standards, or norms) of
scientific methods, principles and best practices for practitioners. Data
science incorporates and overlaps several disciplines (data mining, statistics,
machine learning, cloud/high-performance computing, databases, visualization),
is wide open for innovation, and requires guidance to ensure data science is
used to make life, business and government better, and to prevent abuse.
Ninety percent (90%) of the world's data has been produced in the past two
years, and the volume will keep growing exponentially. How we extract meaning
from all this data without creating "an illusion of reality" is important.
2) To protect both consumers of data science and data scientists
from charlatans, illegal and unethical conduct and data science
malpractice. A Data Science Code of Professional Conduct is needed to protect
individuals' privacy and clients' confidential data, to prevent conflicts of
interest, and to ensure data scientists have a duty to the greater good of
society, and not just blind loyalty to the client.
3) Self-regulation versus imposed regulation. Either data
science becomes a profession and regulates itself or Congress will impose both
good and bad regulations. It is better for data scientists to architect and
implement a regulatory scheme than to trust Congress to enact an appropriate
regulatory structure that may defeat or limit the development of data science.
4) To create a check and balance against big government and big business using data science at the expense of the majority in society.
Some argue that the internet, smartphones and computers are a big
spying machine that big government and big business use to collect information
on people, further eroding civil liberties. The potential for abuse is
significant, and the professionalization of data science can mitigate harms.
Reasons to oppose data science becoming a profession include:
1) Professions tend to create artificial barriers to entry causing artificially higher prices.
2) Professions tend to be self-serving at the expense of consumers.
3) Professions - after a period of time - tend to stifle innovation to protect vested interests.
Michael Walker argued that - on balance - the equities favor
data science becoming a profession. He pointed out that in many disciplines,
like medical research, economics and psychology, data manipulation is common
and the scientific method has not been honored, resulting in damaged
reputations and eroding public trust. Future data scientists need to
preempt this outcome not only by honoring the traditional scientific method, but
also by developing new data science "canons" and scientific methods to
liberate meaning from data without creating an illusion of reality.
Eric Siegel is agnostic about whether data science needs to
become a profession. Mr. Siegel agreed that data science can be abused and
that a code of professional conduct may be useful, and stated that a
certification to establish a base level of competency may be prudent. He voiced
concern over the civil liberties aspect of the use and potential abuse of data.
Gregory Piatetsky-Shapiro argued against data science becoming a
profession. He noted that established organizations are already active in this
area: ACM (computing professionals) is considering The Pledge of the Computing
Professional, which touches upon many themes relevant to data science, and
INFORMS has Analytics Certification programs. He thinks these organizations
will be adequate to develop data science.
Gregory asserted that while a code of professional
conduct is a noble goal, it is meaningless without a central
organization that promotes and enforces this goal, and currently data science
is such a diverse field that a central organization is very unlikely. Just
looking at current data science related meetings on the
www.kdnuggets.com/meetings/
page, we see meetings sponsored by research societies like ACM, IEEE, INFORMS,
SIAM, commercial companies like O'Reilly, GigaOM, IEG, Big Data companies like
IBM, SAS, EMC, and many others. It looks very unlikely that all these diverse
interests will agree to a single organization to enforce any code of conduct. This
view was shared by the majority of data scientists who took part in a recent
KDnuggets Poll (March 2013) and were against a Data Science pledge.
Michael responded that data science is a new field that
encompasses a variety of skill sets from different disciplines and desperately
requires a professional body to develop canons that incorporate and blend
scientific methods from a myriad of disciplines. The blend of scientific
methods will create something new, and relying on the scientific methods of
math, statistics, computer engineering and others alone is not sufficient.
Data science requires its own professional canons.
Michael also asserted that - while a majority of data scientists
may not at this time favor a "pledge" - a large majority of data
science consumers would likely favor hiring a data scientist who is certified
and is required to honor a code of professional conduct - similar to certified
public accountants, lawyers and physicians. Considering the significant damage
data science malpractice can cause, Walker speculated that the market would
favor certified, professionalized data scientists. Moreover, a professional
code can protect data scientists from unethical and illegal client conduct.
Mr. Walker suggested that we should learn from other professions
like law and medicine - adopt the good and remove the bad to mitigate the
negatives of a profession. To earn and maintain trust and credibility, data
science must follow traditional scientific methods, innovate new methods and
follow a code of professional conduct.
Comments from around the web:
In
Next Gen Market Research (NGMR) - The Best MR Networking Group on the Web!
Tom Anderson
Why is everyone so gaga for regulations??
Gregory Piatetsky-Shapiro
Most data scientists in the KDnuggets Poll are against
regulation, but the question to be debated is whether government will impose
some regulation. I doubt it. However, there are professional societies that
create certifications. But if data scientists are in big demand and get
increasing power, what is the responsibility that comes with that power?
Tom Anderson
ZZZzzzzz
Makes perfect sense that most are against. Useless certifications are usually
only a benefit to those who charge dues for them.
What is your stance, for or against?
This very same question came up during the panels I
participated in last year at both the Text Analytics Summit and Text Analytics
World events. There too, the consensus among both panelists and attendees
seemed to be that standardization etc. was a bad idea.
Gregory Piatetsky-Shapiro
I can see a lot of demand for technical, professional
certificates, to recognize an achievement in education. INFORMS offers CAP,
many analytics outlets offer certificates:
www.kdnuggets.com/education/analytics-data-mining-certificates.html
I don't see any demand for a "non-technical" pledge or code of conduct.
Tom Anderson
The certs are probably ok for folks
early on in their careers. Don't think there would be much demand among skilled
practitioners who can point to the work they have already been doing.
Anyone with enough experience or a degree in associated
fields would probably not go for it. The mere fact that those who are best in
the field would not seek it out would serve to drive down the value of the cert.
INFORMS is a very interesting org which might have
the credibility needed.
But if the MR industry gives any indication of what
would happen, it would be that many different trade orgs and trainers of all
types would all serve to further drive down the value of any cert. Why not just
say on your resume (assuming you needed it) that you have taken X, Y and Z
training courses/seminars? Whether cert or not, an employer worth their salt
would need to confirm knowledge and skill later anyway.
From
Research Methods and Analytics
chris jensen
My belief is that data should be open to all. There is so much hidden data, and
there probably always will be hidden data, such a shame for the human race...
Gene Shackman
I don't know if it SHOULD. I know that it won't.
"data science" is too vague, too many different people with different
backgrounds do it, there isn't any universally accepted definition, there is
too much money involved so that too many people will want in, etc. So it's not
likely to happen.
What is it that's really behind this question? I
suspect that there is worry that too many untrained people do it, without
really knowing all they should know. So, instead of regulations, we should
encourage everyone involved to get training, and educate the public about
statistics and data analysis.
Mark Biernbaum, PhD
I agree with Gene, in that this question is probably
prompted by the droves of people doing data that have no real idea what they're
doing (like in business, for example, and in particular, in marketing). I know
that my sister got an MBA several years ago and was required to take 1 course
in data, and that was it, regardless of the fact that her emphasis is
marketing, which uses tons of data. I also know big-time data miners who could
not tell you what the assumptions were that govern ANOVA testing - they live by
the law of large numbers and don't think any of that applies to them.
Any credentialing program set up now would have to
grandfather in thousands upon thousands upon thousands of individuals who have
been using data their entire careers. For those being educated now, a
credential option might not be bad - at least ensuring they know the basics.
And the credentialing option might help future generations. But right now, so
many would be grandfathered in, it makes little sense. Also, gaining agreement
from professionals on what the credential should contain could be totally
impossible, given the huge array of ways data is used in our society.