They Want to Get Rid of Me! (Data Scientist Lament)
We examine “citizen” data scientists and the debate between Jeffersonians, who seek to empower the everyday worker with data science tools, and Platonists, who argue that democratizing data science leads to anarchy and overfitting.
By William Giovinazzo, Meditations on BI and Data Science.
They want to get rid of me. Wait, let me rephrase that: they want to get rid of us!! I can’t blame them; if I were them, I would want to get rid of us too. Let me explain…
Our friends at Gartner started all this talk about citizen data scientists. The concept is quite simple: unlike a data scientist, a citizen data scientist is not a statistician and has no specialized skills, but is an average user who builds predictive and prescriptive models. In essence, this is the democratization of data science.
There is, however, a debate between two groups: Jeffersonians and Platonists. The Jeffersonians argue on behalf of the yeoman farmer, or in this case the yeoman scientist. They seek to empower the everyday worker with data science tools. Being deeply committed to the republicanism of science, they oppose an aristocracy of knowledge, an elitism of hackers, subject matter experts, and statisticians. From their perspective, there are two key benefits. First, individual workers are freed to do their own discovery. This results in better decision making, since they get into the details of the model and the data and develop a deeper understanding of what is driving their decisions. Second, data science is costly. In addition to software and systems, data scientists are expensive. Forbes reported that “unicorn data scientists”, those special people who are subject matter experts with hacking and statistical skills, can earn $240,000 annually. As tools become easier to use and the people who use them become more available, the cost of data science will go down.
The Platonists, however, see democracy leading to anarchy, dispensing a sort of equality to equals and unequals alike. By this I mean that serious issues result from handing these tools to the uninitiated. It is not a simple thing to build a model. Does a marketing manager understand how to detect overfitting? Do they even understand what overfitting is? Do they know which model is best applied to the issue at hand? As Gregory Piatetsky points out: “[i]n the age of Big Data, it is very easy to find spurious correlations and come to wrong conclusions […] Of course, many types of analysis don’t require Ph.D. level Data Scientist, and can be done by more junior people with less training. But such people are not ‘citizen’ Data Scientists. Such people need to combine domain knowledge and sufficient Data Science/Statistical training, possibly with certification”.
The Jeffersonians and the Platonists are both correct. For data science to move from the visionary stage to the pragmatic stage, it needs to expand past the specialist into the work habits of the average worker. Some may object that data science has already crossed the chasm into the pragmatic stage, but I would disagree. The market is still developing and growing. While many organizations are interested in data science, it has not yet taken deep enough root in business practices. To achieve this level of acceptance, data science must evolve past the challenges the Platonists describe.
The question, then, is how data science must evolve to make this possible. First, just as we have seen with BI, the tools themselves need to evolve. As we all know, to be a data scientist you need hacking skills, subject matter expertise, and an understanding of statistics. We need to get past this; we need tools that enable users who lack these skills to effectively integrate these analyses into their decision-making process. This will require that the tools become more automated, not necessarily removing the user from the process, but augmenting their expertise.
Second, data scientists need to be mentors to business users. Rather than performing the data science themselves, they need to train and guide users toward self-sufficiency. Organizations can establish a formal training program for users who want to become citizen data scientists. At the end, users receive a certification that grants them access to the tools. As they gain more experience and knowledge, they can in turn mentor other business users.
Finally, the citizen data scientists themselves will have to evolve. The first data warehouse I developed was for a set of C-level executives who never logged onto a system. Their assistants would print reports and hand them hard copies. Users have since evolved. The same must be true of data scientists. In addition to training the business users currently in place, we need to develop the next generation. We will need to include training in business schools to bring up a generation of business people who are comfortable with data science. Most business schools already require classes in statistics. We need to build on this foundation.
We need to remember that evolution is a gradual process. Some of these things can become a reality today, while others will take longer to achieve. Eventually, however, we will get to a place where data science is integrated into our business processes. As citizen data scientists apply the technology to their daily tasks, data scientists will go on to push data science into new frontiers.
Heck, maybe they won’t try to get rid of us after all.
Original. Reposted with permission.
Bio: William Giovinazzo was in the Business Intelligence / Data Science business for over twenty years, and has implemented systems both large and small for companies ranging in size from international corporations to mom-and-pop startups. He blogs at meditationsonbianddatascience.com.