KDnuggets, April 2016 (16:n16)

Building effective “Citizen Data Scientist” teams

The idea of citizen data scientists has been around for more than a year. It suggests that businesses put people from the business side to work exploring and analyzing data. Understand how you and your organization can benefit from it.

By Daniele Micci-Barreca, Elite Analytics.

[This is an excerpt of the full-length article, which you can find on our website]

Unless you are allergic to every new hype word coming out of Gartner and other high-profile research firms, you have probably heard the term “Citizen Data Scientist” a few times over the past 12 months. It is attributed to Gartner director and BI industry expert Alexander Linden, who suggested that companies interested in data science should be:

cultivating “citizen data scientists”—people on the business side that may have some data skills, possibly from a math or even social science degree—and putting them to work exploring and analyzing data.

This is an interesting concept indeed, and it has received mixed reviews. While a recent Forbes article called it the “Democratization of Big Data”, on more technical forums it has been referred to as a “mirage”. Not everyone agrees that it is possible, wise, or generally a good idea to take people who are not “qualified” by training to crunch numbers and hand them tools with intimidating names such as Support Vector Machines, Decision Trees, Neural Networks and Principal Component Analysis. Most importantly, is it a good idea to entrust such “citizens” with decisions that until now were strictly the responsibility of Ph.D.-bearing Data Masters, who are as hard to find and as expensive as truffles?

Some organizations simply have no options

The main reason we have had several opportunities to help build teams of citizen data scientists is probably the vertical we started in 13 years ago: government, and specifically tax and revenue.

Anyone would agree that a tax agency has quite a bit of data to crunch and the important mandate to ensure compliance, which advanced analytics can definitely help with. Small incremental improvements in detecting fraud and noncompliance, and in managing tax collection, can easily turn into millions, if not billions, of dollars in additional revenue to support our communities. However, tax agencies, like most government institutions, are not typically on the short list of top university graduates with advanced degrees in statistics, machine learning or computer science. While I can assure you that the work would be challenging, on the pay scale it is hard for many government agencies to compete with Silicon Valley or Wall Street.

Learning by doing is the key

Over a decade ago now, one of our first clients, a national tax agency, decided to assemble its first “Data Mining” team to deploy predictive analytics for tax collection and filing enforcement (the process of identifying and “qualifying” people or businesses that failed to file). The ranks of candidates included former collectors, auditors, business analysts and some IT folks with prior Business Intelligence (BI) and database skills. Education ranged from degrees in education or social sciences to MBAs to computer science. None of them had done significant coursework in statistics or math. Thus, our task was to turn this “improbable army” of candidate data miners (the term was still in vogue at the time) into an effective team capable of designing, evaluating and deploying predictive models.

Armed with a stack of licenses for a visual-programming-oriented statistics tool and plenty of data in our newly created data mart, we ventured with our team of enthusiastic yet somewhat frightened data miner recruits into a series of actual modeling projects. We started with real goals and worked hard side by side, letting them “drive” while we served as dedicated co-pilots. We explained what needed to be explained, at the right time.

Over the course of the following decade we engaged with many other government agencies and private sector clients. In some cases, our role was primarily that of designer and developer of various predictive modeling artifacts, which we then transferred to the client. In doing so, we were often asked to provide knowledge transfer to one or more people charged with maintaining and possibly enhancing the models. In many of those situations, I must say, the experiment of creating “citizen data scientists” mostly failed. Not “doing” makes a difference, and so does having a job in which you are only a “part-time” data scientist.