Building effective “Citizen Data Scientist” teams

The idea of citizen data scientists has been around for more than a year. It suggests that businesses put people from the business side to work exploring and analyzing data. Understand how you and your organisation can benefit from this.

Two out of three will fail

The previous section is definitely a tale of success, but I purposely left out one detail: the members of the starting team in these engagements typically did not completely overlap with the members of the final team. The truth is that not everyone is fit to become a citizen data scientist. In fact, most business analysts and IT people are not likely to succeed in this role, but some will, and they can get really good at it.

It is hard to say exactly which ingredients make people fail or succeed, but some are obvious:

  • Analytical mindset
  • Computer skills
  • Ability to work in a team
  • Willingness to learn
  • A quantitative educational background

Moving to the private sector

In recent years we have had the opportunity to repeat our experience of mentoring home-grown “citizen data scientist” teams in the private sector. We found a few differences and many commonalities, but our approach, centered on collaborative, goal-driven projects, remained unchanged.

In the private sector the actors, the candidate citizen data scientists, tended to have a more “technical” background than their government sector peers. In many cases they had previously worked as Business Intelligence analysts or database developers, or in a similar IT function. They understood the business side very well, having worked closely with business stakeholders on past projects. Their “statistical background” was no deeper than what we found in the public sector, but in general their data skills were more advanced, and learning the tools of the trade was less of a challenge.

Centralized, Distributed or Embedded Team?

This is a very important decision for leaders planning to elevate a cadre of business people or IT analysts to the role of citizen data scientists. Should they consider creating a centralized “center of excellence” (CoE) in advanced analytics serving the various parts of the business? Or, should people be provided the necessary training and skills and then sent back to their functional area to evangelize the use of analytics and lead specific projects? Or, what about a “virtual team”, where everyone continues to belong to their home department, but also spends part of their time with their peers from other functional areas, possibly working on cross-functional projects?

The centralized approach is the most common in government, where the CoE actually tends to reside on the business side of the house, not IT. However, we are also working with a large agency that is instead pursuing the “embedded” approach, with a very limited part of its citizen data science team residing in a central technical functional area (Data Warehousing). Personally, I have found the centralized CoE to work best over the long term. The team is focused on its mission, has time to learn the tools and methodology, and works exclusively on data science projects.

With the distributed approach, citizen data scientists are trained on tools, mentored on hands-on projects, and sent back to their daily jobs armed with new knowledge of the tools and a good understanding of how analytical methods can be applied in practice. The problem, in my opinion, is precisely their “day job”. It is hard enough to learn the skills of a “data scientist apprentice”, to figure out what “chi-squared” is all about and why models tend to overfit; doing it part-time only makes the transition that much harder. In some situations this may work, for example if the citizen data scientist is immediately involved in an actual project within his or her organization. But without a specific mandate, the newly learned skills will probably soon be forgotten.

Selecting the right tools

Something that I find contradictory in the recent trends in data science is that the most popular tools and analytic frameworks do not seem suited to the emerging figure of the Citizen Data Scientist. On the one hand, the claim is that tools are becoming more accessible to everyone, enabling the citizen data scientist to pursue sophisticated predictive analytics projects. On the other hand, the de facto toolbox for the modern data scientist seems to be R, Hadoop, MapReduce, Hive and Pig: all very programmatic tools. It is therefore unclear how people with lightweight technical skills are supposed to get up to speed with data science when these tools require programming skills typical of a Computer Science graduate. I do not believe these are the type of tools that can work for the citizen data scientist.

Data scientists are not perfect either

I am not writing this article to diminish the status of data superhero and Master of the Universe that those of us who claim to be “true” Data Scientists (not mere citizens) have enjoyed over the past few years. Indeed, I believe that any organization that truly wants to become more data-driven (and who does not these days?) and elevate its best people to the “citizen data scientist” role should consider having a few experienced data scientists in the organization to mentor and guide the process.

That said, let’s be honest. Sometimes even people with a fantastic pedigree in machine learning, statistics and computer science get too concerned with the technicalities of the process and too distant from the meaning of the data and the business objective they are trying to reach. Experience matters, and there is no doubt that many data scientists have developed an innate taste for data and patterns and can quickly distinguish real patterns from data processing aberrations. A neophyte may not see “through” the results of a black-box algorithm, but sometimes even a person with the right background can overlook important details.

In conclusion, whether you like or dislike the term “citizen data scientist”, there is evidence that advanced analytics can broaden its reach simply by enabling more people to use it. I believe that as long as data science is restricted to the still relatively small circle of qualified people who can really understand the process and the tools, we will continue to go through cycles of hype and bust for Data Mining, Analytics, Big Data, or whatever you prefer to call the basic idea of using data intelligently to drive decisions. We really need this technology to become available to more people in the business world. Perhaps citizen data scientists are just those people.

[Note from the author: if you are interested in this topic, please read the full-length version of the article on our website and post your comments on LinkedIn if you would like to see additional follow-on articles on this]


