Split on Data Science Skills: Individual vs Team Approach

The results of the latest KDnuggets poll show an almost equal split between those who favor the individual approach and those who favor the team approach. See the counterintuitive regional differences and interesting comments.



By Gregory Piatetsky, Jan 21, 2014.

Most Data Science positions (and tasks) ask for a combination of diverse skills such as

  • Statistics / Machine Learning,
  • Hacking,
  • Databases,
  • and Industry/Domain knowledge.

There are a few versatile ("unicorn") data scientists who have all the needed skills, but not enough to fill the projected shortage of 140,000-190,000 people with deep analytical expertise and 1.5 million analytics managers/analysts (just for the US).

The latest KDnuggets Poll asked: Which approach is better?

Data Science tasks need a rare combination of statistics, hacking, database, business, presentation, and other skills, but finding versatile ("unicorn") data scientists with all the required skills is hard. Which approach is better: [304 votes total]

Individual: Seek and Train versatile Data Scientists that have all (or most) of needed skills (135)
Not sure (33)
Team: Build a data science team where each member mainly focuses on one skill (136)
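To make the "almost equal split" concrete, the vote counts above can be turned into percentage shares. The following is a minimal sketch, not part of the original poll analysis; the function name `vote_shares` is an illustrative assumption.

```python
# Vote counts as reported in the poll above (304 votes total).
votes = {"Individual": 135, "Not sure": 33, "Team": 136}


def vote_shares(counts):
    """Return each option's share of the total vote, in percent."""
    total = sum(counts.values())
    return {option: 100.0 * n / total for option, n in counts.items()}


shares = vote_shares(votes)
for option, pct in shares.items():
    print(f"{option}: {pct:.1f}%")
```

With these counts, the individual and team approaches each come to roughly 44-45% of the vote, with about 11% undecided.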

This poll was prompted by a recent post on KDnuggets by Michael Mout, "What is Wrong with the Definition of Data Science", which generated an intense discussion about whether it is possible to find data scientists who have all the needed skills or whether those roles are best filled by a team; see "Unicorn Data Scientists vs Data Science Teams".

Michael Mout divided Data Science skills into 3 main areas: Math and Stats, Computing, and Data Bases.

The famous Data Science Venn Diagram proposed by Drew Conway in 2010 has somewhat different components: Hacking Skills, Math & Statistics, and Domain/Business Knowledge.

Part of the difference comes down to company size, as David Gillman commented below:

Big Company needs a team. SMB needs one or two individuals.
Obviously, if SMBs can only afford to hire one person, they will try to get a more versatile person.

Large companies will have a team of Data Scientists. However, even big companies that can afford a large team should decide whether to seek and train hard-to-get but versatile people or to assign the tasks to teams of specialists.

The poll results were evenly split between supporters of team and individual approaches, but interesting and counter-intuitive differences emerged in regional analysis.

The respondents from the US, traditionally considered more individualistic (as epitomized by "life, liberty and the pursuit of happiness" in the US Declaration of Independence), were actually more in favor of a team approach (47% vs. 41%), while both European and Asian respondents favored the individual approach (51% in each region). Canadian respondents were quite different from the US and most strongly in favor of the team approach, while Australia/NZ voters, although few in number, were the most individualistic.

Here is the breakdown by regions. Bar height corresponds to the number of respondents.


Region (respondents)         % Individual   % Not sure   % Team
US (147)                          41%           12%        47%
Europe (76)                       51%           12%        37%
Asia (41)                         51%            2%        46%
Latin America (12)                42%           17%        42%
Africa and Middle East (12)       33%            8%        58%
Canada (11)                       18%            9%        73%
Australia/NZ (5)                  80%           20%         0%
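As a sanity check, the regional percentages can be weighted by the number of respondents and summed, which should roughly recover the overall vote counts (135 individual, 33 not sure, 136 team). This is a rough sketch under the assumption that the table's rounded percentages are the only input; the names `regions` and `estimated_votes` are illustrative, and the results are approximate because of rounding.

```python
# Rows copied from the regional table:
# (respondents, % individual, % not sure, % team).
regions = {
    "US": (147, 41, 12, 47),
    "Europe": (76, 51, 12, 37),
    "Asia": (41, 51, 2, 46),
    "Latin America": (12, 42, 17, 42),
    "Africa and Middle East": (12, 33, 8, 58),
    "Canada": (11, 18, 9, 73),
    "Australia/NZ": (5, 80, 20, 0),
}


def estimated_votes(table):
    """Weight each region's percentages by its respondent count and sum them."""
    totals = {"individual": 0.0, "not_sure": 0.0, "team": 0.0}
    for n, individual, not_sure, team in table.values():
        totals["individual"] += n * individual / 100
        totals["not_sure"] += n * not_sure / 100
        totals["team"] += n * team / 100
    return totals


respondents = sum(row[0] for row in regions.values())
estimates = estimated_votes(regions)

print(f"Respondents across regions: {respondents}")
for option, value in estimates.items():
    print(f"Estimated {option} votes: {value:.1f}")
```

The regional respondent counts add up to the 304 total votes, and the weighted estimates land within about one vote of the reported totals, so the table is consistent with the overall result.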

Comments

C. Magaret, via Disqus
Let's go back in time about 20 years, when there was a crazy new trend that took the world by storm called the World Wide Web. Some people said it was going to change the world, others said not so much, some said it was moving much too fast, and some cynics declared it just a passing fad.

Back in the bronze age of the web, the professional and amateur practitioners who brought the web to life were jack-of-all-trades technologists who went by the term "webmaster". To be a proper webmaster, you needed a (then) arcane mix of skills: system administration, network administration, programming, interface design, and graphic design, to name a few. Webmasters typically worked alone, or among a small cadre of webmasters, to accomplish an organization's web-based goals.

Today, the web is ubiquitous. It plays a huge role in almost everyone's life, has become a large part of essentially every organization on the planet, and there are massive numbers of businesses whose sole purpose is to monetize the web. The web has brought in databases, it spurs the development of new technologies, it pushes the envelope for interactive user-interface design, and certain essential services have been commoditized to the point of triviality. The scope and maturity of today's web is so big that having a jack-of-all-trades webmaster is no longer practical. For people working in web-based enterprises, each of those webmastery skills I mentioned above (and more) has been parceled out as a discrete discipline to specialists. While an interface designer might know a thing or two about how SOAP works -- and would be a better interface designer for it -- they're not going to be working on the back-end programming with an API. They'll leave that to the developer.

Simply put, the web has outgrown the webmaster. Sure, you might find an old-school webmaster working for a small organization somewhere, but a proper web effort requires more than just a small team of generalists, for both efficiency and quality of effort. They need to have that team of specialists.

The above comes to mind with this argument about generalist vs. specialist data scientists. If "data science" as a field is expected to evolve and mature to an industrial scale, then "data science" will outgrow the data scientist, and of course you will see specialization, typically in the form of project teams, and I'm seeing this already in large data-driven organizations. People have natural skill mixes, and they gravitate to those responsibilities that cater to their strengths, and for a large organization practicing in data science, having a project team composed of people with distinct responsibilities will result in a more-efficient workflow and a better end product, and it's easier to manage. It's a more natural and disciplined way of doing business.

Sam Steingold
I don't think either option is realistic. It is very hard to combine all 3 skills (math/stat, hacking, business). I think the best approach is to get an excellent math/stat person and then fix his/her deficiencies by adding people to the team as necessary. I.e., a data team should be built around someone with a strong math/stat background.

Kathryn Buttler, Data Science Team or Individual?
I agree that it's very rare to find the unicorns, but I also believe that you need individuals that are much more rounded than being experts in one domain. When recruiting, I'm looking for individuals who can fill the holes in the team, but if I'm running low on computational skills, I would never hire a programmer without the curiosity and creativity of a miner, or without the ability to pick up business knowledge and interpret the outcomes of their analysis, or the ability to distill their insights into powerful presentations. I think it depends on the size of your team, whether you have the luxury of hiring domain specialists or whether you need to make every FTE count and look for more all-rounders, with the occasional domain expert.

Graham Mullier, option 4 - both!
I voted for 'individual' but really wanted a fourth option, for recruiting and training multi-skilled individuals, augmented by a broader team to cover all aspects in more depth.

That's what I'm doing right now with the new Data Sciences function I'm setting up in Syngenta. I'm lucky enough to have a small team who can operate well as individual data scientists with the right skills and domain knowledge. They are part of a larger group who can handle and store datasets, integrate them, build ontologies, build data visualizations etc. It should help scale their expertise and spread the benefits wider.

David Gillman, Team vs Individual
Big Company needs a team. SMB needs one or two individuals.

Sarcastically - a "Team" is a necessary element in any project where blame must be shared or assigned to others.

George R, Team vs Individual
The team approach is healthier for an organization's sustainable efforts. Finding that true Data Analytics rock star is difficult and expensive. Even if you hire that exceptional individual with the entire skill set, she will likely not stay around. The team approach allows faster access to the various components needed to properly attack data problems. If a team member moves out, it is much easier to find somebody who is really good at that missing component.

Gary Howorth, Individual vs Team
Ah, but you're assuming that the team talks to each other and "asks the right questions". Only in textbooks, I'm afraid. I've been in the data business for 30 years, so I'm speaking from experience. The tools have gotten better, but that's only 10-20% of the skills required. I answered individual, but as the previous commenter said, there are not many of them around, especially those that have the right multidisciplinary skills; it's a special skill. So in the absence of that, yes, you need a team. You might argue for a very small team of 2-3 people, but having the skills reside in, say, 10 individuals doesn't do it for me. It's the way you pull together various disparate components that's important, i.e., data analysis plus context plus experience in the field.

Richard D Avila, Data Science Team or Individual?
I wrote my first "analytic" in April 2000. It was a geo-clustering algorithm on a spherical surface. Did the math, did the coding, ran the data, did the validation, but had NO clue what the results meant against 1.5M records. We had an operations analyst who understood what was being shown, and good things followed. Analytical reasoning/domain knowledge/domain context can only be gained by truly working in the said domain. I truly agree that the three key competencies of data analytics are: analysis/domain knowledge, math, computation. To be an effective "data scientist" a single individual needs two of three. In my time in data analytics (since April 2000), I have met only one person who had all three areas and could work on a real-world problem. The technical dimensions of this discipline are vast - think of all the mathematics, algorithms, and computational technology expertise that is required.

Where one would also fit domain expertise is beyond me. I would view someone who claims they can do all three as either naive or arrogant. Two traits that can doom a data analytics project or systems development. Just because you can imagine a unicorn does not mean that it can exist in the physical world that we live in. Data analytics done right almost always requires a team.

Selected comments via LinkedIn

By Bill Winkler, Principal Researcher at US Census Bureau
New Poll: Data Science Skills - Individual vs Team Approach?
For the most difficult set of algorithms (that may even involve theoretical/computational breakthroughs), it is difficult to find appropriate skills in one individual, even if the individual is given very substantial time to develop the skills.

Over the years, there have been very many large software development projects and data warehouse creation projects to which we can add certain big data projects. All required certain skills, some of which were unknown in the early parts of the project. A very substantial number of projects (over 50% according to certain IEEE CS and ACM articles) have failed.

Michael Hidiroglou and I were involved with a number of successful teams that developed very large, generalized computer systems. In most projects, there was a point where one individual was able to develop new methods/algorithms that caused the overall effort to be successful. The method might have involved advanced OR (set covering, integer programming) or CS (approximate string comparison, very advanced indexing/search/retrieval) that are not common skills among statisticians, economists, or the type of IT programming personnel involved in these team projects.

"Developing Analytic Programming Capability to Empower the Survey Organization".

A number of groups have subsequently attempted to develop similar systems because the groups were aware of the systems that we had developed (or similar systems developed in The Netherlands, Spain, and Italy). Most projects failed (even after multiple attempts). The common thread was that successful teams had suitable analytic/theoretical/algorithmic skills. In a few situations, IT areas provided supplemental skills but most of the key breakthroughs were provided by statisticians and economists (usually Ph.D.s with very good programming skills).

If there are Big Data situations such as cleaning up and unduplicating sets of national files and then merging sets of national files for subsequent statistical analyses that adjust for linkage error (because unique identifiers are typically not available), then advanced computational methods are needed.

Winkler, W. E. (2011), "Cleaning and using administrative lists: Methods and fast computational algorithms for record linkage and modeling/editing/imputation," Proceedings of the ESSnet Conference on Data Integration, Madrid, Spain, November 2011 ( www.ine.es/e/essnetdi_ws2011/ppts/Winkler.pdf ).

I do realize that there are a large number of quick projects (1-6 months) where one individual has suitable skills for a particular set of analyses. Two issues: (1) if you have created input files/structures and written software that runs in a week in smaller situations, how do you become aware that it is possible to write software that is 10-100 times faster? and (2) if you are using a statistical or other software package, how do you know that the algorithms are performing the computations correctly?

By Thomas Speidel
A very sensible comment from Bill Winkler. I can relate to the problem (nightmare) of record linkage with administrative datasets.

I would say that the disagreement arises from limited experiences in a new field that is still vaguely defined. It's almost unavoidable. Those who think that specialization is yesterday's approach are probably accustomed to projects for which a superficial knowledge was sufficient. They probably don't realize it, but they too have specialized knowledge.

By Freddy Holwerda
Agile development offers the concept of T-shaped resources. So I chose the individual approach: "Seek and train versatile Data Scientists that have all (or most) of needed skills", adding that each DS should only have a certain depth of each skill (the horizontal bar of the "T") and acquire only one or two skills in real depth (the vertical bar of the "T").

By Laura Squier, Sales Engineering Manager, SAS
See 8 professionals you need to build your analytics dream team,
Blog series shows how to develop an advanced analytics strategy

Professionals to build analytics dream team: SME, Data preparer, Data steward, IT, Analytics, Thought Leader, Champion

By Wil Stanton
As the fields comprising analytics (and therefore data scientists) continue to evolve at what sometimes seems an exponential rate, in my opinion a team approach would be preferred by most companies (although not by all). Thus, bringing together the right DS team to work on a specific project/problem/opportunity. The team of DS may be comprised of individuals with complementary depth in one or two areas and an understanding of the others, so as a team they would have both breadth and depth of knowledge -- which I don't think is achievable in a single DS (although I have met some individuals who believe they have both breadth and depth in all fields, including nascent fields -- not everyone agreed with their self-assessment). However, as with most things, the approach depends on many assumptions/realities including complexity of the problem, size of company, availability of talent within the company, etc.

By Dorothy Hewitt-Sanchez
Maybe I am not reading the same job postings that you are reading. These job postings are asking for everything and the kitchen sink. It does not matter what the job title is. This is why recruiters are saying that they cannot find qualified people. It is ridiculous. I have worked in many job roles. When I was a DBA, I still had to do business analysis, gather requirements, rewrite queries for business and data analysts and participate in decision support meetings. It is all part of the job. As a consultant or a supervisor, I still had to do all these roles and more. So, the lines are very blurred when it comes to job roles and titles.

Hmm, now if I throw statistics, machine-learning, and Hadoop in the mix, I wonder.

By Stephane Caraguel
During my career, I met two types of people: the ones who enjoy learning as many skills as they can in many fields (generalists) and the ones who prefer to be experts in one domain (specialists). Until recently, companies have been looking mainly for specialists, but are now discovering they need more generalists: solving complex problems most often requires borrowing methods from several fields (biology, physics, sociology ...). My take on this is that the best teams include generalists and specialists.

Eric King
This is one of several problems I've had from the start with people loosely donning the title of "data scientist." Beyond the various functions that it needs to cover, the vast majority of practitioners don't have the personality to function at an adequate level to manage the technical aspects along with the soft skills to effectively manage the strategic issues. There are a mere handful of very unique people that I believe could fulfill this role... making "data science" a team function rather than a personal title.