The (Not So) New Data Scientist Venn Diagram

This post outlines a (relatively) new(er) Data Science-related Venn diagram, one that updates Conway's classic and provides further fuel for flame wars and heated disagreement.

I thought Drew Conway's early attempt at defining data science by Venn diagram was formidable, if ultimately flawed, due mostly to the passage of time and a shifting perception of what data science actually is. It used to be a great way to visually explain what you did or aspired to do when, after a considered and well-constructed verbal explanation, you got only a blank stare in return. Or some retort like, "Oh, you do Big Data?" Or the wildly more elementary (and incredibly frustrating), "Do you work with computers? You know, my screen isn't working. Mind having a look-see?"

I was never a fan of the inclusion of "Hacking Skills" in the diagram, since that seems more a skill of data scientists (practitioners) than a component of the field of data science itself. Semantics, perhaps. Of course, that's just my opinion, and a minor criticism of what clearly served a purpose for a particular period of time.

But data science has changed, as have the expectations of what a data scientist is and does. There is no consensus on what makes up a data scientist - we won't even consider that here - and unicorns are best left in the bedtime stories I tell my daughter. But the fact that data science is still (in perpetuum?) an unsettled concept does not mean that we shouldn't try to narrow it down, or that we shouldn't be open to attempts by others to do so. This is where Stephan Kolassa's new Data Scientist Venn Diagram comes in.

OK, so while it's not exactly new, it is new to me (by way of Gil Press).

Data Scientist Venn Diagram

Here is the relevant quote from Kolassa's introduction to the diagram when he unveiled it on the Data Science Stack Exchange forums last fall:

I still think that Hacking Skills, Math & Statistics Knowledge and Substantive Expertise (shortened to "Programming", "Statistics" and "Business" for legibility) are important... but I think that the role of Communication is important, too. All the insights you derive by leveraging your hacking, stats and business expertise won't make a bit of a difference unless you can communicate them to people who may not have that unique blend of knowledge. You may need to explain your statistical insights to a business manager who needs to be convinced to spend money or change processes. Or to a programmer who doesn't think statistically.

More complex than "the original"? Definitely. Rich in detail? Certainly. Susceptible to flaming? Absolutely.

In fact, Kolassa himself acknowledges this:

I have labeled the areas in ways that should guarantee maximum flaming, while being easy to remember.

The author's sense of humor aside, while an undertaking such as this is bound to catch some heat and get people talking, it is worth a look, especially for comparing your view of what a data scientist is to that of others. Is this a necessary undertaking? Absolutely not. But it's fun. And we all do it.

I'm sure you look at this and get sweaty right away, and can't wait to voice your opinion. A few things come to my mind immediately, including:

  • The Salesperson???
  • The Perfect Data Scientist seems a few steps too close to the Head of IT for my liking :)
  • The Good Consultant

But instead of blabbering ad nauseam myself, let's have a look at a few opinions that others have had when encountering what I will call The (Not So) New Data Scientist Venn Diagram.

A comment on the original Stack Exchange thread by user Robert de Graaf offers the following relatively tame observation:

I think this is a big improvement on the original Conway version, although I can't quite get past the notion - implied by the size of the overlap - that a Statistics Prof is someone with equal skills in statistics and communication.

El Brown, on the blog Unicorn Whispering ("Attempting to make the mythical world of data science accessible to mere mortals like me"), has this to say while indirectly referencing this particular Venn diagram:

If you believe that you do need the specialist skill of a data scientist, don’t get too hung up on trying to find one that has every single skill you think you need or have been told that data scientists have. There is much to be said for creating diverse teams that collectively have the requisite skills, knowledge and experience — a sort of crowdsourced data scientist.

Chris Moffitt, on the website Practical Business Python, provides a supportive contribution:

My experience with this cross-section of people reinforced my belief that the “perfect data scientist” does lie at the intersection of these multiple functions.

Finally, an unknown author, either having their English translated into what I believe to be Chinese or having their Chinese translated into English, offers some comic relief:

The perfect data scientist from Kolassa’s Venn diagram is a mythical sexy unicorn ninja rockstar who can transform a business just by thinking about its problems.

Life goals, folks.

A Google image search for "mythical sexy unicorn ninja rockstar" did not turn up anything worthy of (or appropriate for) posting here. Of note, however, and flattering to him, an image of data scientist Yanir Seroussi was among the top results.

This has all been (mostly) in good fun. Respect to Stephan Kolassa for trying to make data science skills easily visualizable by updating a now-classic diagram we all know and love. Or hate. Or are indifferent about. Meh.

And respect to everyone who has taken time to comment on his work, since they all know, I'm certain, that it adds to a growing body of review data on the data science profession, which we can all analyze until we are blue in the face. Which makes me wonder if Marathon Analytics Skills doesn't deserve to be on a new iteration of these diagrams. I should get working on one right away...