You’re Fired: How to develop and manage a happy data science team
By Ian Xiao, Engagement Lead at Dessa
TLDR: Most ML teams don’t like doing data and infrastructure work because it is not as interesting as modelling. Mismanaging the issue can lead to high turnover and a toxic team atmosphere. I want to share a solution called Insight-Driven Development (IDD), a few examples of it, and five steps to adopting it. IDD aims to create high-performing, engaged, and happy Data Science teams that embrace non-ML work as much as the fun ML stuff.
Disclaimer: This post is not endorsed or sponsored by any of the firms I work for. I use the terms Analytics, Data Science, and ML interchangeably.
“Hey man, it’s not working out. We have to roll you off the project. Sorry.” Alex, a competent Data Scientist, looks at me without emotion. I feel helpless, nervous, and sad.
He’s probably cursing me in the strongest language. He looks at the pen on the table. Wait, does he want to stab me with it? He’s a big guy, and he looks like Jason Bourne. My underfit Neural Network tells me that I have a pretty low probability of getting to the pen before he does. I should probably leave now while I can.
So, I just “fired” one of the two data scientists off the project. Why? Alex hated doing data work. He was very vocal about it. He used all kinds of excuses to “offload” it to someone else. We talked about it; he still did a half-a*s job. Other people got frustrated and complained that only Alex got to do the cool stuff. We, as a team, were going to fail if this continued. So, here we are.
“Sh*t, dude. I am sorry,” Alex says, finally. Damn, that’s awkward (well, at least he didn’t stab me with a pen).
The “Alex Problem” is quite common among data science teams. Many data scientists feel bored and unmotivated. I think there are two root causes:
- there is a gap between what Data Science teams want to do and what we need to do in the real world
- non-Data Scientists wish to grow the skill sets perceived to be most in demand
That day, I “managed” the Alex Problem, and I felt sh*t. At that moment, building a high-performing, engaged, and happy Data Science team became my mission.
At a minimum, I wanted a system that would encourage the data science team to embrace non-ML work and have some fun doing so. Well, the obvious solution is to force people to do it (we all get paid to work, right?). But it’s not sustainable. So, the solution has to be practical, enjoyable, and self-motivating.
Over the following years, I experimented with mixing core ideas from Product Management, Agile Software Development, and Management Consulting. I think I found the solution: Insight-Driven Development (IDD).
Scope of this Article
Typically, there are two types of ML projects: business-focused and software-focused. In this article, let’s discuss the principles of IDD in the context of business-focused ML projects. The outputs of these projects are typically executive-level presentations or dashboards. The team needs to use specific ML techniques, from simple to sophisticated ones, to find insights.
Depending on how this post does, I can deep-dive into 1) how IDD works across the early, mid, and late phases of the project and 2) how to adapt IDD for software-focused ML projects (e.g. the outputs are full-stack solutions that integrate with core operations).
Assumptions about Teams
To use IDD, you need either a team of experienced people who have unique crafts and are motivated to try new things, or a group of junior generalists who are driven, can get stuff done, and are moldable. If not, IDD is not for your team (and you should take another look at your hiring strategy).
Let’s Get into It: The IDD Principles
IDD comes down to doing two things differently:
- How to scope work for each team member (also known as a “Work Package”: what each person needs to deliver).
- How to assign accountability according to people’s strengths.
First, let’s look at how Work Packages can be defined differently with an illustrative example. Below are two Backlogs without and with IDD for the same project, at the same Sprint, and with the same immediate goals.
How do we typically manage ML projects? In most ML projects, we divide Work Packages into four groups:
- Analytics (I use this as an overarching term that covers exploratory analysis and model development)
- Data (e.g. ETL and data pipelines)
- Software (e.g. system integration, infrastructure setup, and CI/CD)
- UI design and development
Some projects may also have a Business vertical if it’s a management-consulting type of engagement.
What’s changed? In IDD, the most apparent change is that there are more “Analytics” items (things in yellow). Each Analytics item is carefully phrased as a specific analysis; each analysis aims to drive out interesting insights; each insight, based on our hypothesis, contributes to solving the bigger business problem. Hence the name Insight-Driven Development.
What happens to the data and infrastructure work? If we look closer, each Analytics item includes the ETL and infrastructure work in its “Definition of Done.” Data and infrastructure work is still critical. We are not avoiding it, but doing it as part of the journey to an insight. There are certain caveats to this approach. For example, some data and infrastructure work has to be standalone, especially in the early phase of the project. We will discuss this more in a follow-up post.
How do we assign work? There are now many more analytics-related items, and it isn’t practical to assign all of them to the Data Scientists. People have different strengths, experiences, and interests. This brings us to the second aspect of IDD: accountability assignment.
Typically, roles define people’s accountability. For example, data engineers own and work on all of the data-related work, and likely only that.
In IDD, each person’s accountability shifts from their immediate role toward outputs. The team “works around an insight,” and each person owns the final delivery of insights. To maintain quality, each expert sets the standard, leads the programming (if needed), and acts as the final quality gatekeeper for her area of expertise.
With this in mind, the work assignment will look like this with and without IDD. The main difference is that people can lead work outside of their immediate role and domain with support from others.
Note that certain work still requires highly specialized skills and is best assigned to the experts. You, as the project lead, need to monitor workload and provide enough guidance to people who are leading work outside of their immediate expertise (e.g. a data engineer creating a customer analysis). Team members need to work closely together and communicate expectations and timing.
I have been using IDD on both business- and software-focused ML projects with teams of business analysts, data experts, data scientists, and software engineers with varying years of experience. It’s been working well, and I am going to keep using and refining IDD. Concretely, IDD works because it lets everyone:
- get involved in the problem-solving process (more interesting)
- see how their work directly contributes to the end goal (more engaged)
- understand how to refine their design given more context (higher quality)
- get the opportunity to work in new domains (more learning)
- continue to be an expert in their area (same certainty)
Every team at every company works differently, so please take the principles and apply them accordingly. Do expect some confusion, tension, and uncertainty. Be patient; things will click.
If you like the approach, here is what you can do to adopt IDD at your company.
Step 1. Make sure your teams and friends understand IDD. So, share this article with your friends and teams with the subject line “Must Read.” 😉
Step 2. Pick a small ML project that is focused on business insight (e.g. use ML to find new customer segments and size the potential). Ideally, it should be something you can complete within 2–4 months with a team of 5 experienced engineers.
Step 3. Assemble your A-team. The core team should be you (as the project lead), a data engineer, a machine learning engineer, a software engineer, and a UI designer. You may need part-time support from a business analyst and IT.
Step 4. Go for a coffee or drink with the team and align on this approach (I find people are more open to new ideas when we are outside of the office).
Step 5. Start the project. Resist the temptation to revert to the old way of working. Give the team some time to learn (and fail). Pay attention to how individuals deliver under IDD.
Here are a few Don’ts to save you from some awkward moments:
- Do not pick a large and mission-critical project.
- Do not pick a software-focused ML project (yet).
- Do not introduce IDD to an in-flight project.
- Do not start the project unless everyone is on board.
- Do not force people to lead insight development if they don’t like it (you can teach, but you can’t force people).
Depending on how this post does, I will follow up to discuss how IDD works for software-focused ML projects (e.g. the outputs are full-stack solutions that integrate with core operations).
Bio: Ian Xiao is Engagement Lead at Dessa, deploying machine learning at enterprises. He leads business and technical teams to deploy Machine Learning solutions and improve Marketing & Sales for F100 enterprises.
Original. Reposted with permission.