
Published on KDnuggets (Opinions), September 2021, 21:n38

Data Science Process Lifecycle


How would it feel to know, without a doubt, that the data projects you work on would create TRUE ROI for your organization? Stick around until the end to get my data science process lifecycle framework, so that every data project you run is a smashing success.



By Lillian Pierson, P.E., Mentor to World-Class Data Leaders and Entrepreneurs, CEO of Data-Mania


If you’re executing data projects, it can feel like a LOT of pressure to make sure that your project goes off without a hitch, and actually turns a profit for your company. 

And with the current industry data project failure rate sitting at 80% (according to Gartner), it’s no wonder that data pros are stressed.

Here’s where I think the data industry is going SO wrong.

 

The problem 

 
 
The data and AI industry is BOOMING. Companies are investing millions in data initiatives, yet, as the statistic above shows, they often don't get a true ROI on what they put in.

Why is this? 

Well, we could strategize all day long, but one major gap I’ve witnessed, and one I believe could be contributing significantly to poor ROI, is that data workers aren’t given enough strategic and business knowledge.

You see, as a data implementation worker, you’re probably OBSESSED with the how. You love getting caught up in the nitty-gritty details. You spend your days building complex models, troubleshooting your code, and building technical solutions, and you love it.

But I’ve got an important question for you. 

Do you always know WHY you’re doing it? 

I used to be a data scientist, too. And while I now spend my days growing my data startup, Data-Mania, from Koh Samui, Thailand, you could once find me building models at my cubicle.

I know just how easy it is to get caught up in the details. 

When you’re SO focused on tech and coding, it’s easy to lose sight of the actual business goal and vision. You might start spinning your wheels, going off on tangents, and generally contributing to business inefficiencies, often without noticing.

Not to mention, executing projects without a firm understanding of your place in the company’s vision, and without a strategy for forward momentum, can be downright frustrating and inefficient. NOBODY likes working for hours upon hours without being able to see the fruits of their labor.

It’s no surprise that data professionals are feeling lost. In a 2020 data survey I conducted with a small group of leaders and business owners, 87% of business owners reported they had no clear, repeatable processes for leading profitable data projects.

How are data pros supposed to excel without strong leadership and frameworks to guide them in their execution? 

We need to make sure that as data implementation folks, we keep our eyes on the prize. And as leaders, we need to make sure data implementation workers are included in the overarching strategy from the get-go. 

If you’re ready to make sure the data projects you work on always stay on track and profitable, let’s dive into the data science process lifecycle framework. 

 

The Data Science Process Lifecycle - What It Is

 
 
Essentially, the data science process lifecycle is a structure through which you can manage the implementation of your data initiatives.

It lets those who work in data implementation see where their role fits into the bigger picture of the project, and it ensures there’s a cohesive management structure.

To get us started, I first want to share Microsoft’s Azure Team Data Science Process (TDSP) lifecycle, and then show you how I think we can improve upon it.

Microsoft divides its process lifecycle into 4 stages:

  • Business understanding
  • Data acquisition and understanding
  • Modeling
  • Deployment

This framework takes one massive step toward solving the problems mentioned above: inefficiencies, wayward projects, scope creep, and the list goes on.

One whole node of this process, business understanding, exists to make sure that tech and data implementation workers understand the business problem before they actually go out and try to solve it.

However, I think this lifecycle can be improved upon.

I don't think that this process goes far enough to TRULY emphasize the business problem that tech workers are trying to solve. With this framework, only one of the four nodes educates data implementation workers on business acumen.

A quick check-in before diving into a project is a good starting point, but it doesn’t go far enough. 

 

Why you need to go deeper

 
 
Data implementation workers need to have a solid understanding of the big WHY behind a project.

Think about it this way. 

If you’re executing a data project, you’re the one with the eyes on the ground. You’re getting an insider view of the tech and the data solutions that the managers and higher-ups don’t necessarily get.

When you’re fully educated on the business vision and have been given that higher-level insight, you’ll be able to spot things you may have overlooked before. Whether you notice that the business problem isn’t being solved as efficiently as it could be, or you see a strategy for reaching a goal sooner, you’ll begin to filter your work through a new lens.

 

How to improve the data science process lifecycle

 
 
My suggestion for improving this framework is to add a fifth functional unit and call it data strategy.

Now, you might be thinking to yourself, “HOLD UP. I don’t know a thing about data strategy.”

Do not fear.

I’m definitely not suggesting that data implementation folks should be responsible for coming in and creating a data strategy from scratch. Instead, this would be more of a top-down approach. 

This unit would ensure that everyone on the tech and data implementation team has a deep understanding of the business problem, goals, and vision, so that they can make sure the work they carry out every day advances that vision.

This unit would also act as a sanity check.

As someone who’s worked as a licensed professional engineer for many years, I know just how easy it is to get caught up in the weeds of a project. You need a framework you can return to whenever you have to make decisions.

By implementing this process, data implementation workers would no longer feel lost in their projects, go off on tangents, or spend too much of their time on efforts that don’t contribute to the overall goals.

Let’s review my ‘new and improved’ framework in full with the added data strategy unit:

  1. Business Understanding
  2. Data Strategy
  3. Data Acquisition & Understanding
  4. Modeling
  5. Deployment

Adding this unit to the framework would be a true win-win for everyone.

Data implementation workers would get a greater sense of fulfillment and accomplishment from their projects as they stay on track and produce larger successes for the business.

Business leaders would actually get a true ROI on the data projects and employees they invest so much in.

And the data industry as a whole would prosper!

 
Bio: Lillian Pierson, P.E., is a CEO and data leader who supports data professionals in evolving into world-class leaders and entrepreneurs. To date, she’s helped educate over 1.3 million data professionals on AI and data science. Lillian has authored 6 data books with Wiley & Sons Publishers as well as 5 data courses with LinkedIn Learning. She’s supported a wide variety of organizations across the globe, from the United Nations and National Geographic to Ericsson and Saudi Aramco, and everything in between. She is a licensed Professional Engineer in good standing. She’s been a technical consultant since 2007 and a data business mentor since 2018. She occasionally volunteers her expertise at global summits and forums on data privacy and ethics.
