KDnuggets, November 2019 (19:n45)

Advice for New and Junior Data Scientists


If you are a new Data Scientist early in your professional journey, and you’re a bit confused and lost, then follow this advice to figure out how to best contribute to your company.



By Darrin Lim, Data Scientist at Outcome Health.

Who Do I Think I Am

My name’s Darrin, and I’ve been a Data Scientist at Outcome Health for close to 2 years now. We’re a health tech company focused on patient education at the point of care. In English, that means we put devices in doctors’ offices with content for patients. We have over 100,000 devices in the field, which means lots of data.

Who This Article is For

This article is for the many Junior/Associate Data Scientists out there. If you’re just beginning your journey, if you’re a bit confused and lost, if you’re not always certain what you’re supposed to be doing or how to best contribute to your new company, you’ve come to the right place.

You. Me. Your C-suite, probably.

Why I Wrote This Article

I’ve seen many data science articles explaining the ins and outs of more advanced data science work, the skills great data scientists need, and the types of projects those masters tackle. But as I went through my Associate Data Scientist journey, I had a hell of a time figuring out how to actually be useful as a more junior team member.

This article is an attempt to alleviate that issue for other budding Associate Data Scientists.

My main goal is to provide a basic blueprint, with examples, of how an Associate Data Scientist can quickly add significant value to an organization.

I was lucky enough to figure things out along the way, to the point that I am now a Data Scientist. I hope this helps some of you along the same path!

Step 1: Finding Low-Risk, High-Reward Projects

The Main Issue

The first problem every Associate Data Scientist encounters is probably the same: How do you…

  1. Add value to real projects…
  2. While developing useful skills
  3. Without breaking anything important?

Answering this question took me a good amount of stumbling, talking to all sorts of people in the organization, and generally asking dumb questions a lot.

The Key Insight

What I discovered was that automating monitoring processes tended to be the lowest-risk, highest-reward effort I could spend my time on.

  • There was no fear of breaking production systems → Low risk
  • There was a clear value add to the company → Saved time (High reward)
  • I could build the automated processes in Python/with Airflow/[insert new cool tech here] → Develop important skills

Check, check, and check.

Generalized, Actionable Takeaway

In general, if you can find tasks that could be automated but haven’t been, due to a lack of bandwidth or priority, these are great projects to test your chops on. Almost every company has them, they are low-risk/high-reward, and if you can get an MVP out quickly, momentum tends to build fast. There are tons of articles on why getting an MVP out quickly is a good thing, so I’ll let them explain the logic there.

My Experience

In my workplace, I saw obvious inefficiencies in how we were monitoring our KPIs for new software releases. Manual monitoring processes tend to be ripe for improvement; they usually exist because no production process was ever put in place.

At Outcome, we would push software updates in phases, with each phase going to a larger percentage of our 100,000-device network. However, during each phase, an analyst would pull the KPIs for that software release manually. This was obviously a waste of time, but we hadn’t had the time to automate the process.
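
Here is a minimal Python sketch of the kind of pull worth automating. It assumes a hypothetical play log of `(day, device_id, software_version, plays)` tuples. The article doesn’t describe Outcome’s actual schema. Normalizing by device count and days matters because each rollout phase reaches a different number of devices, so raw totals aren’t comparable across phases.

```python
from collections import defaultdict


def kpis_by_version(rows):
    """Aggregate a play log into per-software-version KPIs.

    rows: iterable of (day, device_id, software_version, plays) tuples
    (a hypothetical schema used only for illustration).
    Returns {version: {"devices": int, "days": int, "plays_per_device_day": float}}.
    """
    total_plays = defaultdict(int)
    devices = defaultdict(set)
    days = defaultdict(set)
    for day, device_id, version, plays in rows:
        total_plays[version] += plays
        devices[version].add(device_id)
        days[version].add(day)

    summary = {}
    for version, version_plays in total_plays.items():
        n_devices, n_days = len(devices[version]), len(days[version])
        summary[version] = {
            "devices": n_devices,
            "days": n_days,
            # Approximate: assumes every device on this version reports every day.
            "plays_per_device_day": version_plays / (n_devices * n_days),
        }
    return summary


if __name__ == "__main__":
    log = [
        ("2019-11-01", "dev-a", "1.0", 10),
        ("2019-11-01", "dev-b", "1.0", 20),
        ("2019-11-02", "dev-a", "1.0", 30),
        ("2019-11-02", "dev-b", "1.0", 20),
        ("2019-11-01", "dev-c", "1.1", 12),
        ("2019-11-02", "dev-c", "1.1", 8),
    ]
    for version, kpis in sorted(kpis_by_version(log).items()):
        print(version, kpis)
```

In practice, a function like this would be scheduled (for example, as an Airflow task) rather than run by hand. The scheduling is left out here.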

 

Step 2: How to Generate Great Insights

The Main Issue

So, you have an idea for a project to automate some process. Now what? Well, as data scientists, we know that part of our job description is to “generate insights”.

My main response to this for a long time was “what the hell does that mean?!” I have an engineer brain by default. I need a tangible, concrete explanation. Absent that, I had to make my own.

So, I was sitting there project in one hand, laptop in the other, and I realized I didn’t have my next step in-sight. (Get it. Insight. Heh. Your groans are my validation.)

The Key Insight

For me, a great “insight” is anything that decreases the amount of time needed to make a decision.

In concrete terms, this is what people usually mean when they say insight. They want a delivered piece of information that makes clear what they should do, and the sooner the better. Even when an insight causes you to reverse a previous decision, it still decreases the time needed to make a decision, because it shortens the time between your “bad” decision and your new one.

Generalized, Actionable Takeaway

What this means is that a good goal for any Associate Data Scientist is simply to decrease the time between data ingestion and action. Obviously, that’s a broad mandate, but it means you get to learn about systems end-to-end, and every Data Scientist should know at least a bit about every part of the data stack (more on that in the AI Hierarchy of Needs graph). Besides, the Data Scientist role is currently so broad and poorly defined that your range of duties will likely span the entire stack anyway.

My Experience

In my case, beyond simply automating a data pull to save time, I realized two things:

  1. To save time, I could push the data, rather than pulling it, to where the users were: Slack. This involved connecting to the Slack API and pushing a chart created in Python.
  2. I could automate some of the inference the human analyst was doing. (The machines are coming for our jobs.)
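
For the push side, here is a minimal sketch that uses only Python’s standard library and a Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL, KPI name, and message wording are placeholders. This version posts a text summary only. Attaching a chart image, as in the first point above, goes through Slack’s Web API file-upload methods instead (for example, `files_upload_v2` in the official `slack_sdk` package).

```python
import json
import urllib.request


def build_kpi_message(kpi_name, observed, low, high):
    """Build a Slack incoming-webhook payload summarizing one KPI check."""
    in_range = low <= observed <= high
    status = "within expectations" if in_range else "OUTSIDE expectations"
    text = (
        f"{kpi_name}: {observed:,.0f} is {status} "
        f"(expected range {low:,.0f} to {high:,.0f})"
    )
    return {"text": text}


def post_to_slack(webhook_url, payload, timeout=10):
    """POST a JSON payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder URL: replace it with a real incoming-webhook URL before posting.
    WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"
    message = build_kpi_message("Daily ad plays", 11_700, 11_600, 11_800)
    print(message["text"])
    # post_to_slack(WEBHOOK_URL, message)  # uncomment to actually send
```

Keeping message building separate from sending makes the wording easy to test without touching the network.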

We run ad campaigns on our network, and one of our main KPIs is the total number of plays across our ad campaigns each day. This number shouldn’t vary much, except on weekends, when it should drop close to zero.

The analyst would check this KPI, and it was usually normal. But every now and then, a large campaign would end, causing the number of plays to drop drastically. That drop could look like a serious issue with a given software release.

I realized that the number of plays is (obviously) correlated with the number of devices in the field and the number of campaigns playing. I built a simple, piecewise linear model (simple is better for simple systems, you fancy neural network people) to account for the number of campaigns, the number of devices, and weekends. By training the model on pre-update data, I could project the KPIs forward after the software update and set guardrails based on the variance of the pre-update data.
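
Below is a minimal sketch of that kind of model in Python with NumPy, under stated assumptions. Here, “piecewise” is read as one ordinary-least-squares line for weekdays and a separate line for weekends. The guardrail is the prediction plus or minus `k` standard deviations of the pre-update residuals. The article doesn’t give the exact formulation, so the function names, features, `k=3`, and the synthetic data are all illustrative.

```python
import numpy as np


def _design(campaigns, devices):
    # Columns: intercept, active campaigns, devices in the field.
    return np.column_stack([np.ones(len(campaigns)), campaigns, devices])


def fit_piecewise(campaigns, devices, is_weekend, plays):
    """Fit one least-squares line per regime (weekday vs. weekend).

    Pass pre-update data only. Returns {is_weekend: (coefficients, residual_std)}.
    """
    campaigns = np.asarray(campaigns, dtype=float)
    devices = np.asarray(devices, dtype=float)
    is_weekend = np.asarray(is_weekend, dtype=bool)
    plays = np.asarray(plays, dtype=float)

    models = {}
    for regime in (False, True):
        mask = is_weekend == regime
        if mask.sum() < 4:
            raise ValueError("need at least 4 observations per regime")
        X = _design(campaigns[mask], devices[mask])
        coef, *_ = np.linalg.lstsq(X, plays[mask], rcond=None)
        residuals = plays[mask] - X @ coef
        models[regime] = (coef, float(residuals.std()))
    return models


def within_expectations(models, campaigns, devices, is_weekend, observed, k=3.0):
    """Return (ok, (low, expected, high)) for one post-update observation.

    The guardrail is expected +/- k residual standard deviations; k=3 is an
    arbitrary illustrative choice.
    """
    coef, sigma = models[bool(is_weekend)]
    expected = float(coef @ np.array([1.0, campaigns, devices], dtype=float))
    low, high = expected - k * sigma, expected + k * sigma
    return bool(low <= observed <= high), (low, expected, high)


if __name__ == "__main__":
    # Synthetic pre-update history (illustrative numbers, not real data).
    days = range(84)
    weekend = [d % 7 in (5, 6) for d in days]
    campaigns = [30 + (5 * d) % 17 for d in days]
    devices = [95_000 + 50 * d for d in days]
    noise = [4 * ((7 * d) % 11 - 5) for d in days]
    plays = [
        (2 * c + 0.001 * v if w else 50 * c + 0.1 * v) + e
        for c, v, w, e in zip(campaigns, devices, weekend, noise)
    ]
    models = fit_piecewise(campaigns, devices, weekend, plays)
    print(within_expectations(models, 40, 97_000, False, 11_700))  # normal weekday
    print(within_expectations(models, 40, 97_000, False, 5_850))  # sudden drop
```

The key design choice is fitting only on pre-update data. Post-update observations are then compared against what the old behavior predicts, given today’s campaign and device counts. A drop explained by fewer campaigns stays inside the band, while an unexplained drop falls outside it.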

This allowed the analyst to look at one graph, delivered to him via Slack, and know whether or not the KPIs were “within expectations”!

Step 3: How to Get People to Act on Your Insights

The Main Issue

After building a couple of these systems, I began to run into an issue most Data Scientists run into at some point. People weren’t taking action. And action is everything. What good is a well-delivered insight if nothing comes of it? (Or was it a well-delivered insight at all, in that case? Ooooh.)

If a model was built, a dashboard constructed, or an alerting system developed that never caused action, did it really exist at all? (Yes, it did, but you get the point.)

The Key Insight

The insight that changed my thinking on this actually stems from an article on why laziness doesn’t exist. Its main takeaway:

Taking action requires a why and a what. If someone isn’t taking action, they don’t have a strong enough reason to do something or they don’t know exactly what to do.

This is exactly what I was seeing. People were either confused about the meaning of an alert, or they weren’t sure what to do in response to an alert. In both cases, the alerts caused anxiety and no action, which was worse than no alert at all.

Generalized, Actionable Takeaway

In your work, this means that when you communicate an insight, you need to make clear exactly why it is important and what to do about it. Actually, this is true of any communication meant to lead to action. Poorly run meetings are a good example of what not to do. If you’ve ever been in a meeting without a clear why and what, you know that the meeting meanders, nothing is accomplished, and everyone’s time is wasted.

I don’t go to meetings if they don’t have a clear agenda. Meetings with no agendas are one of my biggest pet peeves. Also, acronyms suck. That’s just an unrelated, personal thing, but Elon Musk agrees.

My Experience

My specific problems began as the monitoring I was trying to automate became more complex. Rather than giving specific actions (of which there could be dozens), I began to send general statements with a small data dump. A data dump was something I could have worked with.

But a data dump is not something the consumer of an alert ever wants to work with. And the customer is who matters.

As the alerts became too general, the Slack channels I had set up for them began to be used less. My only choice was to create more specific messages and action steps.

I found that a very effective way to do this is to think through the exact resolution steps for an issue. For complex issues, this can be a difficult exercise in forethought. Done properly, however, it makes the action steps and the needed messages crystal clear. And with that clarity, people started using my channels again and taking action as a result!
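
One way to enforce a why, a what, and resolution steps is to make an alert impossible to construct without them. This is a hypothetical sketch. The `Alert` class, its fields, and the example text are illustrations, not the author’s actual messages. The example reuses the “large campaign ended” scenario from earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alert:
    """An alert that must explain why it matters and what to do about it."""

    title: str
    why: str  # why the reader should care
    what: str  # the single action the reader should take
    steps: List[str] = field(default_factory=list)  # ordered resolution steps

    def __post_init__(self):
        # Refuse vague alerts: no why, no what, or no steps means no alert.
        if not self.why.strip() or not self.what.strip() or not self.steps:
            raise ValueError(f"alert {self.title!r} needs a why, a what, and steps")

    def to_text(self):
        """Render as Slack-style text: *bold* title, then numbered steps."""
        lines = [
            f"*{self.title}*",
            f"Why it matters: {self.why}",
            f"What to do: {self.what}",
        ]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        return "\n".join(lines)


if __name__ == "__main__":
    alert = Alert(
        title="Daily ad plays below expected range",
        why="Plays fell below the lower guardrail, which may indicate a release problem.",
        what="Check whether a large campaign ended before escalating.",
        steps=[
            "Compare today's active campaign count with yesterday's.",
            "If a campaign ended, no action is needed.",
            "Otherwise, post in the release channel with the affected version.",
        ],
    )
    print(alert.to_text())
```

Raising an error on a missing why or what is deliberately strict. It pushes the work of thinking through resolution steps to the moment an alert is designed, rather than onto the person who receives it.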

 

The End Result

The “Anomaly Detection” project began on my own initiative and grew over 6 months of steady work, made possible by the freedom and trust given to me by my excellent manager. At present, “Anomaly Detection” pushes alerts to over half of our Engineering-Product-Design org (~20 of 40 individuals) and several other stakeholders outside of that org. It has become a major part of our operational de-risking efforts, and new channels are continuously being added.

Not a bad outcome for something that started as a side-project!

Last Notes & A Sendoff

I’m not some magical genius. I’m from Nebraska. (Love you, fellow Nebraskans.) Plus, I was dropped as a child. (Who wasn’t, though? Love you, Mom and Dad!)

Nebraska. Land of corn, football, and…more corn.

What I did in my first year as a Data Scientist is something any Junior/Associate Data Scientist could do, and I’m hoping some of you will take inspiration from the good fortune I’ve had and the effort I’ve made. So, as you start your journey, I leave you with a saying my math teacher used to give us before every test (tests were also rough but fun journeys; I was a math nerd):

Good luck, have fun, and do well!

Original. Reposted with permission.