Advice for New Data Scientists
We provide advice for junior data scientists as they begin their career, with tips and commentary from a tech lead at Airbnb.
By Lindsay M. Pettingill, PhD, Airbnb.
You’re a new data scientist - congrats! If you are a junior data scientist in your first data role, you should be asking yourself, ‘How can I effectively execute?’ This post will focus on that execution. It is intended primarily for data scientists embedded in product teams, but many of the tips can be generalized to any new hire in a tech role.
In this post I break down effective execution into the following 4 categories:
- Estimating how long tasks will take,
- How to get your questions answered, and
- Communicating & sharing your work.
To work well with PMs, you need some understanding of what they do. Dan Hill (formerly of Airbnb) captures some observations here. He sees PMs as creating the process that delivers good product to end users. I couldn’t agree more: PMs’ skills sets are varied, but process is usually the common denominator. Take advantage of this. PMs should be able to prioritize their asks for you; your manager can help with this too. In the beginning, pay close attention to how both prioritize: there’s no better way to learn and understand what is important to them — and the business! You’ll soon get a sense of how long things will take. This, alongside a sense of impact, will help you better prioritize.
On the practical side, you should always have an ongoing document in which you log your work. Having an overview of what’s on your plate not only allows you to easily see how you spend your time across various types of work (big bets, small bets, ad-hoc work, infrastructure, etc) but also makes having performance conversations much easier. If you have a sense that much of your work serves short-term goals or projects, you will have the data to substantiate it. If this is the case, be sure to communicate it to your manager as part of their job is to make sure your time is well spent. This practice also provides stakeholders and business partners with visibility into your workflow.
Protip: You should always ensure that your business partner knows what she wants. In practice this means that whenever someone asks you for data, you should make sure they articulate what they need it for. Oftentimes, this process will allow you to learn that what someone wants does not always address what they need. A huge sign of effectiveness as you progress in your role is the ability to identify and answer the questions that influence decisions.
Estimating how long tasks will take
As a new data scientist, it is crucial to understand the differences between:
- How long you want something to take (how desirable is the task relative to your other priorities and interests),
- How long it should take, (how feasible is it due to tooling, infrastructure, logging, etc), and
- How long it actually takes to just do it, given desire and reality.
I’ve noticed that the tasks that take junior data scientists the longest to complete are usually not those that are the most intellectually challenging. Instead, they are those that are the least well-defined, or those in which infrastructure needs are not fully articulated or understood. In these situations, you need to push for clarity and make sure you know what your systems can handle - this will help you and those with whom you are working.
- Do you understand what’s being asked for analytically? If not, make sure you do. You cannot possibly execute well when you don’t understand what you’re supposed to be looking into or for.
- Once you understand what is being asked of you, next ask: do the data exist? Do you understand the data generation process? If logging or data do not exist or are buggy, you cannot be very helpful. If this is the case, ask that logging be added, and make it clear that any analysis will be delayed until you have sufficient data. Everyone on your team needs to understand the cost of missing data — not just engineers.
- As you consider analysis options, is an ad-hoc solution fine, or should you invest in something more sustainable? Can you write a function or package to automate this type of analysis in the future? At startups in particular, you should always be thinking about ways to scale yourself. One golden piece of advice: approach your work as if you are going to have to reproduce or replicate everything you do.
PMs are not data scientists and it is not their job to evaluate different analytic approaches — it’s yours. As mentioned above, it is their job to provide a framework for delivering good product. The best PMs I’ve worked with have frameworks that are clear and consistent, particularly when it comes to data products: to them, perfect and opaque are (always) the enemy of the good. If one solution takes twice as long as another, is quite complicated to implement, or is a black box, you need to be very clear and convincing about why it should be preferred. And your convincing should rarely, if ever, include words like ‘AUC’ or ‘gradient descent’. Always focus on business impact and characterize the various data products/solutions you build in those terms!
Protip: Be a host. It’s a privilege to be a part of a product team that is interested in data-driven decision making. The more data informed your team is, the more effective you can all be. Be an advocate for data education and the tooling to support data self-service.
How to get your questions answered
At Airbnb we value ingenuity and problem-solving, which is just one of the reasons I enjoy working here. But because so many of us over-index on these skills, we often approach getting help by demonstrating our ingenuity. You all know what I am talking about: rather than saying what you need help with, you describe in great detail what you’re doing, and ask for very specific advice: ‘How do I transform the data in this particular way?’, ‘How do I use [this specific tool] to do [this very specific thing]’. I totally get the impulse here: you’re demonstrating that you’re trying, and have invested in a solution. But what you may have missed is that the solution you’ve already honed in on is likely one of many. When you seek advice on only a particular implementation, you’ve narrowed the path forward. When seeking help (from anyone, really) always start with the goal; this opens you up to a wider range of inputs.
Protip: In tech, you do not get ahead by monopolizing information. When you do receive help, make it a practice to circle back and share the solution/fix with others. Stack Overflow is a good place for this, but a knowledge repository is great too. We’re all better when information flows freely and is widely accessible.
Communicating & Sharing your work
If you work on an embedded product team, communication is one of, if not the most important aspects of your job. The most powerful advice I have for junior data scientists in these regards is the importance of communicating at different altitudes. For most communications with those outside of the data org, it’s not the Appendix that they are interested in; it’s the TL;DR (Too Long; Don’t Read). In practice, this often means that it is not your job to tell business partners how much work you did or how hard it was or what the various model evaluation measures were — save these discussions for your manager and peers. If a PM asks a question about your users, answer it as simply as possible, within reason. Do not hide your response in a maze of technical details- you will lose people this way! If they have questions (which they always should) they will follow up.
The more you work with someone, the more you will be able to anticipate their follow-ups. But do not assume that they are as interested in the path you took to get there. Your business partners need to deliver product and you need to help them get there.
Finally, the elephant in the room in any conversation about sharing your work is deadlines. You will assuredly encounter a business partner who asks for something without specifying a deadline. They will then get upset when it isn’t delivered when they need it. Be sure to get them to specify and document those deadlines when you commit to the project. If you are getting close to a deadline and know you will miss it, communicate it proactively. This is a sign of maturity, not failure.
Protip: Make sure to get feedback early and often from your manager, other data scientists, and business partners on your work. Do not underestimate the value of socializing your work. This is especially important if your findings are counterintuitive or force a reconsideration of existing (anchoring) data points. Socializing your work will help you refine, develop, and evangelize it; it’s far better to get tough questions before a big presentation than during it. Finally, if you are unclear about venues in which to socialize your work, ask or create them.
Best of luck, and stay tuned for future posts on advancing in your role. In the meantime, feel free to leave any questions/suggestions below.
Original. Reposted with permission.
- On-line and web-based: Analytics, Data Mining, Data Science, Machine Learning education
- Software for Analytics, Data Science, Data Mining, and Machine Learning
- Data Science for Decision Makers: A Discussion with Dr Stelios Kampakis
- How To Work In Data Science, AI, Big Data
- What no one will tell you about data science job applications