Compilation of Advice for New and Aspiring Data Scientists
Check out this compilation of advice for the new and upcoming data scientist, condensing 30+ pieces of advice into 6 minutes.
By Conor Dewey, Virginia Tech
Maybe it’s just me, but I’ve noticed more and more posts on Medium and elsewhere in the data science community centered around offering up advice to newcomers in the field. I think this is awesome and it’s this type of content that helped me immensely on my journey to data scientist and still helps me today as I continue to learn and grow.
But as the number of these posts grows, it takes more work on the part of the reader to seek out, sift through, and process all of the available information. This post is designed to make it a little easier for aspiring data scientists to find all of the excellent advice out there from experts in the field. The majority of the ideas below are condensed from the following 6 posts that I found especially helpful:
- Advice for New and Junior Data Scientists
- 12 Things I Wish I Knew Before Starting as a Data Scientist
- Advice for New Data Scientists
- 6 Recommendations for Aspiring Data Scientists
- 16 Useful Advices for Aspiring Data Scientists
- Aspiring Data Scientists Master These Fundamentals
First, I went through each article and plucked out each individual insight or piece of advice. Then I looked over the list of ideas and made note of any common themes among the different resources, seen below. Later on in this post, I include all of the other pieces of advice that stood alone. Let’s get to it.
As mentioned previously, here are the ideas that were repeated among several of the articles linked above. For each point, I’ll include a bit of my own commentary to go along with it.
Master the art of communication
This was probably the most popular theme of them all. The importance of communication in data science is often harped on, and for good reason. Sifting through data to find insights is useless if you can’t communicate those insights and drive impact in some way. Like anything else, effective communication is a skill that you can practice and improve on with time.
Build a solid statistics foundation
Whether it’s data analytics, machine learning, running experiments, or something more esoteric, you can’t avoid the use of statistics. Taking the time to build a solid foundation and master fundamental statistics concepts will pay for itself over and over again.
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
— Josh Wills (@josh_wills) May 3, 2012
Be skeptical: question your assumptions constantly
As data scientists, we make assumptions constantly, whether we know it or not. These assumptions might be related to the data we’re working with or the problem that we’re trying to tackle, but they need to be questioned. By keeping some level of paranoia about our outputs, we ensure that we’re on the right track. This skill is often associated with exploratory, research-oriented work, but it’s much more applicable than you think.
Curiosity will take you a long way — ask lots of questions
Similar to the last point, it also pays to be curious. Curiosity can lead you to interesting insights that you would never have found otherwise. It drives you to adopt a growth mindset and ask questions constantly — questions that will help you learn and grow as much as the work itself.
Put your work out into the world (Github, blogging, etc.)
This is a big one for me. Early on in my career, I learned about the benefits of putting your work out into the world. Whether it’s through blog posts, projects, tweets, or something else — it doesn’t really matter. What matters is that you are putting something out there. The tweet below pretty much sums up my stance on this:
— Amelia McNamara (@AmeliaMN) November 3, 2017
Build learning projects with real data that interests you
When working on learning projects, make sure you’re interested in the topic. This seems pretty straightforward, but plenty of aspiring data scientists get caught up trying to produce the project that seems the most complex or impressive to would-be employers and colleagues. Stick to what you enjoy and use real-world data instead of super-clean Kaggle or UCI datasets. For bonus points, collect some data and build your own dataset.
You’ll never know everything — and that’s okay
It’s clear that data science is a broad, complex field. You could spend your whole working life practicing it and barely scratch the surface. There’s always going to be another technique to master, another tool to learn, and another paper to read. This is why imposter syndrome is so relevant in the field. I find this to be frustrating and exciting all at once.
Pick the right tools for the problem and master them
Along the same lines, just because you can’t master every tool out there doesn’t mean that you shouldn’t master some of them. There will probably be a couple of building blocks that you spend most of your day working with. That might be R, SQL, Vim, Airflow, scikit-learn, anything really. It doesn’t matter as long as you home in on your critical tools and learn them well.
More key points
These are the ideas that I didn’t find in more than one of the linked posts. You’ll find equally useful and interesting information here, some more specialized than the common themes from before.
- Prioritize effectively
- Learn to properly estimate how long tasks will take
- Think about your critical path
- Partner with experienced data scientists
- Teach and evangelize data science
- Learn domain knowledge, not just methods
- The most important skill is critical thinking
- Go to events — hackathons, conferences, meetups
- Learn relevant skills, not just technical ones
- Be flexible with how you enter the field
- Get some hands-on experience with cloud computing
- Get used to gluing things together and standing up services
- Write a white paper
- Always make sure you understand your data before diving in
- A mix of algorithms will usually beat just one
- Take as many math and physics courses as possible
- Invest in your software engineering skills
- Trust yourself and follow your passion
- Try out different roles within data science
- When communicating analysis, tell a story
- Distribution of a variable is usually more important than its location
- Sampling is hard and won’t always be perfect
- Become a confident command line user to boost productivity
- Be learning constantly
- Start a blog and build out a portfolio to display your skills
- Look for companies that leverage data science for their strategy
- The size of the company will affect your role
- Don’t demand perfection out of your first job
- Learn how to sell your ideas
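One of the points above, that the distribution of a variable is usually more important than its location, is easy to see with a tiny example. The sketch below is only an illustration: the two lists of numbers are made up, and the `summarize` helper is a hypothetical name, not something taken from the linked posts. It uses Python's standard `statistics` module to compare two samples that share the same mean but have very different spreads.

```python
import statistics

# Hypothetical response times (ms) for two services; the numbers are invented
# purely for illustration.
steady = [98, 99, 100, 100, 100, 101, 102]
spiky = [40, 60, 80, 100, 120, 140, 160]


def summarize(values):
    """Return the mean (location) alongside a few measures of spread."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


steady_summary = summarize(steady)
spiky_summary = summarize(spiky)

# Both samples have the same mean, but the spread tells a very different story.
print("steady:", steady_summary)
print("spiky: ", spiky_summary)
```

Reporting only the mean would make these two samples look identical, which is why it pays to look at the spread (or better, plot the full distribution) before drawing conclusions.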
Along with the primary articles that I used to compile this list, there are plenty of other great blog posts for aspiring data scientists to draw on. The posts below also helped inform some of the key points listed above.
- Advice Applying to Data Science Jobs
- Advice on Building Data Portfolio Projects
- Advice to Aspiring Data Scientists: Start a Blog
- The Two Sides of Getting a Job as a Data Scientist
- Doing Data Science at Twitter
For more information on any of the bulleted points above, be sure to explore the awesome resources that I linked to throughout this post.
The journey to data scientist isn’t an easy one. Starting out as a data scientist is no different. But the beauty of information sharing makes things that much easier. It lets us learn from those who came before us. I think that’s a pretty cool thing. Pass it on and enjoy the ride.
Thanks for reading! Feel free to check out some of my similar essays below and subscribe to my newsletter to receive any new content.
- 5 Resources to Inspire Your Next Data Science Project
- The Big List of DS/ML Interview Resources
- Python for Data Science: 8 Concepts You May Have Forgotten
Original. Reposted with permission.