Projects to Include in a Data Science Portfolio
“Don’t pick just random projects to work on and add it to your resume or portfolio. Solve a problem that relates to the companies that you’re interested in.”
By Charlie Custer, Dataquest.io
This is an excerpt from the recently released Dataquest.io Career Guide, specifically the chapter How to Create a Project Portfolio for Data Science Job Applications.
A data science portfolio should consist of 3-5 projects that showcase your job-relevant skills. Again, the goal here is to prove you can do the work, so the more your portfolio looks like the day-to-day work of the jobs you’re applying for, the more convincing it’s going to be.
“Don’t pick just random projects to work on and add it to your resume or portfolio,” says Pramp CEO and co-founder Refael “Rafi” Zikavashvili. “Solve a problem that relates to the companies that you’re interested in.”
This applies to the kinds of tasks you’re taking on in your projects, but also the subject areas your projects examine, and the types of data sets you’re working with. Let’s take a closer look at each of these three factors:
Kinds of tasks: What sorts of things will you need to do in the job you’re applying to? Will you be doing a lot of data cleaning? Machine learning? Data visualization? Natural language processing? Will you be strictly doing analysis, or will you be building dashboards and other analytics tools for others? Whatever the answers to these questions, they should be integrated into your portfolio.
Subject areas: Are you looking at positions in marketing? You’ll probably want to highlight projects aimed at answering marketing-related questions. If you’re looking for a data job in mobile app development, you’ll want to show off projects that demonstrate you can pull useful product insights from app data. Using your projects to show that you have some knowledge of, or at least interest in, subjects and business problems relevant to the jobs you’ve applied to can help your application stand out.
Types of data sets: Different types of data may be common in different industries, so showing that you have some experience working with data sets similar to the ones you’d see on the job helps prove you’ve got what it takes to do the work. If you’re likely to be looking at a lot of time series data in the target job, for example, it would be helpful to showcase some time series analysis skills in your portfolio.
When In Doubt, Include These Projects:
The more carefully tailored your portfolio is to the specific jobs you’re applying for, the better the results you’re likely to get. But if you’re applying for entry-level positions, you’re probably casting a wide net, and you’re also likely to be looking at positions that require a lot of the same skills regardless of industry. If you put together a portfolio with at least one project in each of these categories, you’ll be off to an excellent start.
Data Cleaning Project: Data preparation, data munging, data cleaning – whatever you want to call it, it accounts for 60-80% of most data science jobs, so you definitely need a project that demonstrates your data scrubbing skills. At a bare minimum, you’ll want to find a messy data set (don’t pick anything that’s already been cleaned), come up with some interesting analytical questions to examine, and then clean the data and perform some basic analysis to answer those questions.
If you want to step the difficulty up here, collecting your own data (via APIs, web scraping, or some other method) demonstrates some additional skill. Working with unstructured data of some sort (as opposed to a messy-but-still-structured data set) also looks good.
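To make the "clean, then analyze" workflow concrete, here is a minimal sketch using only Python's standard library. Everything in it is made up for illustration: the sample data in `RAW_CSV`, the `city` and `price` columns, and the helper names `clean_rows` and `average_price_by_city` are all assumptions, and the cleaning rules (drop rows with no price, drop exact duplicates after normalization) are just one reasonable choice among many. A real project would more likely use a much larger, genuinely messy data set and a library such as pandas.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical messy data: stray whitespace, inconsistent casing,
# a duplicate row, a missing value, and currency formatting in a numeric column.
RAW_CSV = """city,price
 Boston ,$1200
boston,$1200
CHICAGO,950
Chicago,
Denver,"$1,100"
"""


def clean_rows(raw_text):
    """Return a list of (city, price) tuples with normalized values."""
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Normalize the city name: trim whitespace, use consistent casing.
        city = row["city"].strip().title()
        # Strip currency symbols and thousands separators from the price.
        price_text = row["price"].strip().replace("$", "").replace(",", "")
        if not price_text:
            continue  # assumption: rows without a price are dropped
        record = (city, float(price_text))
        if record in seen:
            continue  # assumption: exact duplicates are data-entry errors
        seen.add(record)
        cleaned.append(record)
    return cleaned


def average_price_by_city(records):
    """Answer one simple analytical question: the mean price per city."""
    prices = defaultdict(list)
    for city, price in records:
        prices[city].append(price)
    return {city: mean(values) for city, values in prices.items()}


records = clean_rows(RAW_CSV)
print(average_price_by_city(records))
```

In a portfolio write-up, the interesting part is explaining why each cleaning decision was made, since choices like dropping rows versus filling in missing values can change the answer to your analytical question.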
Data Storytelling and Visualization Project: Telling stories, offering real insight, and convincing others with data are key parts of any data science job. The best analysis in the world is useless if you can’t get your CEO to understand or take action based on it. This project should take readers on an analytical journey and bring them to a conclusion that’s understandable even to a layperson with little coding or statistical background.
Data visualization and communication skills will be important here to show and explain what your code is doing. It would be fine to present this in the form of a Jupyter Notebook or in R Markdown, but you could add some extra difficulty with additional polish, like customizing your chart designs or including some interactive elements.
A Group Project: Working together in a group demonstrates you’ve got communication and collaboration skills, both of which are important for data science jobs. Any type of project could be a group project; what’s important here is to demonstrate that you can function in a team setting both in interpersonal terms (clear communication, fair division of labor, genuine collaboration) and in technical terms (managing projects with Git and GitHub).
If you want to up the difficulty here, try to get involved with a popular open-source project, such as a data-science-related library in a language of your choice. This can be quite difficult, but if you do manage to make a contribution to a popular library or package, it can really make your application stand out to employers.
For example, Alina Chistyakova, the Lead IT Recruiter at Spice IT Recruitment, says that “successful commits to well-known open-source projects” are one of the things that makes a data science portfolio stand out to her. Kitware HR Director Jeff Hall says, “What really puts a plus in the column of candidates that apply here is having contributed to our specific open-source projects.”
Other Project Types to Consider
End-to-End System Building Project: A lot of data science jobs can include building systems that can efficiently analyze regular data sets as they come in, rather than analyzing a single specific data set. For example, you might be tasked with building a dashboard for the sales team that visualizes the company’s sales data and updates regularly as new data comes in.
This project should show that you’re capable of building a system that can perform the same analysis on new data sets as they’re input, as well as capable of building a system that can be understood and run with relative ease by others. The simplest version of this would be well-commented code that can take data from a public, regularly-updated data set, and perform some analysis. Its README file should explain how it can be used by others, and the project should be relatively easy for other coders to run via the command line.
If you’d like to step up the difficulty here, the sky’s the limit: you could build full-fledged interactive web dashboards, or build a system that handles real-time/streaming data. The key here is just to show that you can build an analytical system that’s reusable and that other people, or at the very least other programmers, can understand.
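As a rough illustration of the "simplest version" described above, the sketch below separates the analysis (`summarize`) from the command-line wrapper (`main`), so the same analysis can be rerun whenever a new file arrives. The function names, the `--column` flag, and the default `sales` column are all hypothetical choices made for this example, not part of any particular tool, and the summary statistics stand in for whatever analysis your project actually performs.

```python
import argparse
import csv
from statistics import mean


def summarize(lines, column):
    """Compute count, mean, min, and max of one numeric CSV column.

    `lines` is any iterable of CSV text lines (an open file works),
    which keeps the analysis independent of where the data comes from.
    """
    values = []
    for row in csv.DictReader(lines):
        text = (row.get(column) or "").strip()
        if text:  # assumption: blank cells are skipped, not treated as zero
            values.append(float(text))
    if not values:
        raise ValueError(f"no numeric values found in column {column!r}")
    return {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


def main(argv=None):
    """Parse command-line arguments, run the analysis, print the results."""
    parser = argparse.ArgumentParser(
        description="Summarize one numeric column of a CSV file."
    )
    parser.add_argument("csv_path", help="path to the CSV file to analyze")
    parser.add_argument(
        "--column", default="sales", help="name of the numeric column"
    )
    args = parser.parse_args(argv)
    with open(args.csv_path, newline="", encoding="utf-8") as handle:
        summary = summarize(handle, args.column)
    for name, value in summary.items():
        print(f"{name}: {value}")
    return 0
```

To turn this into a script others can run, you would add the usual `if __name__ == "__main__":` guard that calls `main()`, and document the invocation in the README. Keeping `summarize` free of file and argument handling also makes it easy to reuse later in a dashboard or a scheduled job.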
Explanatory Blog Post, Article, or Talk: Being able to explain complex technical concepts in simple, understandable terms is a valuable skill for any data scientist, so explaining some technical concept in a blog post, article, or conference talk can be a great addition to your portfolio if it’s done well. Just be sure to pick a topic that’s suitably complex, and one that you understand and can explain. A blog post explaining what’s happening under the hood in a machine learning algorithm that’s frequently used in your target industry, for example, could be a strong piece to include.
Bio: Charlie Custer is Content Marketer at Dataquest.io, and is a content creator of all types, but especially: writer, editor, and motion designer and animator specializing in 2D After Effects work.
Original. Reposted with permission.