5 Project Ideas to Stay Up-To-Date as a Data Scientist
The skills you have need maintenance and occasional updates. Doing an interesting data science project is what will keep you from getting rusty.
I believe in practice: practice as the application of knowledge and ideas. What I don’t believe is that the road between theory and practice is a one-way highway, an idea that reduces practice to the mere application of theoretical principles. Practice is much more; it is also a birthplace of ideas that give a push to new theories.
Reading articles or research papers, attending conferences or meet-ups to keep abreast of the latest technologies, and gaining better theoretical foundations are nothing to frown upon. I highly recommend all of that! But whatever updates you get, your work as a data scientist will still boil down to several fundamental skills: data collection, analysis, and visualization.
And you need to use them! If you lack opportunities for that, you have to create them yourself. The most comprehensive way is to think of a data science project and do it from start to finish.
In such projects, you would use APIs to get the actual data. Through data cleaning and analysis, you would get insights, which you could present in some nice visualizations. Finally, you could post the result on Reddit, get feedback, and take it into account to improve your project.
Ideally, the projects should also be fun, not only a drab way to polish your skills.
1. Word Counts Change in Lyrics for Top 10 Songs
The project’s idea is to get data about the top 10 most popular songs on Spotify through the Spotify API. The Spotify metadata can then be connected with the lyrics from the Genius API or some other lyrics site.
The definition of the ‘word’ in the lyrics is up to you. For example, you can count the words in total or only unique words. Do you include singalong parts like ‘na na na’ or ‘la la la’?
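To make those choices concrete, here is a minimal sketch of how the counting could work. The `SINGALONG` set, the function names, and the sample line are all my own assumptions for illustration; the lyrics themselves would come from whichever lyrics source you use.

```python
import re
from collections import Counter

# Hypothetical set of filler syllables to ignore; adjust it to your own definition.
SINGALONG = {"na", "la", "oh", "ooh", "yeah"}

def tokenize(lyrics):
    """Lowercase the lyrics and split them into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", lyrics.lower())

def word_stats(lyrics, skip_singalong=False):
    """Return the total and unique word counts under the chosen definition."""
    words = tokenize(lyrics)
    if skip_singalong:
        words = [w for w in words if w not in SINGALONG]
    counts = Counter(words)
    return {"total": len(words), "unique": len(counts)}

# Made-up sample line, not taken from any real song.
sample = "Na na na, hold on, hold on to me"
print(word_stats(sample))                       # counts every word
print(word_stats(sample, skip_singalong=True))  # ignores the singalong syllables
```

Whichever definition you pick, apply the same one to every song, otherwise the counts won't be comparable.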
By analyzing this data, you can show the historical development and predict future tastes. You could include other parameters, such as the song length or the intro length. The intro would be especially interesting because studies, such as the one by The Ohio State University, show a dramatic decrease in the length of song intros over the years.
You can take some ideas on the approach and the visualization from this project.
2. Investing in the Property for Rentals
If you’re considering investing in properties across the country so you can rent them, it would be useful to analyze which factors influence the rental price and, therefore, the profitability and potential of your business.
The data can be acquired through some of the rental APIs. The factors that could turn out to be important in deciding where and when to invest include the location, property size, year of construction, amenities, rental price trends, etc.
The inspiration for the approach can be drawn from this nice Airbnb data science project.
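As a first look at which factors move together with the rental price, you could compute a simple correlation for each one. This is only a sketch under my own assumptions: the field names and the listings below are made up, and a correlation is a starting point rather than a full analysis.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up listings for illustration only; real values would come from a rental API.
listings = [
    {"size_sqft": 500, "year_built": 1990, "rent": 1000},
    {"size_sqft": 750, "year_built": 2005, "rent": 1400},
    {"size_sqft": 1000, "year_built": 1975, "rent": 1900},
    {"size_sqft": 1250, "year_built": 2015, "rent": 2300},
]

rents = [row["rent"] for row in listings]
for factor in ("size_sqft", "year_built"):
    values = [row[factor] for row in listings]
    # A value near +1 or -1 suggests a strong linear relationship with rent.
    print(factor, round(pearson(values, rents), 2))
```

Categorical factors such as location or amenities need a different treatment, for example comparing average rents per group.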
3. Detecting Fake News
Here you can use the Facebook, Twitter, or Reddit API to get the data you’ll work with. With that data, you can analyze social media posts and separate fake news from genuine news. Your approach could be general, or you could focus on a specific topic such as the COVID-19 pandemic, the USA presidential elections, or the war in Ukraine. You could analyze the geographical distribution of fake news, the demographics of those spreading it, or the topics that are most susceptible to fake news.
Maybe you could also try to train a model for recognizing fake news. You can find some ideas on the approach in the work by Kai Shu, Deepak Mahudeswaran, and Huan Liu. Some other useful sources are a Data Flair project or this one, which you can do in R or maybe recreate in Python.
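If you want a feel for what training such a model involves before reaching for a library, here is a toy sketch. It uses a naive Bayes text classifier, which is my own choice for illustration and not necessarily the method used in the works mentioned above. The four headlines are made up, and a real project would need a large labeled dataset.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)       # documents per label
        self.word_counts = defaultdict(Counter)   # word frequencies per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up headlines for illustration only.
texts = [
    "shocking miracle cure doctors hate",
    "you won't believe this secret trick",
    "city council approves new budget",
    "scientists publish peer reviewed study",
]
labels = ["fake", "fake", "real", "real"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("miracle secret cure"))
print(model.predict("council approves new study"))
```

With real data, you would also hold out part of the labeled posts to check how often the model is right.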
4. Type of People on the Postage Stamps
If you’re interested in introducing more equality regarding occupation, race, or gender representation on postage stamps, this project could be for you. The idea is to use the Wikipedia API to get the list of people on postage stamps, e.g., in the USA. You can also do that for other countries, or even the whole world, and compare the data across countries. Another idea is to connect this data with inequality data. It would be interesting to see how the inequality of representation on postage stamps correlates with a country's economic inequality.
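Once you have the list of people, measuring representation comes down to counting categories. Here is a minimal sketch; the record layout and the placeholder entries are my assumptions, and in practice you would have to enrich the names from Wikipedia with attributes such as gender or occupation yourself.

```python
from collections import Counter

def shares(records, field):
    """Fraction of records that fall into each category of `field`."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Placeholder records for illustration; they do not describe real stamps.
people = [
    {"name": "Person A", "gender": "female", "occupation": "scientist"},
    {"name": "Person B", "gender": "male", "occupation": "politician"},
    {"name": "Person C", "gender": "male", "occupation": "politician"},
    {"name": "Person D", "gender": "male", "occupation": "writer"},
]

print(shares(people, "gender"))
print(shares(people, "occupation"))
```

Running the same function per country gives you comparable shares that you can then set against each country's inequality data.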
There’s a project that analyzes the types of people celebrated by the countries on their currency. I think you can get some good ideas about the approach and visualization for your project.
5. Predicting Book Sales
The Amazon API is a great tool for this project. From it, you can get data about book sales. For example, you can analyze sales across genres, publishers, writers, number of pages, price, number of reviews, ratings, and so on.
Once you get the data and analyze it, you can predict the parameters your book has to satisfy to be a bestseller. If you need ideas on approaching and visualizing this project, you can go through this 20-page work published on ResearchGate.
6. Create Your Own Project
You shouldn’t limit yourself to these five projects, of course. I strongly advise you to come up with your own data analytics project ideas. When you think of a project, think of the following:
- Goal: The project's goal should be to create something using modern data science and something that users would like.
- Make it Interesting and User-Friendly: This usually means building an interactive model and having a visualization.
- Ideas: A good source of inspiration for projects and visualization is the dataisbeautiful subreddit.
- Replicate & Invent: Try to replicate the ideas you find there or use them to come up with your own twist on the project.
The project factors that will help you stay up-to-date as a data scientist are:
- Using an API to collect data
- Using some visualization library to create graphics
- Posting your work on the subreddit to get feedback
I’m sure any of these projects will challenge you to use your current knowledge and also try something completely new. When choosing a project, it’s vital that it leads to something tangible and that you also have some fun along the way. That’s why I tried to select projects that touch different areas of our lives.
What these projects all have in common is that they make you use all the fundamental data science techniques and skills. Using them is the best way to keep in touch with those skills.
If these projects inspire you to think of more interesting ones, even better. Imagination is an important but often overlooked data science skill. Just make sure the project also challenges your other data science skills.
Nate Rosidi is a data scientist who also works in product strategy. He's also an adjunct professor teaching analytics and the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.