Top Data Science Projects to Build Your Skills
Check out this list of data science project ideas that you can use to boost your skills, organized by level of expertise.

Octavian Dan via Unsplash
If you’re looking for a career in Data Science, your portfolio is your priority. Although Data Science is in high demand, it is a very competitive market. New people transition into tech every day, making it difficult for hiring managers to choose the right candidate.
If you want to stand out from the crowd, a portfolio that showcases your coding skills is your best friend. Hiring managers hear about people's skills every day; they want to see yours.
Below is a list of Data Science project ideas that you can use to boost your skills. I will split them up by level of expertise.
1. Beginner Data Scientists
Iris Data
This is one of the most popular datasets for beginners. It is versatile, easy to work with, and your best bet when you want to explore pattern recognition. The dataset is small, with only 150 rows, 4 feature columns, and no missing values, making it a simple dataset for beginners.
Dataset Link: UCI Iris Dataset
Type of Problem/Task: Classification
Example work: scikit-learn Iris Dataset
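As a starting point, here is a minimal sketch of an Iris classification workflow. It assumes scikit-learn is installed and uses the copy of the dataset bundled with the library rather than the UCI file. The choice of k-nearest neighbours, the 25% test split, and the random seed are illustrative assumptions, not recommendations from the original example work.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the copy of Iris bundled with scikit-learn (150 rows, 4 feature columns).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on flowers it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# k-nearest neighbours is an arbitrary choice here; any classifier could be swapped in.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Fraction of held-out flowers whose species was predicted correctly.
accuracy = model.score(X_test, y_test)
print(f"Rows: {X.shape[0]}, feature columns: {X.shape[1]}")
print(f"Test accuracy: {accuracy:.2f}")
```

From here you could try other classifiers or plot pairs of features to see which ones separate the three species best.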
Titanic
You may have heard about the famous Titanic dataset, either through a bootcamp course or while exploring topics around Data Science. The data is split into two groups: a training set (train.csv) and a test set (test.csv).
Dataset Link: Titanic
Type of Problem/Task: Creating a Machine Learning model that predicts the survival of Titanic passengers
Example work: Kaggle Titanic Dataset
Wine
If you’re interested in wine, this will be an interesting project for you. This dataset will test your understanding of feature selection, outliers, and unbalanced data. There are no missing values, making it perfect for beginners.
Dataset Link: UCI Wine Dataset
Type of Problem/Task: Classification
Example work: scikit-learn Outlier Detection Wine Dataset and scikit-learn Feature Scaling Wine Dataset
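To get a feel for class balance and feature scaling on this data, a rough sketch like the one below may help. It assumes scikit-learn and NumPy are installed and that the copy of the Wine data bundled with scikit-learn matches the UCI file closely enough for practice. Standardization is just one of several possible scaling methods.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

# Copy of the Wine data bundled with scikit-learn (assumed to match the UCI file).
X, y = load_wine(return_X_y=True)

# Count samples per class to see how balanced the labels are.
print("Samples per class:", np.bincount(y))

# Raw features sit on very different scales, which can distort
# distance-based models and make outliers harder to compare across columns.
print("Smallest raw column mean:", round(float(X.mean(axis=0).min()), 2))
print("Largest raw column mean:", round(float(X.mean(axis=0).max()), 2))

# Standardize each column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
print("Scaled means close to 0:", np.allclose(X_scaled.mean(axis=0), 0.0))
print("Scaled stds close to 1:", np.allclose(X_scaled.std(axis=0), 1.0))
```

In a real project the scaler should be fitted on the training split only, so that information from the test rows does not leak into the model.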
Census Income
This dataset further tests your understanding of how to make predictions. The task at hand is to predict whether a person makes over 50K a year. It contains missing values, allowing you to explore different data cleaning methods. Depending on your level of expertise, you can explore it with Support Vector Machines, Bayesian methods, and more.
Dataset Link: UCI Census Income
Type of Problem/Task: Classification
Example work: Kaggle Census Income
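Two common ways of handling missing values can be sketched as follows. The rows below are made up for illustration and are not taken from the real file, the column names are simplified, and the use of "?" as the missing-value marker is an assumption about the file format. The sketch assumes pandas is installed.

```python
import io

import pandas as pd

# Four made-up rows in the style of the Census Income data. "?" stands in
# for a missing value (an assumption about how the real file marks gaps).
raw = io.StringIO(
    "age,workclass,occupation,income\n"
    "39,State-gov,Adm-clerical,<=50K\n"
    "50,?,Exec-managerial,>50K\n"
    "38,Private,?,<=50K\n"
    "28,Private,Prof-specialty,>50K\n"
)

# Tell pandas to treat "?" as missing so it shows up as NaN.
df = pd.read_csv(raw, na_values="?")
print(df.isna().sum())

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill each gap with the most frequent value in its column.
filled = df.fillna(df.mode().iloc[0])

print(f"Rows after dropping: {len(dropped)} of {len(df)}")
print(filled)
```

Dropping rows is simpler but throws data away, while filling keeps every row at the cost of introducing guessed values. Comparing model scores under both options is a useful exercise.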
2. Intermediate Data Scientists
Human Activity Recognition Using Smartphones Data Set
If you have taken any Machine Learning courses or bootcamps, you may have come across this dataset. It is a classification problem that can be explored with Machine Learning models. This dataset challenges you to move from beginner to intermediate level. The data set has 10,299 rows and 561 columns.
Dataset Link: UCI Human Activity Recognition Using Smartphones Data Set
Type of Problem/Task: Classification, Clustering
Example work: Machine Learning Mastery Blog and Kaggle examples
Breast Cancer
This is a classification dataset that records measurements for breast cancer cases and contains two classes: benign and malignant. The dataset contains missing values, testing your data cleaning skills. You can explore the different variables, how they correlate with one another, whether one variable appears to affect another, and more.
Dataset Link: UCI Breast Cancer Wisconsin
Type of Problem/Task: Classification
Example work: Breast Cancer Prediction using Machine Learning and other Kaggle examples
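Exploring how the variables correlate could look roughly like the sketch below. Note the assumption: it uses the Breast Cancer Wisconsin (Diagnostic) copy bundled with scikit-learn, which has no missing values and may differ from the file linked above, so treat it as practice for the correlation step only. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Diagnostic variant bundled with scikit-learn (assumption: not necessarily
# the same file as the UCI link, and it contains no missing values).
data = load_breast_cancer()
X, names = data.data, list(data.feature_names)

# Pearson correlation between every pair of feature columns.
corr = np.corrcoef(X, rowvar=False)

# Find the most strongly correlated pair, ignoring each feature's
# correlation with itself on the diagonal.
masked = np.abs(corr)
np.fill_diagonal(masked, 0.0)
i, j = np.unravel_index(np.argmax(masked), masked.shape)

print(f"Most correlated pair: {names[i]} vs {names[j]}")
print(f"Pearson r = {corr[i, j]:.3f}")
```

Highly correlated pairs are candidates for dropping one of the two features, which ties back to the feature selection skills practiced on the earlier datasets.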
Twitter Sentiment Analysis
This Twitter dataset is very popular if you want to work specifically with Sentiment Analysis. This task will allow you to explore and classify tweets based on their sentiment: strongly negative (0), negative (1), neutral (2), positive (3), or highly positive (4). The dataset is 3MB in size and has 31,962 tweets.
Dataset Link: Kaggle Twitter dataset by Analytics Vidhya
Type of Problem/Task: Classification
Example work: Kaggle examples
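A tweet classifier can be sketched in a few lines. The tweets below are invented for illustration and use only the two extreme labels from the scale described above (0 and 4) to keep the example short. TF-IDF features with logistic regression is one common baseline, not necessarily what the Kaggle examples use. The sketch assumes scikit-learn is installed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented tweets: 4 = highly positive, 0 = strongly negative.
tweets = [
    "I love this phone, it is fantastic",
    "What a great and happy day",
    "Absolutely wonderful service, thank you",
    "I hate waiting, this is terrible",
    "Worst experience ever, so awful",
    "This is horrible and I am angry",
]
labels = [4, 4, 4, 0, 0, 0]

# Turn each tweet into TF-IDF word weights, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(lowercase=True), LogisticRegression())
model.fit(tweets, labels)

# Score two unseen (also invented) tweets.
new_tweets = ["such a great phone", "awful and terrible day"]
predictions = model.predict(new_tweets)
for text, label in zip(new_tweets, predictions):
    print(f"{label}: {text}")
```

With the full dataset you would also want a held-out test split and some text cleaning, such as removing user handles and links, before vectorizing.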
3. Advanced Data Scientists
Urban Sound Classification
This is a classification task that introduces you to audio processing. The dataset contains 8,732 labeled excerpts of urban sounds from 10 classes. You can use neural network models to classify the type of sound from the audio.
Dataset Link: Analytics Vidhya Urban Sound Classification
Type of Problem/Task: Classification
Example work: Shubham Gupta TDS
VoxCeleb
VoxCeleb is an audio-visual dataset that contains short clips of human speech, which have been extracted from interview videos uploaded to YouTube. This dataset allows you to explore speech recognition through isolation and identification.
The dataset comes in two versions, VoxCeleb1 and VoxCeleb2. VoxCeleb1 contains over 100,000 utterances for 1,251 celebrities, while VoxCeleb2 contains over a million utterances for 6,112 identities.
Dataset Link: VoxCeleb
Type of Problem/Task: Classification, Speech Recognition
Example work: qqueing github
VisualQA
VQA is a new dataset containing open-ended questions about images. Answering them requires an understanding of computer vision, language, and common sense knowledge.
It contains 265,016 images, with at least 3 questions per image, and the task is to use deep learning to answer these open-ended questions.
Dataset Link: visualqa
Type of Problem/Task: Computer Vision
Example work: VQA Challenge
Conclusion
I hope these project ideas help you boost your portfolio and give you a better understanding of your strengths and weaknesses, so you can figure out what you need to work on.
If you have any suggestions, please drop them in the comments!
Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. She is a keen learner seeking to broaden her tech knowledge and writing skills while helping guide others.