Top Data Science Projects to Build Your Skills

Check out this list of data science project ideas that you can use to boost your skills, organized by level of expertise.




 

If you’re looking for a career in Data Science, your portfolio is your priority. Although Data Science is in high demand, it is a very competitive market. New people transition into tech every day, making it difficult for hiring managers to choose the right candidate.

If you want to stand out from the crowd, showcasing your coding skills through a portfolio is your best bet. Hiring managers hear about people's skills every day; they want to see yours.
Below is a list of Data Science project ideas that you can use to boost your skills. I will split them up by level of expertise.

 

1. Beginner Data Scientists

 

Iris Data

 

This is one of the most popular datasets for beginners. It is versatile, easy to work with, and your best bet for exploring pattern recognition. The dataset is small, with only 150 rows and 4 feature columns, and it has no missing values, which makes it simple for beginners to handle.

Dataset Link: UCI Iris Dataset

Type of Problem/Task: Classification

Example work: scikit-learn Iris Dataset
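As a starting point for this project, here is a minimal sketch of a classification workflow on Iris. It assumes scikit-learn is installed and uses the copy of the dataset bundled with the library rather than the UCI download. The k-nearest neighbors model, the 70/30 split, and the value of k are illustrative choices, not recommendations from the original dataset authors.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the copy of Iris bundled with scikit-learn (150 rows, 4 features).
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows to estimate how well the model generalizes.
# stratify=y keeps the three species equally represented in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# k=5 is an arbitrary starting point, not a tuned value.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

From here you could try other classifiers or plot pairs of features to see which species are easiest to separate.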

 

Titanic 

 

You may have heard about the famous Titanic dataset, either through a bootcamp course or while exploring Data Science topics. The data is split into two groups: a training set (train.csv) and a test set (test.csv).
 
Dataset Link: Titanic 

Type of Problem/Task: Classification (building a Machine Learning model that predicts which Titanic passengers survived)

Example work: Kaggle Titanic Dataset
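The train/test split described above can be sketched as follows. This is a rough baseline, not a competitive solution. It assumes pandas and scikit-learn are installed and that the files use the column names Survived, Pclass, Sex, and Age, as in the Kaggle competition files; check these against your download. The helper names prepare and fit_and_predict are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumed column names; adjust if your copy of the data differs.
FEATURES = ["Pclass", "Sex", "Age"]


def prepare(df: pd.DataFrame, age_fill: float) -> pd.DataFrame:
    """Return a numeric feature table built from a few passenger columns."""
    out = df[FEATURES].copy()
    out["Sex"] = (out["Sex"] == "female").astype(int)  # encode text as 0/1
    out["Age"] = out["Age"].fillna(age_fill)           # fill missing ages
    return out


def fit_and_predict(train: pd.DataFrame, test: pd.DataFrame) -> pd.Series:
    """Fit on the labeled training rows, then predict for the test rows."""
    # The fill value is computed on the training set only and reused on test.
    age_fill = train["Age"].median()
    model = LogisticRegression(max_iter=1000)
    model.fit(prepare(train, age_fill), train["Survived"])
    preds = model.predict(prepare(test, age_fill))
    return pd.Series(preds, index=test.index, name="Survived")


# Typical use, assuming both files are in the working directory:
# train = pd.read_csv("train.csv")
# test = pd.read_csv("test.csv")
# predictions = fit_and_predict(train, test)
```

Computing the median age from the training set alone keeps information from the test set out of the model, which is a good habit to show in a portfolio project.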

 

Wine 

 

If you’re interested in wine, this will be an interesting project for you. This dataset will test your understanding of feature selection, outliers, and unbalanced data. There are no missing values, making it perfect for beginners.

Dataset Link: UCI Wine Dataset

Type of Problem/Task: Classification

Example work: scikit-learn Outlier Detection Wine Dataset and scikit-learn Feature Scaling Wine Dataset
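To see why feature scaling matters on this dataset, here is a small sketch that compares the same model with and without standardization. It assumes scikit-learn is installed and uses the Wine data bundled with the library. The k-nearest neighbors model and 5-fold cross-validation are illustrative choices made for this example.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the Wine data bundled with scikit-learn.
X, y = load_wine(return_X_y=True)

# Same distance-based model, once on raw features and once on standardized ones.
raw_model = KNeighborsClassifier(n_neighbors=5)
scaled_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Mean accuracy over 5 cross-validation folds for each variant.
raw_acc = cross_val_score(raw_model, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled_model, X, y, cv=5).mean()

print(f"Without scaling: {raw_acc:.2f}")
print(f"With scaling:    {scaled_acc:.2f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the held-out fold never influences the scaling statistics.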

 

Census Income 

 

This dataset further tests your understanding of how to make predictions. The task is to predict whether a person makes over 50K a year. It contains missing values, allowing you to explore different data cleaning methods. Depending on your level of expertise, you can approach it with Support Vector Machines, Bayesian classifiers, and more.

Dataset Link: UCI Census Income

Type of Problem/Task: Classification

Example work: Kaggle Census Income 
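Two common data cleaning options for this kind of dataset, dropping incomplete rows or filling them in, can be sketched as below. The example assumes pandas and NumPy are installed and that unknown values are marked with a question mark, as in the UCI file. The rows are made up for illustration and the column names are assumptions, so adapt both to the real data.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; they are not taken from the real file.
df = pd.DataFrame({
    "age": [41, 29, 36, 58, 23],
    "workclass": ["Private", "?", "Private", "Self-emp", "?"],
    "occupation": ["Sales", "Tech-support", "?", "Sales", "Craft-repair"],
    "income": ["<=50K", "<=50K", ">50K", ">50K", "<=50K"],
})

# Turn the "?" placeholder into a real missing value so pandas can count it.
cleaned = df.replace("?", np.nan)
missing_per_column = cleaned.isna().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value.
dropped = cleaned.dropna()

# Option 2: fill each categorical gap with that column's most frequent value.
imputed = cleaned.copy()
for col in ["workclass", "occupation"]:
    imputed[col] = imputed[col].fillna(imputed[col].mode()[0])

print(f"Rows kept after dropping: {len(dropped)} of {len(df)}")
```

Dropping rows is simple but throws data away, while filling with the most frequent value keeps every row at the cost of some bias. Comparing model results under both options makes a nice section in a project write-up.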

 

2. Intermediate Data Scientists

 

Human Activity Recognition Using Smartphones Data Set

 

If you have taken any Machine Learning courses or bootcamps, you may have come across this dataset. It is a classification problem that can be explored with Machine Learning models, and working through it is a good way to move from beginner to intermediate level. The dataset has 10,299 rows and 561 columns.

Dataset Link: UCI Human Activity Recognition Using Smartphones Data Set

Type of Problem/Task: Classification, Clustering

Example work: Machine Learning Mastery Blog and Kaggle examples

 

Breast Cancer

 

This is a classification dataset that records measurements for breast cancer cases and contains two classes: benign and malignant. The dataset contains missing values, testing your data cleaning skills. You can explore the different variables, how they correlate with one another, whether one variable appears to affect another, and more.

Dataset Link: UCI Breast Cancer Wisconsin

Type of Problem/Task: Classification

Example work: Breast Cancer Prediction using Machine Learning and other Kaggle examples
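One way to explore how the variables correlate is to rank feature pairs by their correlation coefficient. The sketch below assumes pandas, NumPy, and scikit-learn are installed. Note that it uses the breast cancer data bundled with scikit-learn as a stand-in: that copy is the Diagnostic variant (569 rows, 30 features) with no missing values, so it may differ from the file behind the dataset link above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Stand-in data: the Diagnostic variant bundled with scikit-learn.
features = load_breast_cancer(as_frame=True).data

# Pearson correlation between every pair of feature columns.
corr = features.corr()

# Keep only the upper triangle so each pair appears once
# and each feature's correlation with itself is left out.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

# Show the five most strongly positively correlated pairs.
print(pairs.head(5))
```

Highly correlated pairs are candidates for dropping one of the two features, which is worth testing against model accuracy.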

 

Twitter

 

This Twitter dataset is very popular if you want to work specifically with Sentiment Analysis. The task is to explore and classify tweets based on their sentiment: strongly negative (0), negative (1), neutral (2), positive (3), or highly positive (4). The dataset is 3MB in size and has 31,962 tweets.

Dataset Link: Kaggle Twitter dataset by Analytics Vidhya 

Type of Problem/Task: Classification

Example work: Kaggle examples 
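A common baseline for classifying tweets by sentiment is TF-IDF text features fed into a linear classifier. The sketch below assumes scikit-learn is installed. The tweets and labels are made up for illustration, the labels follow the 0 to 4 scale described above, and build_sentiment_model is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_sentiment_model():
    """TF-IDF features over words and word pairs, fed to a linear classifier."""
    return make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )


# Made-up tweets for illustration only; swap in the real text and label columns.
tweets = [
    "this is the worst service ever",
    "absolutely terrible, never again",
    "i hate waiting in this queue",
    "what a lovely sunny morning",
    "best concert i have ever seen",
    "i love this new phone so much",
]
labels = [0, 0, 0, 4, 4, 4]  # 0 = strongly negative, 4 = highly positive

model = build_sentiment_model()
model.fit(tweets, labels)

predicted = model.predict(["what a lovely day"])
print(predicted)
```

With the real dataset you would also hold out part of the tweets for evaluation, and you might add cleaning steps such as removing user handles and links before vectorizing.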

 

3. Advanced Data Scientists

 

Urban Sound Classification

 

This is a classification task that introduces you to audio processing. The dataset contains 8,732 labeled sound excerpts of urban sounds from 10 classes. You can use neural network models to classify the type of sound from the audio.

Dataset Link: Analytics Vidhya Urban Sound Classification

Type of Problem/Task: Classification

Example work: Shubham Gupta TDS

 

VoxCeleb

 

VoxCeleb is an audio-visual dataset that contains short clips of human speech, which have been extracted from interview videos uploaded to YouTube. This dataset allows you to explore speech recognition through isolation and identification. 

The dataset comes in two versions, VoxCeleb1 and VoxCeleb2. VoxCeleb1 contains over 100,000 utterances for 1,251 celebrities, while VoxCeleb2 contains over a million utterances for 6,112 identities.

Dataset Link: VoxCeleb

Type of Problem/Task: Classification, Speech Recognition 

Example work: qqueing github

 

VisualQA

 

VQA is a new dataset containing open-ended questions about images. Answering them requires an understanding of computer vision, language, and common sense knowledge.

It contains 265,016 images with at least 3 questions per image, and the task is to use deep learning to answer these open-ended questions.

Dataset Link: visualqa

Type of Problem/Task: Computer Vision

Example work: VQA Challenge

 

Conclusion

 

I hope these project ideas help you boost your portfolio and give you a better understanding of your strengths and weaknesses, so you can figure out what you need to work on.

If you have any suggestions, please drop them in the comments!

 
 
Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence benefits, or could benefit, the longevity of human life. She is a keen learner seeking to broaden her tech knowledge and writing skills while helping guide others.