
KDnuggets News, September 2021 (21:n38), Tutorials, Overviews

Use These Unique Data Sets to Sharpen Your Data Science Skills





Sponsored Post.


Want to warm up your data science skills before jumping into a bootcamp program? Aspiring data scientists can practice key techniques like data cleaning, data analysis, data visualization and even machine learning with free, publicly available data sets. Hands-on data science exploration is one of the most effective ways to prepare for a data science bootcamp. In addition to learning more about your strengths, interests, and the skills you’ll need to grow, you’ll also gain experience working with the intricacies and idiosyncrasies of real-world data.

The sooner you can get comfortable using real-world data, the better. Students in leading programs like the University of North Florida’s Division of Continuing Education Data Science Bootcamp analyze industry data sets to build portfolio projects that will impress hiring managers after graduation.

Want to get your hands on some real-world data sets right now? Kick off your bootcamp prep with this list of hot-button data sets curated to help you hone different data science skills.

 

Clean Your Data

 
 
Data cleaning involves correcting, removing, or reformatting corrupted, incomplete, or duplicate data within a data set. You can practice data cleaning with the following data sets:

 

72 Hours of #Gamergate Tweets

 
This data set aggregates tweets tagged #Gamergate, all published during the same three-day stretch of the eponymous 2014 Twitter controversy. Practice keeping an eye out for irrelevant data: you’ll want to delete records that don’t apply to the problem you’re using the data set to solve. Remember, high-quality data produces more reliable results.
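To make the idea concrete, here is a minimal sketch of that kind of cleanup using only the Python standard library. The column names ("id", "text"), the sample rows, and the rule that retweets count as irrelevant are all assumptions made for illustration; the real data set may be structured differently, and what counts as irrelevant depends on the question you are asking.

```python
def clean_tweets(tweets, is_relevant):
    """Drop duplicate tweets and tweets that fail the relevance check."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        # Collapse repeated whitespace so near-identical copies compare equal.
        text = " ".join(tweet["text"].split())
        key = text.lower()
        if not is_relevant(text) or key in seen:
            continue
        seen.add(key)
        cleaned.append({**tweet, "text": text})
    return cleaned


def not_retweet(text):
    """Hypothetical relevance rule: treat retweets as noise."""
    return not text.startswith("RT @")


# Invented sample rows, not taken from the actual data set.
sample = [
    {"id": 1, "text": "Thoughts on  #Gamergate today"},
    {"id": 2, "text": "thoughts on #gamergate today"},
    {"id": 3, "text": "RT @someone: Thoughts on #Gamergate today"},
    {"id": 4, "text": "Another take on #GamerGate"},
]

cleaned = clean_tweets(sample, not_retweet)
kept_ids = [tweet["id"] for tweet in cleaned]
print(kept_ids)
```

Passing the relevance rule in as a function keeps the deduplication logic reusable while you experiment with different definitions of "irrelevant."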

 

Weather Trends in Southeast Brazil

 
Sourced from Brazil’s National Institute of Meteorology, this data set is a compilation of hourly weather data captured from 122 weather stations in the country’s southeast region, which includes the coastal states of Rio de Janeiro and São Paulo as well as inland states like Minas Gerais. When inspecting the data set, try your hand at data profiling. The compilation includes data from 2000 to 2016, but not all weather stations were in operation during that entire period. Use this opportunity to find missing values and measure how widespread they are, since gaps in the data can skew results.
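One simple way to start profiling is to measure what share of readings is missing for each station. The sketch below assumes rows shaped like dictionaries with hypothetical "station" and "temp" fields and treats empty strings and None as missing; the actual file may use different column names or a sentinel value for gaps.

```python
from collections import defaultdict


def missing_share_by_station(rows, value_field="temp", station_field="station"):
    """Return the fraction of missing readings for each station."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        station = row[station_field]
        totals[station] += 1
        value = row.get(value_field)
        # Assumption: blank strings and None both mean "no reading".
        if value is None or str(value).strip() == "":
            missing[station] += 1
    return {station: missing[station] / totals[station] for station in totals}


# Invented readings for illustration only.
readings = [
    {"station": "A", "temp": "24.1"},
    {"station": "A", "temp": ""},
    {"station": "B", "temp": "19.8"},
    {"station": "B", "temp": "20.3"},
    {"station": "A", "temp": None},
    {"station": "A", "temp": "25.0"},
]

shares = missing_share_by_station(readings)
print(shares)
```

A station with a high missing share may have been offline for part of the period, which is worth knowing before you average anything across stations.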

 

Analyze Your Data

 
 
Data analysis identifies meaningful patterns and correlations within data sets. You can practice data analysis with the following data sets:

 

Seven Generations of Pokemon Stats

 
With a variety of detailed stats on 802 different Pokemon spanning multiple generations of the game, this data set really did catch ‘em all. Using information like height, weight, abilities, experience points, and more, you can answer questions about which types of Pokemon are strongest and which are most likely to be classified as legendary Pokemon.
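A question like "which types are strongest?" usually comes down to grouping rows and comparing a summary statistic. Here is a rough sketch, assuming each row has a hypothetical "type" field and a "total" field holding an overall stat score; the names and numbers below are invented placeholders rather than values from the data set.

```python
from collections import defaultdict
from statistics import mean


def average_stat_by_group(rows, group_field, stat_field):
    """Average a numeric field per group, sorted from highest to lowest."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(float(row[stat_field]))
    averages = {group: mean(values) for group, values in groups.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder rows; the real columns and values will differ.
pokemon = [
    {"name": "Alpha", "type": "fire", "total": 500},
    {"name": "Beta", "type": "fire", "total": 400},
    {"name": "Gamma", "type": "water", "total": 480},
    {"name": "Delta", "type": "grass", "total": 300},
]

ranking = average_stat_by_group(pokemon, "type", "total")
print(ranking)
```

Keep in mind that an average can hide a lot. Checking group sizes and medians alongside the mean gives a fairer picture when some types have only a few members.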

 

Harry Potter’s Wizarding World

 
Wondering what the most common type of wand is, or which spell Harry Potter uses most frequently? Find out with this data set, which collects text data from the Harry Potter films and pottermore.com. Use this data set to explore more Harry Potter data via sentiment analysis and natural language processing. If you’re interested in Harry Potter fan fiction, look for trends in this data set containing over 100,000 Harry Potter fan fiction titles and synopses.
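Counting how often a spell appears is a gentle first step toward text analysis. The sketch below assumes you have dialogue as a list of strings and a list of spell names to look for; the sample lines are made up, and the real data set may store dialogue in a different shape.

```python
import re
from collections import Counter


def count_phrases(lines, phrases):
    """Count whole-word, case-insensitive occurrences of each phrase."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            counts[phrase] += len(re.findall(pattern, lowered))
    return counts


# Invented dialogue lines for illustration.
lines = ["Expelliarmus!", "Lumos. Expelliarmus!", "Nothing to see here"]
spells = ["Expelliarmus", "Lumos"]

counts = count_phrases(lines, spells)
print(counts.most_common(1))
```

To answer the question for a single character, you would first filter the lines by speaker, assuming the data records who says each line.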

 

Explore Machine Learning

 
 
Machine learning is an important data science technique that trains algorithms on existing data so they can make predictions about new data. You can strengthen your machine learning skills with the following data sets:

 

What Makes TikTok Tick

 
Wondering what it takes to go viral on one of today’s fastest-growing social media platforms? Investigate TikTok’s algorithm and examine how user interactions affect the app’s video recommendations by exploring scraped TikTok video comments, TikTok usage stats, or social media trends on Douyin, China’s TikTok sibling app. Curious about how TikTok compares to other OTT (over-the-top) apps like Snapchat and Instagram? Check out this profile of user behavior based on interactions with 56 OTT applications.

 

Data for Audiophiles

 
The Million Song Dataset is a compilation of audio features and metadata derived from one million contemporary pop tracks. The data was collected to foster research on scalable algorithms. Try your hand at predictive analytics and use this data to figure out what makes a song a chart-topper … or a flop.
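If you have never built a predictive model, a nearest-centroid classifier is one of the simplest places to start: average the features of each class, then assign a new item to the class whose average is closest. The sketch below is a toy illustration, not a recommended approach for this data set. The two features, the "hit" and "flop" labels, and every number are invented, and the Million Song Dataset does not necessarily include chart labels in this form.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) for each label."""
    grouped = {}
    for vector, label in zip(features, labels):
        grouped.setdefault(label, []).append(vector)
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest to the given vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))


# Made-up training data: [scaled tempo, scaled loudness] per song.
train_x = [[1.2, 0.9], [1.3, 0.8], [0.7, 0.3], [0.6, 0.2]]
train_y = ["hit", "hit", "flop", "flop"]

centroids = fit_centroids(train_x, train_y)
prediction = predict(centroids, [1.1, 0.7])
print(prediction)
```

With real data you would hold out a test set to check how well the model generalizes, and you would likely reach for an established machine learning library instead of hand-written code.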

 

Visualize Your Data

 
 
Data scientists use data visualizations to highlight patterns and trends in data. You can practice data visualization with these data sets:

 

2021 Fantasy Football Stats

 
Want to wow your fantasy football league? Build a dashboard to track KPIs and critical metrics derived from this year’s fantasy football data. Look for trends in the data set and consider which visualization method would communicate those patterns most effectively. A bar chart might be useful for comparing players’ NFL draft performances vs. expectations, while a scatter plot might effectively convey passing and rushing stats.
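Before reaching for a charting tool, it can help to sketch a comparison quickly. The snippet below draws a plain-text bar chart with no plotting library, purely as a stand-in for the kind of bar chart described above. The player names and point totals are placeholders, and for a real dashboard you would typically use a dedicated visualization library or BI tool.

```python
def text_bar_chart(values, width=20):
    """Render a dict of label -> number as a horizontal text bar chart."""
    top = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale each bar relative to the largest value.
        bar = "#" * round(width * value / top) if top else ""
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


# Placeholder numbers, not real fantasy football stats.
points = {"Player A": 20, "Player B": 10, "Player C": 5}

chart = text_bar_chart(points)
print(chart)
```

Because every bar is scaled against the largest value, this layout suits comparisons across categories. Relationships between two numeric measures, like passing and rushing stats, are better served by a scatter plot.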

 

Winning Bachelor(ette) Contestants

 
See if you can predict the next winner of The Bachelor (or The Bachelorette) using this Bachelor(ette) data set compiled by FiveThirtyEight, which is handy for visualizing how one-on-one dates, first-impression roses, and other accolades affect a contestant’s chance of winning. If you’re looking to create your next bracket based on contestant attributes, this Bachelor data set tracks each contestant’s age, occupation, hometown, and more. Try using charts to convey common traits that Bachelor and Bachelorette winners share.

 

Ready to launch your data science career?

 
 
If you want to pivot into a data science role, the UNF Division of Continuing Education Data Science Bootcamp can help. This 100% online, self-paced data science program pairs you with an industry expert mentor who helps you build skills faster and refine your career strategy.

As a University of North Florida Data Science Bootcamp student, you’ll develop hands-on experience through the completion of 45+ small projects. You’ll also create two capstone projects that you can show to future employers as part of your professional data science portfolio.

Take the first step towards your data science career and explore The University of North Florida Division of Continuing Education’s Data Science Bootcamp today.

