Data Repositories
- Anacode Chinese Web Datastore: A collection of crawled Chinese news and blogs in JSON format
- Appen Open Source Datasets: Over 270 audio, image, video, and text datasets in over 80 languages
- AssetMacro: Historical data on macroeconomic indicators and market data
- Awesome Public Datasets: A topic-centric list of high-quality open datasets
- AWS Public Data Sets: A centralized repository of public datasets
- BigML Public Data Sources: A long list of data sources that anyone can use
- USA.gov: APIs and data feeds to help people find useful government information
- DataPortals.org: A comprehensive list of open data portals from around the world
- Data.gov.uk: Find data published by central government, local authorities, and public bodies to help you build products and services
- Data Planet: The largest repository of standardized and structured statistical data
- DataSF.org: Search hundreds of datasets from the City and County of San Francisco
- Data.world: Discover and share data, connect with interesting people, and work together to solve problems faster
- Europeana Data: Open metadata on 20 million texts, images, videos, and sounds gathered by Europeana
- GEO (Gene Expression Omnibus): A curated online resource for browsing, querying, and retrieving gene expression data
- HitCompanies Datasets: Comprehensive data on 10,000 randomly sampled UK companies from HitCompanies, updated automatically using AI/machine learning
- ICWSM 2009 Data Challenge: 44 million blog posts made between August 1 and October 1, 2008
- JMP Public Featured Datasets: Assorted public datasets from JMP
- Kaggle Datasets: Explore, analyze, and share quality data
- Linking Open Data: Making data freely available to everyone
- LoveTheSales: The world’s biggest online sales marketplace
- Lyst Fashion Data Trends: The industry’s trusted source for tracking fashion data trends
- Million Song Dataset: A freely available collection of audio features and metadata for a million contemporary popular music tracks
- NASDAQ Data Link: A premier source for financial, economic, and alternative datasets
- NASA Space Science Data Coordinated Archive: NASA's archive for space science mission data
- Qlik Sense Data Sources: Connect and combine data from hundreds of data sources
- Robert Shiller Data: Housing data, financial market data, and more, from his book Irrational Exuberance
- Sports Statistics: Data for soccer, NBA, NFL, NHL, and more
- StatLib Datasets Archive: Datasets from Carnegie Mellon University
- UCI Machine Learning Repository: A collection of databases, domain theories, and data generators used by the machine learning community for the empirical analysis of machine learning algorithms (new beta version)
- UK Open Postcode Geo: We organise UK open data by location and signpost the source
- United States Census Bureau: An assortment of US Census data
- Virtual Screening of Bioassay Data: Bioassay datasets available for download, by Amanda Schierz, J.
- Web Data Commons: Structured data from the Common Crawl, the largest web corpus available to the public
- WorldData.AI: Connect your data to any of WorldData's 3.5 billion datasets and improve your data science and machine learning models. Subscribe to KDnuggets to get free access to the Partners plan.
- Yahoo Webscope Program: A reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists
- Yelp Open Dataset: An all-purpose dataset for learning; a subset of Yelp businesses, reviews, and user data for personal, educational, and academic use