Google BigQuery Public Datasets
Google BigQuery is not only a fantastic tool for analyzing data; it also hosts a repository of public data, including the GDELT world events database, NYC Taxi rides, the GitHub Archive, Reddit top posts, and more.
By Gregory Piatetsky, @kdnuggets, Feb 20, 2015.
Google software engineer Felipe Hoffa recently posted a Quora answer highlighting open datasets on Google BigQuery: once data is loaded there, you can make it public and let others analyze it with SQL.
Here are some notable datasets publicly available on Google BigQuery (via reddit):
GDELT Worldwide news
- GDELT announcement
- Top words queries
- All events: bigquery.cloud.google.com/table/gdelt-bq:full.events
Here is a very good video made by Felipe Hoffa, in which GDELT project leader Kalev Leetaru gives an example of analyzing events in Ukraine relative to the world, and Amanda Traud, a Data Scientist at L-3 Data Tactics, uses R and Shiny to explore GDELT.
Other datasets include:
Wikipedia Pageviews August 2014
GitHub Archive
- www.githubarchive.org/
- Full timeline: https://bigquery.cloud.google.com/table/githubarchive:github.timeline
HttpArchive
Freebase
- 2014 Jan 19 triples: https://bigquery.cloud.google.com/table/fh-bigquery:freebase20140119.triples
New York Taxi
- Taxi queries
- 173 million taxi trips: https://bigquery.cloud.google.com/table/833682135931:nyctaxi.trip_data
Reddit Top posts: bigquery.cloud.google.com/table/bigquery-samples:reddit.full
GeoIP Geolocation:
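The table links above all share the shape bigquery.cloud.google.com/table/project:dataset.table. As a rough sketch of how such a link maps to something you can query, the Python below splits a link into its parts and formats the bracketed [project:dataset.table] reference used by BigQuery's legacy SQL dialect. The helper names (parse_table_url, legacy_table_ref, row_count_query) are made up for this illustration and are not part of any BigQuery client library; the sketch only builds strings and does not check that a table is still hosted or run anything against BigQuery.

```python
import re

# Matches the table links listed in this article, with or without a scheme,
# e.g. "bigquery.cloud.google.com/table/gdelt-bq:full.events".
_TABLE_URL = re.compile(
    r"^(?:https?://)?bigquery\.cloud\.google\.com/table/"
    r"(?P<project>[^:/]+):(?P<dataset>[^.]+)\.(?P<table>.+)$"
)


def parse_table_url(url):
    """Split a table link into a (project, dataset, table) tuple."""
    match = _TABLE_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a BigQuery table link: {url!r}")
    return match.group("project"), match.group("dataset"), match.group("table")


def legacy_table_ref(url):
    """Format the table as a legacy SQL reference: [project:dataset.table]."""
    project, dataset, table = parse_table_url(url)
    return f"[{project}:{dataset}.{table}]"


def row_count_query(url):
    """Build a simple legacy SQL query that counts the rows of the table."""
    return f"SELECT COUNT(*) AS row_count FROM {legacy_table_ref(url)}"


if __name__ == "__main__":
    # Hypothetical usage: print a query text for one of the links above.
    print(row_count_query("bigquery.cloud.google.com/table/gdelt-bq:full.events"))
```

The resulting query text could then be pasted into the BigQuery web interface or passed to whichever client you use; a row count is only a starting point, and any column names for a real analysis would have to come from the table's schema.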