
Google BigQuery Public Datasets


Google BigQuery is not only a fantastic tool for analyzing data, it also hosts a repository of public data, including the GDELT world events database, NYC Taxi rides, the GitHub archive, Reddit top posts, and more.



By Gregory Piatetsky, @kdnuggets, Feb 20, 2015.

Google software engineer Felipe Hoffa recently posted a Quora answer highlighting open datasets on Google BigQuery. Once data is loaded there, you can make it public and let others analyze it with SQL.

Here are some notable datasets publicly available on Google BigQuery (via reddit).

GDELT Worldwide news
 


In a video made by Felipe Hoffa, GDELT project leader Kalev Leetaru walks through an example of analyzing events in Ukraine relative to the rest of the world, and Amanda Traud, a Data Scientist at L-3 Data Tactics, uses R and Shiny to explore GDELT.



Other datasets include:

Wikipedia Pageviews August 2014

GitHub Archive
 
HttpArchive
 
Freebase
 
New York Taxi
 
Reddit Top posts: bigquery.cloud.google.com/table/bigquery-samples:reddit.full

GeoIP Geolocation
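To give a feel for what "analyze with SQL" looks like in practice, here is a minimal Python sketch built around the Reddit table listed above. It turns the table link into a table reference and builds a simple row-count query, which needs no knowledge of the table's columns. The helper names (table_ref_from_url, count_rows_sql, run_query) are made up for illustration. Running the query assumes the google-cloud-bigquery client library is installed, credentials are configured, and the table is still publicly available, none of which the article confirms.

```python
def table_ref_from_url(url: str) -> str:
    """Turn a BigQuery web UI table link into a standard SQL table reference."""
    # The part after "/table/" has the form project:dataset.table
    legacy_id = url.split("/table/", 1)[1]
    project, rest = legacy_id.split(":", 1)
    dataset, table = rest.split(".", 1)
    # Backticks are needed because the project ID contains a hyphen
    return f"`{project}.{dataset}.{table}`"


def count_rows_sql(table_ref: str) -> str:
    """Build a query that works without knowing the table's column names."""
    return f"SELECT COUNT(*) AS n FROM {table_ref}"


def run_query(sql: str):
    """Run the query (assumption: google-cloud-bigquery installed, credentials set up)."""
    # Imported lazily so the helpers above work without the library
    from google.cloud import bigquery

    client = bigquery.Client()
    return list(client.query(sql).result())


# Table link as given in the list above
REDDIT_URL = "bigquery.cloud.google.com/table/bigquery-samples:reddit.full"
SQL = count_rows_sql(table_ref_from_url(REDDIT_URL))
print(SQL)  # SELECT COUNT(*) AS n FROM `bigquery-samples.reddit.full`
```

Calling run_query(SQL) would submit the query under your own Google Cloud project. The same SQL string can also be pasted into the BigQuery web console.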
