A Brief Introduction to Wikidata




By Björn Hartmann, Economist & Analyst


Have you ever heard of Wikidata? If not, you might think of Wikipedia first, and that is not wrong: Wikidata is also a project of the Wikimedia Foundation. In particular:

“Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia (…).”

Loosely speaking, you could describe Wikidata as Wikipedia's database, with over 46 million data items (April 2018).

And in line with Wikimedia’s mission, everyone can add and edit data, and use it for free.

Pictures of Nobel Prize winners, a map of countries that use 112 as an emergency number, and EU member states

 

Available data

 
Like Wikipedia, Wikidata stores all kinds of data. So when you are looking for a specific dataset or want to answer a curious question, Wikidata is a good first place to look.

Example questions:

  1. What is the capital city of every member of the European Union and how many inhabitants live there?
  2. What do the Nobel Prize winners in Physics look like?
  3. Which countries use 112 as an emergency number?

(Scroll down to see the answers.)

 

Advantages and Disadvantages of Wikidata

 
There are some aspects you should keep in mind when using Wikidata. Whether they are an advantage or disadvantage, however, depends on you:

  • a free and open knowledge base that can be read and edited by both humans and machines
  • contains various data types (e.g. text, images, quantities, coordinates, geographic shapes, dates)
  • uses SPARQL

The last aspect in particular lets you ask very interesting questions like the ones above. If you have never used SPARQL before, however, it might be a struggle in the beginning. But don’t worry: the next section gives you a brief introduction.

 

Idea and Concept of SPARQL

 
SPARQL is a query language for RDF databases. In contrast to relational databases, which you query with SQL, items are not stored in tables. Instead, items are linked with each other like a graph or network:


Example of how to visualize an RDF database

To describe these relations, we can use a triple:

A triple is a statement containing a subject, a predicate, and an object.

Examples:

  • Germany (subject) has the capital (predicate) Berlin (object).
  • Berlin (subject) has the population (predicate) 3.5 million (object).
  • The European Union (subject) has the member (predicate) Germany (object).
  • Germany (subject) is a member of (predicate) the European Union (object).

You can come up with various statements to describe the graph above. And that is a huge benefit of the graph model behind SPARQL: you are not limited to the fixed table structure of a relational database, and new information can be added easily.
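To make the idea of triples a bit more tangible, here is a minimal Python sketch of a toy triple store. It is only an illustration and not how Wikidata or any RDF database is actually implemented: the `TRIPLES` list, the `match` helper, and the plain-text predicate names are all made up for this example, and `None` plays the role of a query variable.

```python
# A toy in-memory triple store (illustrative only, not a real RDF database).
# Each statement is a (subject, predicate, object) tuple.
TRIPLES = [
    ("Germany", "has capital", "Berlin"),
    ("European Union", "has member", "Germany"),
    ("Germany", "member of", "European Union"),
    ("France", "member of", "European Union"),
]


def match(pattern, triples=TRIPLES):
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [
        triple
        for triple in triples
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]


# "Which subjects are a member of the European Union?"
members = [subject for subject, _, _ in match((None, "member of", "European Union"))]
print(members)

# New information is just one more statement; no table schema has to change.
TRIPLES.append(("France", "has capital", "Paris"))
capitals = {subject: obj for subject, _, obj in match((None, "has capital", None))}
print(capitals)
```

The pattern `(None, "member of", "European Union")` mirrors what a SPARQL query does with a variable in the subject position: everything that fits the fixed predicate and object is returned.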

(If you want to dive deeper into the concept of SPARQL, I recommend this YouTube video (11 min).)

 

How to query data from Wikidata?

 
To get data from Wikidata, you simply use triples (like the ones above) to write a SPARQL query. Let’s have a look at what such a query might look like. Note that we use specific identifiers to define the right relationship and item:

SELECT ?country
WHERE 
{
  ?country   wdt:P463     wd:Q458.
  #country   #member of   #European Union
}


Here, we simply ask for the countries that are part of the European Union.

Do you recognize the subject-predicate-object statement? We select just those countries for which the condition holds: the country (?country) is a member of (wdt:P463) the European Union (wd:Q458).

Using the Wikidata Query Service as an endpoint gives us the following result:


Selecting all EU member countries
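If you would rather send the query from a script than from the web interface, the following Python sketch shows one possible way using only the standard library. It assumes the public endpoint `https://query.wikidata.org/sparql` and its `format=json` parameter; the helper names `build_url` and `run_query` and the User-Agent string are placeholders invented for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public SPARQL endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?country
WHERE
{
  ?country wdt:P463 wd:Q458.
}
"""


def build_url(query, endpoint=ENDPOINT):
    """Encode the query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


def run_query(query, user_agent="wikidata-intro-example/0.1"):
    """Send the query and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        build_url(query), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `run_query(QUERY)` should return a dictionary whose `results` → `bindings` list holds one entry per country. The sketch does not handle errors, rate limits, or timeouts beyond the basic `timeout` argument.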

So far, we only get the identifier codes of the member states back. To see the country names, we add the label service to our query:

SELECT ?country ?countryLabel
WHERE 
{
  ?country   wdt:P463          wd:Q458.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}



Adding the Label Service shows us the country names
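When you request JSON instead of the table view, each variable of the query comes back wrapped in a small object. The Python sketch below flattens such a response into plain rows. The `SAMPLE_RESPONSE` is hand-written in the standard SPARQL JSON results layout and is not real output from the service, and `to_rows` is a helper name chosen for this example.

```python
# Hand-written sample in the SPARQL JSON results layout (not real service output).
SAMPLE_RESPONSE = {
    "head": {"vars": ["country", "countryLabel"]},
    "results": {
        "bindings": [
            {
                "country": {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"},
                "countryLabel": {"type": "literal", "xml:lang": "en", "value": "France"},
            },
            {
                "country": {"type": "uri", "value": "http://www.wikidata.org/entity/Q183"},
                "countryLabel": {"type": "literal", "xml:lang": "en", "value": "Germany"},
            },
        ]
    },
}


def to_rows(response):
    """Turn the nested bindings into one flat dict per result row."""
    variables = response["head"]["vars"]
    return [
        # A variable can be unbound in a row, so only keep the ones present.
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in response["results"]["bindings"]
    ]


rows = to_rows(SAMPLE_RESPONSE)
labels = [row["countryLabel"] for row in rows]
print(labels)
```

Here `?country` holds the full item URI, while `?countryLabel` holds the readable name supplied by the label service.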

How simple is that? If you would like to try it yourself, just follow this link.

 

How to get the correct identifiers?

 
For all queries, it is essential to identify the correct items and relations. For this purpose, Wikidata uses specific identifiers.

In the example above, I already looked them up: the relation “member of” has the identifier wdt:P463 and the item “European Union” is identified by wd:Q458.

But how would you get them?

What I recommend is to inspect the Wikidata page of an item that you already know belongs in the result. Knowing that France is a member of the European Union, I would inspect its Wikidata item:

1. Open France in Wikipedia to get to its Wikidata item:


https://en.wikipedia.org/wiki/France

2. Inspect the Wikidata item:


https://www.wikidata.org/wiki/Q142

Here, you simply hover over the relationship “member of” and item “European Union” to get their identifier codes.
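Once you have copied a page URL or a bare code, a tiny helper can turn it into the prefixed form used in the queries. This is a minimal Python sketch under the assumption that item codes start with Q and property codes start with P, as in the examples in this article; `to_prefixed` is a hypothetical helper, not part of any Wikidata tool.

```python
import re

# Matches an item (Q...) or property (P...) code at the end of the text.
ID_PATTERN = re.compile(r"\b([QP])(\d+)$")


def to_prefixed(text):
    """Convert a Wikidata URL or bare code into the prefixed query form."""
    found = ID_PATTERN.search(text.strip())
    if found is None:
        raise ValueError(f"no Wikidata identifier found in {text!r}")
    kind, number = found.groups()
    # Items get the wd: prefix, properties the wdt: prefix used in this article.
    prefix = "wd" if kind == "Q" else "wdt"
    return f"{prefix}:{kind}{number}"


print(to_prefixed("https://www.wikidata.org/wiki/Q142"))
print(to_prefixed("P463"))
```

For the France page above, this gives `wd:Q142`, which you could use in a triple just like `wd:Q458`.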

 

Solutions (and more examples)

 
Do you remember the questions in the introduction? These are the queries you could use to answer them:

1. What is the capital city of every member of the European Union and how many inhabitants live there?

SELECT ?country ?countryLabel ?capitalLabel ?population 
WHERE 
{
  ?country wdt:P463 wd:Q458.
  ?country wdt:P36 ?capital.
  ?capital wdt:P1082 ?population.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}



List of EU members with the population of their capital cities

2. What do the Nobel Prize winners in Physics look like?

#defaultView:ImageGrid
SELECT ?person ?personLabel ?image
WHERE 
{
  ?person wdt:P18 ?image;
          wdt:P166 wd:Q38104.
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}



Selected pictures of Nobel Prize winners in Physics (in total there are 198)

3. Which countries use 112 as an emergency number?

#defaultView:Map
SELECT ?country ?countryLabel ?location
WHERE {
 ?country wdt:P2852 wd:Q1061257;
           wdt:P625 ?location.
  
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}



Map of countries that use 112 as an emergency number
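If you fetch the results of the map query as data instead of viewing the map, the `?location` values arrive as text. The Python sketch below assumes they look like `Point(longitude latitude)`, with the longitude first, which is how I understand the service to return coordinates on Earth; `parse_point` is a helper name invented for this example, and the sample value is made up.

```python
def parse_point(wkt):
    """Parse 'Point(longitude latitude)' into a (latitude, longitude) tuple."""
    text = wkt.strip()
    if not (text.startswith("Point(") and text.endswith(")")):
        raise ValueError(f"not a simple point literal: {wkt!r}")
    # Assumption: longitude comes first, then latitude, separated by whitespace.
    lon_text, lat_text = text[len("Point("):-1].split()
    return float(lat_text), float(lon_text)


# Made-up sample value, roughly in the middle of Europe.
latitude, longitude = parse_point("Point(13.4 52.5)")
print(latitude, longitude)
```

Returning latitude first matches what many mapping libraries expect, but check the convention of the tool you feed the values into.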

 

Interested in more?

 
I’m working on an online course about Wikidata. So if you are interested in more, leave your email address and receive a 25% coupon once the course starts.

https://bjrn26.typeform.com/to/w7TM6R

 
Bio: Björn Hartmann is an economist and analyst. He writes about various topics in Data Analytics and Data Science and is currently working on a Wikidata Online Course.

Original. Reposted with permission.
