Populating a GRAKN.AI Knowledge Graph with the World
This updated article describes how to move SQL data into a GRAKN.AI knowledge graph.
By Jo Stichbury, Grakn Labs.
Image by davecito is licensed under CC BY 2.0
Highly interconnected data from complex domains is challenging to model, maintain and query. With GRAKN.AI, which is an open source, distributed knowledge base (think a graph database with extra punch), building intelligent systems becomes dramatically easier.
If you are thinking of trying GRAKN.AI, the chances are that you already have some data and want to see what our stack can do. This post shows how to move data from a SQL database to GRAKN.AI, and how to use our visualiser to explore the data.
I’m going to use a simple, well-known example dataset about cities and countries of the world. In the course of this post, I will go through the basics of how to set up a SQL database and make some simple queries (which you can skip if you’re already a SQL user), then I will explain how to migrate the contents of the database into GRAKN.AI. I’ll use our declarative query language, Graql, to make the same queries as shown in SQL. You can find the code for this example on GitHub.
I have used MySQL for this example, although the Grakn Labs team have also tested with Oracle and PostgreSQL. The version of GRAKN.AI I used was 0.12.1 but, as ever, I recommend that you use the latest version.
Setting up MySQL
If you already have a MySQL setup, you may want to skip ahead to the “Hello World” section.
Getting MySQL installed is relatively straightforward. I followed these instructions for Mac OS X (downloading the installation package here). It wasn’t totally clear how to start the MySQL Server (I eventually worked out that installation puts an icon in your System Preferences menu). Having started the server, you need to start the MySQL shell from the terminal.
On first use, change the temporary password it was installed with (I set my new password to root, which is the same as the username, so it was memorable).
Once I had MySQL set up, it was time to open the example world database from the command line, as described here.
It is a classic example dataset published by MySQL that is easily understood and commonly used to introduce beginners to SQL queries (however, I cannot guarantee that the data it contains is accurate).
Use the following query to retrieve information about the columns of the city table:
As you can see, there are 5 columns in the table (confusingly shown as rows in the output): ID, Name, CountryCode, District and Population. The ID is the Primary Key for the table, which uniquely identifies each record. This will be important later, when we migrate to Grakn.
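The column listing described above can be reproduced in a small, self-contained sketch. In the MySQL shell, a statement such as `DESCRIBE city;` or `SHOW COLUMNS FROM city;` produces this kind of listing. The sketch below is not the original statement: it assumes Python's built-in `sqlite3` module as a stand-in for the MySQL server, uses simplified column types, and relies on SQLite's `PRAGMA table_info` to play the role of `DESCRIBE`.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")

# Same five column names as the city table; the types are simplified.
conn.execute(
    """
    CREATE TABLE city (
        ID          INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        CountryCode TEXT NOT NULL,
        District    TEXT NOT NULL,
        Population  INTEGER NOT NULL
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(city)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(f"{name:<12} {col_type:<8} primary_key={bool(pk)}")

column_names = [row[1] for row in columns]
primary_keys = [row[1] for row in columns if row[5]]
```

As in the MySQL output, each column of the table comes back as one row of the result, and `ID` is the only column flagged as part of the primary key.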
The following statement gives us a peek at the first 10 items of city data:
Let’s get the information about a specific city: Sydney, Australia.
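The two lookups just described (a peek at the first 10 rows, then a single city by name) correspond in MySQL to statements along the lines of `SELECT * FROM city LIMIT 10;` and `SELECT * FROM city WHERE Name = 'Sydney';`. Here is a minimal sketch of both, again assuming Python's `sqlite3` as a stand-in for MySQL. The rows are illustrative placeholders, not the real world dataset, so the IDs and population figures are made up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL world database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE city (ID INTEGER PRIMARY KEY, Name TEXT, "
    "CountryCode TEXT, District TEXT, Population INTEGER)"
)

# Placeholder rows (NOT the real data): twelve dummy cities plus Sydney.
rows = [(i, f"City{i}", "XXX", "District", 1000 * i) for i in range(1, 13)]
rows.append((13, "Sydney", "AUS", "New South Wales", 3000000))
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?, ?)", rows)

# Peek at the first 10 rows; ORDER BY makes the result deterministic.
first_ten = conn.execute("SELECT * FROM city ORDER BY ID LIMIT 10").fetchall()

# Look up one city by name, using a bound parameter rather than string pasting.
sydney = conn.execute("SELECT * FROM city WHERE Name = ?", ("Sydney",)).fetchall()

for row in first_ten:
    print(row)
print(sydney)
```

Binding the city name as a parameter is a habit worth keeping when the query is issued from code rather than typed into the shell.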
You can experiment in a similar way with the country and countrylanguage tables.
Migrating to GRAKN.AI
OK, you should have a reasonable handle on the data, so now let’s migrate it into GRAKN.AI. The first thing, if you’ve not done it already, is to follow the quickstart guide to download GRAKN.AI and start the Grakn engine.
There are limitations on the SQL format that prevent it from expressing the semantics of the data. By “semantics”, I mean the meaning of the data, which cannot easily be encoded alongside the data itself. In contrast, a knowledge graph is self-descriptive or, simply put, it provides a single place to find the data and understand what it’s all about. To have the full benefit of a knowledge graph, we must write the ontology for the dataset.
Writing an ontology for every field in the SQL tables would be too unwieldy, so we will take just some of the data contained in the table, as follows:
We define 3 entities to represent country, city and language, and some relations between them (between language and country, and between country and city).
In the terminal, load the ontology as follows:
Here’s a visual representation of the ontology:
Once we have written and loaded the ontology for the dataset, we need to use the Graql templating language to instruct the SQL migrator on how the SQL data is mapped to the above ontology. The SQL migrator applies the templates we provide to the results of an SQL query, to each row of results in turn, replacing the indicated sections in the template with the corresponding data. The column header is the key, while the content of each row at that column is the value.
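The row-by-row substitution described above can be illustrated with a few lines of Python. This is only a sketch of the idea, not the migrator's implementation: the `apply_template` helper, the `<Column>` placeholder convention and the Graql-like template string are all assumptions made for illustration, and the real template syntax is defined in the Graql documentation.

```python
import re

# Hypothetical, Graql-like template; the placeholder syntax is an assumption.
TEMPLATE = 'insert $c isa country has countrycode "<Code>" has name "<Name>";'


def apply_template(template: str, row: dict) -> str:
    """Replace each <Column> placeholder with that column's value in the row."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in row:
            raise KeyError(f"no column named {key!r} in this row")
        return str(row[key])

    return re.sub(r"<(\w+)>", substitute, template)


# Each result row is a mapping: the column header is the key,
# the content of the row at that column is the value.
rows = [
    {"Code": "AUS", "Name": "Australia"},
    {"Code": "NZL", "Name": "New Zealand"},
]

# The template is applied to each row of results in turn.
statements = [apply_template(TEMPLATE, row) for row in rows]
for statement in statements:
    print(statement)
```

One statement is produced per row, which is why the column names selected in the SQL query need to match the names used in the template.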
To migrate the country data, the template code is as follows:
The language migration template:
Then, to insert a relation between the language and the countries in which it is spoken, there is a match-insert query, which matches a language and country, then builds a relation between them:
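The match-then-insert pattern can be pictured with plain Python data structures. The sketch below is an analogy rather than Graql: the dictionaries, the `link_languages` helper and the sample rows are hypothetical. It shows the key property of the pattern, namely that a relation is only built when both ends have already been found.

```python
# Hypothetical stand-ins for entities that earlier migration steps created.
countries = {"AUS": "Australia", "NZL": "New Zealand"}
languages = {"English", "Maori"}

# Rows as they might come back from a countrylanguage query (placeholder data).
rows = [
    {"CountryCode": "AUS", "Language": "English"},
    {"CountryCode": "NZL", "Language": "Maori"},
    {"CountryCode": "XXX", "Language": "English"},  # no such country loaded
]


def link_languages(rows, countries, languages):
    """Match both entities for each row, then record a relation between them."""
    relations = []
    for row in rows:
        code, language = row["CountryCode"], row["Language"]
        # "match": both the country and the language must already exist.
        if code not in countries or language not in languages:
            continue  # nothing matched, so nothing is inserted
        # "insert": build the relation between the two matched entities.
        relations.append((language, code))
    return relations


relations = link_languages(rows, countries, languages)
print(relations)
```

This is also why the order of migration matters: the country and language entities need to be loaded before the relation between them can be created.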
For city migration, the template is as follows:
To determine if it is the capital city:
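On the SQL side, whether a city is a capital can be worked out with a join. The sketch below assumes that the country table carries a `Capital` column holding the ID of the capital city, as the published world schema does. It uses Python's `sqlite3` as a stand-in for MySQL, with cut-down tables and placeholder IDs. It illustrates the lookup only and is not the migration template itself.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; tables are cut down
# to the columns needed, and the IDs are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (Code TEXT PRIMARY KEY, Name TEXT, Capital INTEGER);
    CREATE TABLE city (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT);
    INSERT INTO country VALUES ('AUS', 'Australia', 2);
    INSERT INTO city VALUES (1, 'Sydney', 'AUS'), (2, 'Canberra', 'AUS');
    """
)

# A city is the capital when its ID equals the Capital column of its country.
query = """
    SELECT city.Name, country.Name, city.ID = country.Capital AS IsCapital
    FROM city
    JOIN country ON city.CountryCode = country.Code
    ORDER BY city.ID
"""
result = conn.execute(query).fetchall()
for city_name, country_name, is_capital in result:
    print(city_name, country_name, bool(is_capital))
```

SQLite returns the comparison as 0 or 1, so each row carries a flag that a template could use to decide which kind of relation to insert.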
There are two ways in which you can migrate SQL data into GRAKN.AI. You can use Java to perform the migration in a few lines of code, which is described further in our SQL migration example.
Alternatively, there is a shell script that you can call to apply the templates above to the SQL data. In effect, the shell script calls a set of Java functions so you don’t have to. I like this option, as Java is not my natural habitat. The migration documentation shows the script options, which are as follows:
Whether you use the shell script or Java code, what you are doing in either case is extracting SQL data using the JDBC API and importing it into a graph.
For this example, running from within the examples/example-sql-migration/shell-migration directory of the GRAKN.AI installation, I called Grakn’s migration.sh script on each of the templates shown above, passing in the appropriate query. For example, for the countries migration:
However, to make it simpler, you can chain them all together in a batch file, which I’ve done (you can find it on GitHub), and simply call that:
Querying the Graph
We can now make queries on the graph as follows. Let’s reproduce some of the queries we made previously in SQL.
If successful, you should see a list of 10 countries following the query.
Similarly, query for information about the city of Sydney:
The Grakn team was super helpful in sorting things out when they went wrong and, if you have any problems, please get in touch for help too. Just make contact via our Community page, or leave a comment below or on our Slack channel.
One issue that I hit initially is that you need to make sure that you download the JDBC driver from here and place the .jar file (mysql-connector-java-5.1.40-bin.jar) in the /lib directory of the Grakn environment that you downloaded and set up.
Visualising the Data
The Grakn visualiser is a cool way to look at the resulting graph and explore the data. With the Grakn engine running and the graph loaded, in your browser, navigate to http://localhost:4567/ which will allow you to make queries on the graph. It’s a nice way of seeing how data is connected, which fits much better — in my view — with how I think about cities and countries, than a table with rows and columns. Let’s explore …
We can show 10 countries and their cities:
In the GRAKN.AI visualiser:
Further guidance on using the visualiser, which is rapidly evolving, can be found in the GRAKN.AI documentation.
You may be wondering why I’ve bothered moving the data from a relational database into a Grakn graph. After all, isn’t it fine as it is? Well, yes and no. Although relational databases have benefits that include simplicity and familiarity, they also have limitations. If you are just doing basic reads and writes on straightforward data, SQL may well be adequate for your needs. But remember that I chose a simple, familiar example specifically to make this article easy to follow. If you have a more complex domain with highly interconnected data, which is very probable in today’s information landscape, you will quickly see significant benefits, since describing the relationships within data is the primary characteristic of a graph database. In this respect, a relational database cannot begin to provide the same speed or flexibility as a graph.
As a knowledge base, GRAKN.AI has an additional benefit over standard graph databases, since it allows complex data modeling, verification, scaling, querying and analysis. A key step is the definition of an ontology, which facilitates the modeling of complex datasets and guarantees information consistency. Inference rules allow the extraction of implicit information from explicit data, to achieve logical reasoning over the represented knowledge.
At the beginning of this article, I introduced GRAKN.AI as a graph database with extra punch. I have hardly scratched the surface of what it can do, but I hope I have at least shown that it is easy to set up a Grakn graph with familiar data, and how to query and visualise it.
There are a number of blog posts on blog.grakn.ai that will give you a flavour of what GRAKN.AI can do, and an FAQ that possibly answers some of the questions you may have! If it does not, please ask away in the comments below!
Jo Stichbury is a Technical Author at Grakn Labs. Former Head of Technical Publications / Communications at Nokia and Symbian. PhD from the University of Cambridge. Interests include data science & machine learning, cats, cakes, driverless cars & Manchester City.
GRAKN.AI is an open source distributed knowledge graph platform to power the next generation of intelligent applications. Using the power of machine reasoning, we provide a platform to help manage and make sense of highly interconnected big data. Grakn performs machine reasoning through Graql, a graph query language capable of reasoning and graph analytics.
Original. Reposted with permission.