How to Build a Knowledge Graph with Neo4J and Transformers

Learn to use custom Named Entity Recognition and Relation Extraction models.



By Walid Amamou, Founder of UBIAI



Image by Author: Knowledge Graph in Neo4J

 

Introduction

 
 
In my previous article “Building a Knowledge Graph for Job Search using BERT Transformer”, we explored how to create a knowledge graph from job descriptions using entities and relations extracted by a custom transformer model. While we were able to get great visuals of our nodes and relations using the Python library networkX, the actual graph lived in Python memory and wasn’t stored in a database. This can be problematic when trying to build a scalable application where you have to store an ever-growing knowledge graph. This is where Neo4j excels: it lets you store the graph in a fully functional database that can manage large amounts of data. In addition, Neo4j’s Cypher language is rich, easy to use, and very intuitive.

In this article, I will show how to build a knowledge graph from job descriptions using a fine-tuned transformer-based Named Entity Recognition (NER) model and spaCy’s relation extraction model. The method described here can be applied to many other fields, such as biomedical, finance, and healthcare.

Below are the steps we are going to take:

  • Load our fine-tuned transformer NER and spaCy relation extraction models in Google Colab
  • Create a Neo4j Sandbox and add our entities and relations
  • Query the graph to find the highest job match to a target resume, the most popular skills, and the highest skills co-occurrence

For more information on how to generate training data using UBIAI and fine-tune the NER and relation extraction models, check out the articles below:

At the end of this tutorial, we will be able to create a knowledge graph as shown below.



Image by Author: Knowledge Graph of Jobs in Neo4J

 

Named Entity and Relation Extraction

 
 

  • First, we load the dependencies for the NER and relation models, as well as the NER model itself, which has been previously fine-tuned to extract skills, diploma, diploma major, and years of experience:
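A minimal sketch of this step. The package list and the model path are assumptions (match them to your own training setup and wherever you saved the fine-tuned pipeline with nlp.to_disk):

```python
# Dependencies (assumed): pip install spacy spacy-transformers
# Hypothetical path to the fine-tuned NER pipeline saved on disk.
NER_MODEL_PATH = "ner_model"

def load_ner(path: str = NER_MODEL_PATH):
    """Load the fine-tuned transformer NER pipeline (labels: SKILLS,
    DIPLOMA, DIPLOMA_MAJOR, EXPERIENCE)."""
    import spacy  # imported lazily so the sketch itself runs without spaCy
    return spacy.load(path)
```

In Colab you would call `nlp_ner = load_ner()` once and reuse the pipeline for every job description.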

  • Load the jobs dataset from which we want to extract the entities and relations:
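For illustration, a tiny inline sample can stand in for the jobs dataset; in the article the full set of 29 job descriptions would be loaded from file instead (for example with pandas’ read_csv):

```python
# Inline stand-in for the real jobs dataset (contents are illustrative only).
jobs = [
    "5+ years of software engineering experience. BS in Computer Science.",
    "3+ years working with C++ and 3D computer vision algorithms.",
]
```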

  • Extract entities from the jobs dataset:
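With the NER pipeline loaded (say, as nlp_ner) and the descriptions in a list (say, jobs), extraction reduces to running each text through the pipeline and collecting the unique (text, label) pairs; a minimal sketch:

```python
def unique_entities(docs):
    """Collect the unique (text, label) pairs from processed docs; any object
    whose .ents items carry .text and .label_ attributes works."""
    return sorted({(ent.text, ent.label_) for doc in docs for ent in doc.ents})

# With the real pipeline (names assumed from the earlier steps):
# entities = unique_entities(nlp_ner.pipe(jobs))
```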

We can take a look at some of the extracted entities before we feed them to our relation extraction model:

{('10+ years', 'EXPERIENCE'),
 ('2+ years', 'EXPERIENCE'),
 ('3+ years', 'EXPERIENCE'),
 ('3D computer vision algorithms', 'SKILLS'),
 ('4+ years', 'EXPERIENCE'),
 ('5+ years', 'EXPERIENCE'),
 ('6+ years', 'EXPERIENCE'),
 ('7+ years', 'EXPERIENCE'),
 ('8+ years', 'EXPERIENCE'),
 ('ABR', 'SKILLS'),
 ('Agile BOM management', 'SKILLS'),
 ('Analytical experience', 'SKILLS'),
 ('Asia based suppliers', 'SKILLS'),
 ('B.S.', 'DIPLOMA'),
 ('BA/BS', 'DIPLOMA'),
 ('BS', 'DIPLOMA'),
 ('BS or MS', 'DIPLOMA'),
 ('BS/MS', 'DIPLOMA'),
 ('Bachelors', 'DIPLOMA'),
 ('C', 'SKILLS'),
 ('C++', 'SKILLS'),
 ('C++11', 'SKILLS'),
 ('CI/CD', 'SKILLS'),
 ('CM suppliers', 'SKILLS'),
 ('CVPR', 'SKILLS'),
 ('Calibration', 'SKILLS'),
 ('Communication', 'SKILLS'),
 ('Communications', 'SKILLS'),
 ('Computer Engineering', 'DIPLOMA_MAJOR')

 

We are now ready to predict the relations. First, load the relation extraction model; make sure to change directory to rel_component/scripts to access all the necessary scripts for the relation model.

cd rel_component/
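A sketch of the loading step, following the layout of spaCy’s rel_component tutorial project (the scripts/ module and function names below are assumptions based on that template):

```python
def load_relation_model(path: str = "rel_model"):
    """Load the trained relation extraction pipeline. The imports from
    scripts/ register the custom relation-extractor factory so that
    spacy.load can reconstruct the pipeline."""
    import spacy
    # Module names assumed from spaCy's rel_component project template:
    from scripts.rel_pipe import make_relation_extractor  # noqa: F401
    from scripts.rel_model import create_relation_model  # noqa: F401
    return spacy.load(path)
```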

 

Predicted relations:
entities: ('5+ years', 'software engineering') --> predicted relation: {'DEGREE_IN': 9.5471655e-08, 'EXPERIENCE_IN': 0.9967771}
entities: ('5+ years', 'technical management') --> predicted relation: {'DEGREE_IN': 1.1285037e-07, 'EXPERIENCE_IN': 0.9961034}
entities: ('5+ years', 'designing') --> predicted relation: {'DEGREE_IN': 1.3603304e-08, 'EXPERIENCE_IN': 0.9989103}
entities: ('4+ years', 'performance management') --> predicted relation: {'DEGREE_IN': 6.748373e-08, 'EXPERIENCE_IN': 0.92884386}

 

 

Neo4J

 
 
We are now ready to load our jobs dataset and the extracted data into the Neo4j database.
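One way to wire up the connection, sketched with the official neo4j Python driver. The neo4j_query helper name matches the calls used later in this article, but its implementation here is an assumption:

```python
def make_neo4j_query(uri: str, user: str, password: str):
    """Build a neo4j_query(query, **params) helper bound to a Neo4j instance.

    Each result record is returned as a plain dict."""
    from neo4j import GraphDatabase  # pip install neo4j
    driver = GraphDatabase.driver(uri, auth=(user, password))

    def neo4j_query(query, **params):
        with driver.session() as session:
            return [dict(record) for record in session.run(query, **params)]

    return neo4j_query

# Credentials come from your Sandbox instance (placeholders, not real values):
# neo4j_query = make_neo4j_query("bolt://<sandbox-ip>:7687", "neo4j", "<password>")
```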

  • Next, we add the documents, entities, and relations to the knowledge graph. Note that we need to extract the integer number of years from the name of the EXPERIENCE entity and store it as a property.
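A sketch of this step. The years helper below is a straightforward way to pull the integer out of names like '5+ years'; the Cypher and node/relationship names (Offer, MENTIONS) mirror the queries used later in the article, but the exact load code is an assumption:

```python
import re

def years_from_experience(name):
    """Extract the integer number of years from an EXPERIENCE entity name
    such as '5+ years'; returns None when no digits are present."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else None

# Cypher labels cannot be parameterized, hence the string substitution:
ADD_ENTITY = (
    "MERGE (o:Offer {id: $offer_id}) "
    "MERGE (e:%s {name: $name}) "
    "MERGE (o)-[:MENTIONS]->(e)"
)
# for offer_id, ents in enumerate(entities_per_offer):
#     for name, label in ents:
#         neo4j_query(ADD_ENTITY % label, offer_id=offer_id, name=name)
#         if label == "EXPERIENCE":
#             neo4j_query("MATCH (e:EXPERIENCE {name: $name}) SET e.years = $y",
#                         name=name, y=years_from_experience(name))
```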

Now begins the fun part. We are ready to launch the knowledge graph and run queries. Let’s run a query to find the best job match to a target profile:
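The article does not show the matching query itself, so here is one plausible formulation: count the entities an offer shares with a target profile node and rank offers by that overlap (the Profile label is an assumption, consistent with the rest of the graph):

```python
# Hedged sketch of a best-match query over the MENTIONS graph.
BEST_MATCH_QUERY = """
MATCH (p:Profile {id: $profile_id})-[:MENTIONS]->(e)<-[:MENTIONS]-(o:Offer)
RETURN o.id AS offer, collect(e.name) AS common_entities, count(e) AS overlap
ORDER BY overlap DESC
LIMIT 5
"""
# res = neo4j_query(BEST_MATCH_QUERY, profile_id=0)
```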

Results in tabular form showing the common entities:



And in graph visual:



Image by Author: Best job Match in Graph Visual

 

While this dataset was composed of only 29 job descriptions, the method described here can be applied to large-scale datasets with thousands of jobs. With only a few lines of code, we can instantly extract the highest job match to a target profile.

Let’s find out the most in-demand skills:

query = """MATCH (s:SKILLS)<-[:MENTIONS]-(o:Offer)RETURN s.name as skill, count(o) as freqORDER BY freq DESCLIMIT 10"""res = neo4j_query(query)res

 



And the skills that require the highest years of experience:

query = """MATCH (s:SKILLS)--(r:Relation)--(e:EXPERIENCE) where r.type = "EXPERIENCE_IN"return s.name as skill,e.years as yearsORDER BY years DESCLIMIT 10"""res = neo4j_query(query)res

 



Network programming and TCP/IP experience require the highest years of experience, followed by C++.

Finally, let’s check which pairs of skills co-occur the most:

neo4j_query("""MATCH (s1:SKILLS)<-[:MENTIONS]-(:Offer)-[:MENTIONS]->(s2:SKILLS)WHERE id(s1) < id(s2)RETURN s1.name as skill1, s2.name as skill2, count(*) as cooccurrenceORDER BY cooccurrenceDESC LIMIT 5""")

 



 

Conclusion

 
 
In this post, we described how to leverage transformer-based NER and spaCy’s relation extraction models to create a knowledge graph with Neo4j. Beyond information extraction, the graph topology can be used as an input to other machine learning models.

Combining NLP with Neo4j’s graph database is going to accelerate information discovery in many domains, with the most notable applications in healthcare and biomedical research.

If you have any questions or want to create custom models for your specific case, leave a note below or send us an email at admin@ubiai.tools.

Follow us on Twitter @UBIAI5

 
Bio: Walid Amamou is the Founder of UBIAI, an annotation tool for NLP applications, and holds a PhD in Physics.

Original. Reposted with permission.
