Building a Flask API to Automatically Extract Named Entities Using SpaCy

This article discusses how to use the Named Entity Recognition module in spaCy to identify people, organizations, or locations in text, then deploy a Python API with Flask.



By Susan Li, Sr. Data Scientist

Photo credit: Pixabay

The overwhelming amount of unstructured text data available today provides a rich source of information if the data can be structured. Named-entity recognition (NER), also known as named-entity extraction, is one of the first steps in building knowledge from semi-structured and unstructured text sources.

Only after NER can we reveal, at a minimum, who and what the information is about. As a result, a data science team can see a structured representation of all the names of people, companies, locations and so on in a corpus, which can serve as a point of departure for further analysis and investigation.

In the previous post, we learned and practiced how to build a named entity recognizer using NLTK and spaCy. To take this a step further and create something useful, this article covers how to develop and deploy a simple named entity extractor using spaCy and serve it with a Flask API in Python.

 

A Flask API

 
Our goal is to build an API that takes text as input, for example a New York Times article (or any article). Our named entity extractor then identifies and extracts four types of entities: organization, person, location and money. The basic architecture looks like this:

Figure 1
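
To make the extraction step concrete, here is a minimal sketch of how the four entity types could be pulled out of a text with spaCy. The mapping from task names to spaCy labels, the function names, and the en_core_web_sm model are assumptions for illustration, not necessarily what the original app uses.

```python
# Assumed mapping from the four task options to spaCy entity labels.
TASK_TO_LABEL = {
    "organization": "ORG",
    "person": "PERSON",
    "geopolitical": "GPE",
    "money": "MONEY",
}


def filter_entities(entities, task):
    """Keep the text of every (text, label) pair whose label matches the task."""
    label = TASK_TO_LABEL[task]
    return [text for text, ent_label in entities if ent_label == label]


def extract_entities(text, task, nlp=None):
    """Run a spaCy pipeline over `text` and return the entities for `task`."""
    if nlp is None:
        # Imported lazily so filter_entities works without spaCy installed.
        # A real app would load the model once at startup, not per call.
        import spacy

        nlp = spacy.load("en_core_web_sm")  # assumed model, installed separately
    doc = nlp(text)
    pairs = [(ent.text, ent.label_) for ent in doc.ents]
    return filter_entities(pairs, task)
```

With spaCy and the model installed, extract_entities(article_text, "person") would return the person names the model finds in article_text.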

To build the API, we will need to create two files:

  1. index.html to handle the template of the API.
  2. app.py to handle the requests and return the output file.

And the final product will look like this:

Figure 2

Let’s build the API and create the two files step by step. Our project folder is structured as follows:

  • Our project is located in the Named-Entity-Extractor folder.

Figure 3

  • The templates directory is in the same folder as app.py, the file in which the Flask app is created.

Figure 4

  • The index.html is located in the templates folder.
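
The layout above can be sketched in a few lines of Python. This is only an illustration of the folder structure; the scaffold function is a hypothetical helper, and creating the files by hand works just as well.

```python
from pathlib import Path


def scaffold(root):
    """Create the two-file layout: app.py next to a templates/ folder."""
    project = Path(root) / "Named-Entity-Extractor"
    templates = project / "templates"
    templates.mkdir(parents=True, exist_ok=True)
    # touch() creates empty files and leaves existing ones untouched.
    (project / "app.py").touch()
    (templates / "index.html").touch()
    return project
```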

 

index.html

 

  • We name our app “Named Entity Extractor”.
  • Using BootstrapCDN, copy-paste the stylesheet <link> into our <head> before all other stylesheets to load our CSS.
  • Get Bootstrap’s navigation header, the navbar, from Bootstrap’s template for a simple informational website. That template includes a large callout called a jumbotron and three supporting pieces of content.
  • Copy-paste the navbar code from the template’s source code.
  • Bootstrap requires a container element to wrap site contents and house our grid system.
  • In our case, for the first container, we create a vertical form with two input fields, one “Clear” button, and one “Submit” button.
  • Textual form controls are styled with the form-control class.
  • We give our users four task options (that is, named entity extraction tasks) to choose from: Organization, Person, Geopolitical and Money.
  • The second container provides contextual feedback messages for our user’s action, that is, the results of named entity extraction.
  • We want to print out not only the named entity extraction results but also the number of results for each named entity extraction.
  • Copy-paste the JavaScript <script>s near the end of our HTML page, right before the closing </body> tag.
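
To show how these pieces fit together, here is a rough skeleton of the template, held in a Python string so the task options come from a single list. The field names (rawtext, taskoption), the /process action, the template variables, and the "Enter text" label are assumptions; the Bootstrap link, navbar and scripts described above are left out.

```python
# Hypothetical option values for the four tasks named in the article.
TASKS = ["organization", "person", "geopolitical", "money"]

# Skeleton of templates/index.html with Jinja2 placeholders for the results.
INDEX_TEMPLATE = """<!doctype html>
<html>
<head>
  <title>Named Entity Extractor</title>
  <!-- Bootstrap stylesheet link goes here -->
</head>
<body>
  <!-- First container: the vertical form -->
  <div class="container">
    <form method="POST" action="/process">
      <div class="form-group">
        <label for="rawtext">Enter text</label>
        <textarea class="form-control" id="rawtext" name="rawtext" rows="10"></textarea>
      </div>
      <div class="form-group">
        <label for="taskoption">Select task</label>
        <select class="form-control" id="taskoption" name="taskoption">
__OPTIONS__
        </select>
      </div>
      <button type="reset" class="btn btn-secondary">Clear</button>
      <button type="submit" class="btn btn-primary">Submit</button>
    </form>
  </div>
  <!-- Second container: the results and their count -->
  <div class="container">
    {% if results %}
    <div class="alert alert-success">
      <p>{{ num_of_results }} result(s) found</p>
      <ul>{% for item in results %}<li>{{ item }}</li>{% endfor %}</ul>
    </div>
    {% endif %}
  </div>
  <!-- Bootstrap scripts go here, right before the closing body tag -->
</body>
</html>
"""


def render_index_source(tasks=TASKS):
    """Fill in one option element per task and return the template source."""
    options = "\n".join(
        f'          <option value="{task}">{task.title()}</option>' for task in tasks
    )
    return INDEX_TEMPLATE.replace("__OPTIONS__", options)
```

Saving the returned string as templates/index.html gives Flask a template it can render both before and after a submission, since the results block only appears when results is set.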

 

app.py

 
Our app.py file is rather simple and easy to understand. It contains the main code that the Python interpreter executes to run the Flask web application, including the spaCy code for recognizing named entities.

  • We run our app as a single module; thus we initialize a new Flask instance with the argument __name__ to let Flask know that it can find the HTML template folder (templates) in the same directory where it is located.
  • We use the route decorator (@app.route('/')) to specify the URL that should trigger the execution of the index function.
  • Our index function simply renders the index.html HTML file, which is located in the templates folder.
  • Inside the process function, we apply nlp to the raw text the user enters, and extract the pre-determined named entities (Organization, Person, Geopolitical and Money) from it.
  • We use the POST method to transport the form data to the server in the message body.
  • We use the run function to run the application on the server only when this script is directly executed by the Python interpreter, which we ensure using the if statement with __name__ == '__main__'.
  • Finally, by setting the debug=True argument inside the app.run method, we activate Flask's debugger.
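
The points above could be wired together roughly as follows. This is a sketch under assumptions, not the original app.py: the /process route, the form field names (rawtext, taskoption), the template variables, and the create_app and build_context helpers are all hypothetical, and the extraction logic is passed in as a function so the Flask part stays short.

```python
def build_context(entities):
    """Template variables: the matched entities and how many there are."""
    return {"results": entities, "num_of_results": len(entities)}


def create_app(extract):
    """Build the Flask app; extract(text, task) returns a list of entity strings."""
    # Imported here so build_context stays usable without Flask installed.
    from flask import Flask, render_template, request

    # __name__ lets Flask find the templates folder next to this file.
    app = Flask(__name__)

    @app.route("/")
    def index():
        # Show the empty form.
        return render_template("index.html")

    @app.route("/process", methods=["POST"])
    def process():
        # Form data arrives in the body of the POST request.
        rawtext = request.form["rawtext"]
        task = request.form["taskoption"]
        entities = extract(rawtext, task)
        return render_template("index.html", **build_context(entities))

    return app
```

In this sketch, app.py would end with the usual guard, if __name__ == '__main__', followed by a call such as create_app(my_extractor).run(debug=True), where my_extractor is whatever spaCy-based function we choose to plug in.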

We are almost there!

 

Try our API

 

  • Start the Command Prompt.
  • Navigate to our Named-Entity-Extractor folder and run app.py with the Python interpreter.

Figure 5

  • Open your web browser and copy-paste “http://127.0.0.1:5000/” into the address bar. We will see this form:

Figure 6

  • I copy-pasted a few paragraphs of an article from the New York Times; it is a Canadian story:

Figure 7

  • Select “Organization” under “Select task”, then click “Submit”. This is what we get:

Figure 8

  • Nice. Let’s try the “Person” entity:

Figure 9

  • “Geopolitical” entity:

Figure 10

  • “Money” entity:

Figure 11
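
The same submission can also be made from a script instead of the browser. The sketch below uses only the standard library; the /process path and the form field names (rawtext, taskoption) are assumptions and must match whatever the form in index.html actually posts.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:5000"


def build_request(text, task, base_url=BASE_URL):
    """Encode the form fields and wrap them in a POST request."""
    data = urlencode({"rawtext": text, "taskoption": task}).encode("utf-8")
    return Request(f"{base_url}/process", data=data, method="POST")


def submit(text, task):
    """Send the form to the running app and return the rendered HTML page."""
    with urlopen(build_request(text, task)) as response:
        return response.read().decode("utf-8")
```

With the app running locally, submit(article_text, "organization") would return the same page the browser shows after clicking “Submit”.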

We are done!

If you followed the above steps and made it here, congratulations! You have created a simple but functioning named entity extractor at zero cost! Looking back, there were only two files we needed to create, and all we needed were open source libraries and the knowledge of how to use them to create these two files.

By building an app like this, you have learned new skills and used them to create something useful.

The complete source code is available at this repository. Happy Monday!


 
Bio: Susan Li is changing the world, one article at a time. She is a Sr. Data Scientist, located in Toronto, Canada.

Original. Reposted with permission.
