7 Types of Artificial Neural Networks for Natural Language Processing

What is an artificial neural network? How does it work? What types of artificial neural networks exist? How are different types of artificial neural networks used in natural language processing? We will discuss all these questions in the following article.

3. Recursive neural network (RNN)


A simple recursive neural network architecture.

A recursive neural network (RNN) is a type of deep neural network formed by applying the same set of weights recursively over a structure. By traversing the given structure in topological order, it produces either a structured prediction over variable-size input structures or a scalar prediction on them [6]. In the simplest architecture, nodes are combined into parents using a weight matrix that is shared across the whole network and a nonlinearity such as tanh.
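The composition step described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions that are not in the original article: the tree is binary, the weights are random and untrained, and the names `compose`, `encode`, and `leaves` are hypothetical.

```python
import math
import random

def compose(left, right, W, b):
    """Combine two child vectors into one parent vector: tanh(W [left; right] + b)."""
    children = left + right  # concatenate the two child vectors (length 2d)
    return [math.tanh(sum(w * x for w, x in zip(row, children)) + bias)
            for row, bias in zip(W, b)]

def encode(tree, leaves, W, b):
    """Encode a binary tree bottom-up.

    A tree is either a leaf name (str) or a (left, right) tuple.
    The same W and b are reused at every internal node.
    """
    if isinstance(tree, str):
        return leaves[tree]
    left, right = tree
    return compose(encode(left, leaves, W, b), encode(right, leaves, W, b), W, b)

rng = random.Random(0)
d = 4  # assumed vector size for every node
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * d)] for _ in range(d)]  # shared weights
b = [0.0] * d
leaves = {w: [rng.uniform(-1, 1) for _ in range(d)] for w in ("the", "cat", "sat")}

# Encode the toy structure ((the cat) sat); the root has the same size as a leaf.
root = encode((("the", "cat"), "sat"), leaves, W, b)
print(len(root))
```

Because every parent has the same dimensionality as its children, the same `compose` function can be applied at any depth, which is what lets one set of weights handle structures of different sizes.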


4. Recurrent neural network (RNN)

A recurrent neural network (RNN), unlike a feedforward neural network, is a variant of a recursive artificial neural network in which connections between neurons form a directed cycle. This means that the output depends not only on the present inputs but also on the neuron state from the previous step. This memory lets users solve NLP problems like connected handwriting recognition or speech recognition. In the paper Natural Language Generation, Paraphrasing and Summarization of User Reviews with Recurrent Neural Networks, the authors demonstrate a recurrent neural network (RNN) model that can generate novel sentences and document summaries [7].
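To make the dependence on the previous state concrete, here is a minimal sketch of a single recurrent step. The update rule h_t = tanh(W_xh x_t + W_hh h_{t-1} + b) is the common textbook form rather than something taken from the cited papers, and the weight values below are arbitrary illustrative numbers.

```python
import math

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(row_x, x))
            + sum(w * hi for w, hi in zip(row_h, h_prev))
            + bias
        )
        for row_x, row_h, bias in zip(W_xh, W_hh, b)
    ]

# Arbitrary, untrained weights chosen only for illustration.
W_xh = [[0.5, -0.3], [0.1, 0.8]]
W_hh = [[0.2, 0.4], [-0.6, 0.1]]
b = [0.0, 0.0]

h = [0.0, 0.0]  # initial state
states = []
for x in ([1.0, 0.0], [1.0, 0.0]):  # the same input presented twice
    h = rnn_step(x, h, W_xh, W_hh, b)
    states.append(h)

print(states[0])
print(states[1])
```

The two printed states differ even though the inputs are identical, because the second step also sees the state left behind by the first.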

Siwei Lai, Liheng Xu, Kang Liu, and Jun Zhao created a recurrent convolutional neural network for text classification without human-designed features and described it in Recurrent Convolutional Neural Networks for Text Classification. Their model was compared to existing text classification methods such as Bag of Words, Bigrams + LR, SVM, LDA, Tree Kernels, recursive neural networks, and CNNs. The authors showed that their model outperforms traditional methods on all of the data sets used [8].


5. Long short-term memory (LSTM)


A peephole LSTM block with input, output, and forget gates.

Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture that was designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs [9]. An LSTM does not apply an activation function within its recurrent components, so the stored values are not modified from step to step and the gradient does not tend to vanish during training. LSTM units are usually implemented in "blocks" that contain several units. These blocks have three or four "gates" (for example, an input gate, a forget gate, and an output gate) that use the logistic function to control the flow of information.
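The gating mechanism can be sketched as follows. This is the widely used LSTM formulation without peephole connections, which is an assumption on our part and may differ in detail from the block shown in the figure. The parameter names are hypothetical, and the biases are set by hand so that the forget gate is almost fully open and the input gate almost fully closed.

```python
import math

def sigmoid(z):
    """Logistic function used by the gates."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, v):
    """Compute W v + b for a list-of-rows matrix W."""
    return [sum(w * vi for w, vi in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step (no peepholes). Returns the new hidden and cell states."""
    v = h_prev + x  # concatenate previous hidden state and current input
    i = [sigmoid(z) for z in affine(p["W_i"], p["b_i"], v)]    # input gate
    f = [sigmoid(z) for z in affine(p["W_f"], p["b_f"], v)]    # forget gate
    o = [sigmoid(z) for z in affine(p["W_o"], p["b_o"], v)]    # output gate
    g = [math.tanh(z) for z in affine(p["W_g"], p["b_g"], v)]  # candidate values
    # The cell state is updated by gated addition; c_prev itself is not squashed.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

n_hidden, n_in = 2, 3
zeros = [[0.0] * (n_hidden + n_in) for _ in range(n_hidden)]
params = {
    "W_i": zeros, "b_i": [-10.0] * n_hidden,  # input gate nearly closed
    "W_f": zeros, "b_f": [10.0] * n_hidden,   # forget gate nearly open
    "W_o": zeros, "b_o": [0.0] * n_hidden,
    "W_g": zeros, "b_g": [0.0] * n_hidden,
}

c_start = [0.7, -0.4]
h, c = [0.0, 0.0], list(c_start)
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    h, c = lstm_step(x, h, c, params)

print(c_start, c)
```

With these hand-picked biases the cell state after three steps stays very close to its starting value, which illustrates how an LSTM can carry stored information across time steps when the gates allow it.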

In Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling, Hasim Sak, Andrew Senior, and Françoise Beaufays showed that deep LSTM RNN architectures achieve state-of-the-art performance for large-scale acoustic modeling.

In Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network, Peilu Wang, Yao Qian, Frank K. Soong, Lei He, and Hai Zhao presented a model for part-of-speech (POS) tagging [10]. The model achieved a tagging accuracy of 97.40%. Apple, Amazon, Google, Microsoft, and other companies have incorporated LSTM as a fundamental element of their products.


6. Sequence-to-sequence models

Usually, a sequence-to-sequence model consists of two recurrent neural networks: an encoder that processes the input and a decoder that produces the output. Encoder and decoder can use the same or different sets of parameters.
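The encoder-decoder arrangement can be sketched with two small recurrent networks. Everything below is an illustrative assumption rather than a published model: the weights are random and untrained, the decoder feeds its previous output back in as its next input, and it simply runs for a fixed number of steps. In this sketch the encoder and decoder use different sets of parameters.

```python
import math
import random

def rnn_step(x, h, W_x, W_h):
    """Simple recurrent step: tanh(W_x x + W_h h)."""
    return [math.tanh(sum(a * xi for a, xi in zip(rx, x))
                      + sum(c * hi for c, hi in zip(rh, h)))
            for rx, rh in zip(W_x, W_h)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode(inputs, W_x, W_h, n_hidden):
    """Read the whole input sequence and return the final hidden state."""
    h = [0.0] * n_hidden
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h)
    return h

def decode(context, W_x, W_h, W_out, steps):
    """Generate `steps` output vectors, starting from the encoder's final state."""
    h = context
    y = [0.0] * len(W_out)  # placeholder for a start-of-sequence input
    outputs = []
    for _ in range(steps):
        h = rnn_step(y, h, W_x, W_h)
        y = [sum(w * hi for w, hi in zip(row, h)) for row in W_out]
        outputs.append(y)
    return outputs

rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2

# Separate parameter sets for the encoder and the decoder.
enc_W_x, enc_W_h = rand_matrix(n_hidden, n_in, rng), rand_matrix(n_hidden, n_hidden, rng)
dec_W_x, dec_W_h = rand_matrix(n_hidden, n_out, rng), rand_matrix(n_hidden, n_hidden, rng)
dec_W_out = rand_matrix(n_out, n_hidden, rng)

source = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(5)]  # 5 input steps
context = encode(source, enc_W_x, enc_W_h, n_hidden)
outputs = decode(context, dec_W_x, dec_W_h, dec_W_out, steps=3)         # 3 output steps
print(len(source), len(outputs))
```

The input and output sequences have different lengths here, which is the property that makes this arrangement suitable for tasks where the output is not aligned one-to-one with the input.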

Sequence-to-sequence models are mainly used in question answering systems, chatbots, and machine translation. Multi-layer LSTM cells have been successfully used in sequence-to-sequence models for translation in the Sequence to Sequence Learning with Neural Networks study [11].

In Paraphrase Detection Using Recursive Autoencoder, a novel recursive autoencoder architecture is presented. The representations it learns are vectors in an n-dimensional semantic space where phrases with similar meanings are close to each other [12].


7. Shallow neural networks

Besides deep neural networks, shallow models are also popular and useful tools. For example, word2vec is a group of shallow two-layer models that are used for producing word embeddings. Presented in Efficient Estimation of Word Representations in Vector Space, word2vec takes a large corpus of text as its input and produces a vector space [13]. Every word in the corpus obtains the corresponding vector in this space. The distinctive feature is that words from common contexts in the corpus are located close to one another in the vector space.
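One way to see what "context" means here is to look at the training examples used by skip-gram, one of the two model variants described in that paper. The sketch below only extracts (center word, context word) pairs from a toy sentence; it does not train any embeddings, and the function name `skipgram_pairs` and the window size are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, context)
```

A model trained to predict the context word from the center word is pushed to give similar vectors to words that appear with similar neighbors, which is the intuition behind words from common contexts ending up close together.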



In this article, we described different variants of artificial neural networks, such as the deep multilayer perceptron (MLP), convolutional neural network (CNN), recursive neural network (RNN), recurrent neural network (RNN), long short-term memory (LSTM), sequence-to-sequence model, and shallow neural networks including word2vec for word embeddings. We showed how these networks function and how the different types are used in natural language processing tasks. We demonstrated that convolutional neural networks are primarily used for text classification tasks, while recurrent neural networks are commonly used for natural language generation or machine translation. In the next part of this series, we will study existing tools and libraries for the discussed neural network types.



Data Monsters helps corporations and funded startups research, design, and develop real-time intelligent software to improve their business with data technologies.

Original. Reposted with permission.