Research Papers for NLP Beginners

Read research papers on neural models, word embedding, language modeling, and attention & transformers.



If you’re new to the world of data and have a particular interest in NLP (Natural Language Processing), you’re probably looking for resources to help you understand it better.

You have probably come across many different research papers and are unsure which one to choose. Let’s face it: they’re not short, and they consume a lot of brain power. So it pays to choose papers that will actually help you on your path to mastering NLP.

I have done some research and collected a few NLP research papers that are highly recommended for newcomers to the field and for building overall NLP knowledge.

I will break it up into sections so you can go find exactly what you want.


Machine Learning and NLP

Text Classification from Labeled and Unlabeled Documents using EM by Kamal Nigam et al., 1999

This paper is about how you can improve the accuracy of learned text classifiers by augmenting a small number of labeled training documents with a large pool of unlabeled documents.

Beyond Accuracy: Behavioral Testing of NLP Models with CheckList by Marco Tulio Ribeiro et al., 2020

In this paper, you will learn about CheckList, a task-agnostic methodology for testing NLP models. It is motivated by the fact that measuring accuracy on held-out data, one of the most widely used evaluation approaches, tends to overestimate how well NLP models really perform.
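To make the idea of behavioral testing more concrete, here is a minimal sketch in plain Python. It does not use the authors' CheckList library; `toy_sentiment`, `invariance_failures`, and `mft_failures` are hypothetical names invented for this illustration, and the "model" is a deliberately naive word counter. The sketch mimics two of the test types the paper describes: an invariance test (changing a name should not change the prediction) and a minimum functionality test (simple cases with a known expected label).

```python
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_sentiment(text):
    """Hypothetical stand-in model: counts lexicon words and ignores negation."""
    words = text.lower().replace(".", "").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


def invariance_failures(model, template, fillers):
    """Invariance-style test: swapping the filler should not change the label."""
    predictions = {name: model(template.format(name=name)) for name in fillers}
    baseline = predictions[fillers[0]]
    return [name for name, pred in predictions.items() if pred != baseline]


def mft_failures(model, cases):
    """Minimum-functionality-style test: each text must get its expected label."""
    return [text for text, expected in cases if model(text) != expected]


inv_fail = invariance_failures(
    toy_sentiment, "{name} said the flight was great.", ["Maria", "John", "Wei"]
)
mft_fail = mft_failures(
    toy_sentiment,
    [
        ("The food was good.", "positive"),
        ("The food was not good.", "negative"),  # negation case
    ],
)
print("invariance failures:", inv_fail)
print("minimum functionality failures:", mft_fail)
```

The toy model passes the name-swap test but fails the negation case, which is the kind of weakness an aggregate accuracy score can hide.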


Neural Models

Natural Language Processing (almost) from Scratch by Ronan Collobert et al., 2011

In this paper, you will go through the foundations of NLP; as the title states, it is ALMOST from scratch. Topics include named entity recognition, semantic role labeling, network architectures, training, and more.

Understanding LSTM Networks by Christopher Olah, 2015

Neural networks are a major part of NLP, so a good understanding of them will benefit you in the long run. This blog post focuses on LSTM (Long Short-Term Memory) networks, a widely used type of recurrent neural network.
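As a rough illustration of what an LSTM computes at each time step, here is a minimal sketch of the standard gate equations for a single hidden unit with a scalar input. The function names and the weight values in `params` are assumptions made up for this example; a real network uses vectors and learned weight matrices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step for a single hidden unit with scalar input x."""

    def gate(name, activation):
        # Every gate looks at the current input and the previous hidden state.
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    g = gate("candidate", math.tanh)   # proposed new content
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state
    return h, c


# Hypothetical weights (w_x, w_h, bias), identical for every gate for simplicity.
params = {
    name: (0.5, 0.5, 0.0)
    for name in ("forget", "input", "candidate", "output")
}

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The key design point is the cell state `c`: when the forget gate is close to 1 and the input gate is close to 0, it is carried forward almost unchanged, which is what lets an LSTM remember information over long spans.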


Word/Sentence Representation and Embedding

Distributed Representations of Words and Phrases and their Compositionality by Tomas Mikolov et al., 2013

Mikolov and his co-authors had previously introduced the Skip-gram model for learning high-quality vector representations of words from large amounts of unstructured text data. This paper presents several extensions of the original Skip-gram model.
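To give a feel for the data Skip-gram trains on, here is a minimal sketch that builds (center word, context word) pairs from a sentence, plus the discard probability used by the paper's subsampling of frequent words, which is one of its extensions. The function names are assumptions for this example, and the threshold default `t=1e-5` follows the value the paper suggests as typical. No actual embedding training is shown.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within `window` positions of each token."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def discard_probability(relative_freq, t=1e-5):
    """Chance of dropping a word occurrence: 1 - sqrt(t / f(w)), floored at 0."""
    return max(0.0, 1.0 - math.sqrt(t / relative_freq))


sentence = "the quick brown fox".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs)

# A very frequent word (1% of all tokens) is discarded most of the time.
print(discard_probability(0.01))
```

The model is then trained to predict the context word from the center word, and subsampling reduces how often very common words such as "the" appear in those pairs.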

Distributed Representations of Sentences and Documents by Quoc Le and Tomas Mikolov, 2014

After discussing the two major weaknesses of bag-of-words features (they lose word order and ignore word semantics), the authors introduce Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents.


Language Modelling

Language Models are Unsupervised Multitask Learners by Alec Radford et al., 2019

Natural language processing tasks are normally approached with supervised learning on task-specific datasets. This paper, which introduced GPT-2, shows that a language model trained on a large and varied collection of web text begins to perform many of these tasks without any explicit supervision.

The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy, 2015

This blog post starts from the basics of recurrent neural networks and shows why they are so effective and robust, with code examples to give you a better understanding.


Attention & Transformers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin et al., 2019

As you’re learning about machine learning, you have probably heard about BERT (Bidirectional Encoder Representations from Transformers). It is widely used and known for pre-training deep bidirectional representations from unlabeled text. In this paper, you will learn how that pre-training works and how the pre-trained model can be fine-tuned for a wide range of tasks with just one additional output layer.
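One way BERT learns from unlabeled text is masked language modeling: some input tokens are hidden and the model has to predict them from the context on both sides. The sketch below is a simplified illustration of the masking rule described in the paper (about 15% of tokens are selected; of those, 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged). The function name `mask_tokens`, the tiny vocabulary, and the per-token coin flip are assumptions for this example, not the authors' implementation.

```python
import random


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, labels); labels are None where no prediction is needed."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover the original token
            roll = rng.random()
            if roll < 0.8:
                inputs.append("[MASK]")           # usually hide the token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # sometimes swap in a random one
            else:
                inputs.append(tok)                # sometimes leave it as is
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


tokens = "the cat sat on the mat and looked at the dog".split()
vocab = sorted(set(tokens))  # hypothetical toy vocabulary
inputs, labels = mask_tokens(tokens, vocab, mask_prob=0.15, seed=0)
print(inputs)
print(labels)
```

Because no human labels are involved, this objective can be applied to any large body of raw text, which is what makes the pre-training step possible.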

Attention Is All You Need by Ashish Vaswani et al., 2017

This paper introduces the Transformer, an architecture based solely on attention mechanisms, unlike the models that were dominant at the time, which relied on complex recurrent or convolutional neural networks. You will learn how the Transformer generalizes well to other tasks and why it may be the better option.
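The core operation behind the Transformer is scaled dot-product attention: each query is compared with every key, the scores are divided by the square root of the key dimension and passed through a softmax, and the result is used to take a weighted average of the values. Here is a minimal pure-Python sketch of that formula; the function names and the toy input are assumptions for this example, and a real model adds learned projections, multiple heads, and masking.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of float vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention: the same three 2-dimensional vectors act as Q, K and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and shows how much one token attends to every other token. Because every token is compared with every other token directly, there is no need to process the sequence step by step as a recurrent network does.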

HuggingFace's Transformers: State-of-the-art Natural Language Processing by Thomas Wolf et al., 2020

Want to learn more about the Transformer, which has become the dominant architecture for natural language processing? This paper describes the open-source Transformers library from Hugging Face, covering how it is designed and how it facilitates the distribution of pre-trained models.


Wrapping Up


I don’t want to overwhelm you with too many research papers, so I have kept this list short.

If you know of any that beginners may benefit from, please drop them in the comments so that they can see them. Thank you!

Nisha Arya is a Data Scientist and Freelance Technical Writer. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge around Data Science. She also wishes to explore the different ways Artificial Intelligence can benefit the longevity of human life. A keen learner, she seeks to broaden her tech knowledge and writing skills while helping guide others.