Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Semantics and Pragmatics

Algorithms for text analytics must model how language encodes meaning, and so must the people deploying those algorithms. Bender & Lascarides 2019 is an accessible overview of what the field of linguistics can teach NLP about how meaning is encoded in human languages.

By Emily M. Bender, Professor of Linguistics at the University of Washington.

Natural language processing (NLP), including text analytics, text as data, etc., involves the application of machine learning and other methods to text (and speech) in some natural language. For the most part, data scientists working with NLP techniques are interested in the information that is stored in written English (or, more rarely, it seems, other languages). However, getting at this information requires building, or at least using, algorithms that model the structures of language and their relationship to the meanings expressed.
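To make "algorithms that model the structures of language" concrete, here is a minimal sketch, assuming the spaCy library and its small English model are installed (my choice of tooling; the book itself is library-agnostic), that recovers the syntactic structure on which any downstream meaning extraction depends:

    # Minimal sketch: exposing syntactic structure with spaCy.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The ham sandwich at Table 7 is getting impatient.")

    # Each token gets a dependency label and a syntactic head;
    # this is the structure that meaning extraction builds on.
    for token in doc:
        print(f"{token.text:<10} {token.dep_:<10} head={token.head.text}")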

In 2013, I published Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax, which was designed to provide an accessible overview of what the field of linguistics can teach NLP about linguistic structure (morphology and syntax). This book was reviewed by Francis Tyers in Machine Translation in 2014 and by Chris Dyer in Computational Linguistics in 2015.

But structure is only part of the equation. In 2019, I teamed up with Alex Lascarides to produce a companion volume, Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics, which similarly provides an overview of how meaning is encoded in human languages and how people use those encoded meanings for communicative ends. Both books follow a format of 100 short vignettes, illustrated with specific examples from many different languages, with the goal of making complex ideas approachable.

[Image: syntax tree illustration]

The table of contents (made up of the headlines of all 100 vignettes) and the first two chapters (“Introduction” and “What is Meaning?”) can be found here. Or, if you want it in even shorter form, here are tweet-thread serializations of a few of the vignettes, including “#3: Natural language understanding requires commonsense reasoning”, “#8: Linguistic meaning includes emotional content”, “#19: Regular polysemy describes productively related sense families”, and “#30: Words can have surprising nonce uses through meaning transfer”.

[Image: illustration of an example involving Sandy and Paris]

Curious what a nonce use is? #30 is where we cover how sentences like The ham sandwich and salad at Table 7 is getting impatient are meaningful at all. Other vignettes in the book include “#39: Collocations are often less ambiguous than the words taken in isolation”, “#62: Evidentials encode the source a speaker credits the information to and/or the degree of certainty the speaker feels about it”, and “#95: Silence can be a meaningful act”.
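As a concrete illustration of vignette #39, here is a small sketch, again assuming tooling of my own choosing rather than anything from the book (NLTK, with its Brown corpus downloaded), that ranks bigram collocations by pointwise mutual information; pairs scored this way constrain word sense far more tightly than either word does in isolation:

    # Small sketch: finding collocations via pointwise mutual information.
    # Assumes NLTK is installed and the Brown corpus has been downloaded
    # (python -m nltk.downloader brown).
    from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
    from nltk.corpus import brown

    bigram_measures = BigramAssocMeasures()

    # Build a bigram finder over lowercased corpus tokens.
    finder = BigramCollocationFinder.from_words(w.lower() for w in brown.words())

    # Drop rare pairs so PMI is not dominated by one-off sequences.
    finder.apply_freq_filter(5)

    # The highest-PMI bigrams are strongly associated word pairs:
    # collocations whose joint meaning is less ambiguous than their parts.
    for pair in finder.nbest(bigram_measures.pmi, 10):
        print(" ".join(pair))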

[Image: illustration of a “dogs carried” example]

Bio: Emily M. Bender (@emilymbender) is a Professor of Linguistics at the University of Washington and the Faculty Director of the Professional Master's in Computational Linguistics (CLMS) program. Her research interests include the interaction of linguistics and NLP, computational semantics, multilingual NLP, and the societal impact of language technology.
