Harnessing Semiotics and Discourse Communities to Understand User Intent
Semiotics helps us understand the importance of context in determining the meaning of a term, and discourse communities provide the background context (mental model) needed to interpret that meaning correctly.
By Dr Vladimir Dobrynin, Dr Xiwu Han, Mr Alexey Mishenin, Dr David Patterson, Dr Niall Rooney, Mr Julian Serdyuk, Aiqudo
In our previous article we set out the rationale for basing the Natural Language Understanding component of our digital assistant on the linguistic principles of semiotics and discourse communities. Semiotics helps us understand the importance of context in determining the meaning of a term, and discourse communities provide the background context (mental model) needed to interpret that meaning correctly.
But how do we discover the discourses that exist for our smartphone application? To do this we focus on identifying the “jargon” terms used by each community. These are terms unique to that community that support effective communication within it: specific medical terms in a health community, or specific educational terms in a teaching community, for example. The emphasis is on “specific”, and this is important. Jargon terms are not broad in meaning in the way that a term like “computer” is, which can refer to a number of different things (a laptop, a PC, a supercomputer, etc.). What we want to identify are terms such as “Threadripper” (a gaming processor from AMD), which are very specific in meaning and are used in fewer contexts.
Jargon Terms and Entropy
So – how do we identify good jargon terms and what do we do with them in order to understand user commands?
To do this we use entropy. In general, entropy is a measure of chaos or disorder; in an information theory context, it measures how much uncertainty there is in a probability distribution. Because jargon words have a very narrow and specific meaning within their discourse communities, the terms that co-occur with them follow a much more predictable pattern, giving them lower entropy (and higher information value) than broader, more general terms.
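Concretely, if p(w | t) denotes the probability that word w co-occurs with term t (estimated from a corpus, as described below), the entropy of t's context distribution is the standard Shannon entropy. The article does not give the exact formula used, so this is the textbook formulation:

```latex
H(t) = -\sum_{w} p(w \mid t)\, \log_2 p(w \mid t)
```

A term whose co-occurring words are spread evenly over many possibilities has high H(t); a jargon term with a few highly predictable neighbours has low H(t).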
To determine term entropy we first need a relevant body of content (corpus) to analyse. We create this ourselves from content scraped from the web and other repositories. More information on this process can be found here. We then take each term in our corpus and build a probability profile of co-occurring terms. The diagram below shows an example (partial) probability distribution for the term ‘computer’.
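As a rough sketch of this step, a term's co-occurrence profile can be estimated by counting the words that appear within a fixed window around each occurrence of the term, then normalising the counts into probabilities. The function name, the whitespace tokenisation, and the window size below are our own illustrative assumptions, not Aiqudo's implementation:

```python
from collections import Counter

def cooccurrence_profile(term, documents, window=5):
    """Estimate p(w | term): the probability that word w appears
    within `window` words of `term`, over a small tokenised corpus."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), i + window + 1
                # Count every word in the window except the term itself.
                for w in tokens[lo:i] + tokens[i + 1:hi]:
                    counts[w] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}
```

Running this over a corpus yields, for each candidate term, a probability distribution like the partial one shown for ‘computer’ in the diagram.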
These co-occurring term probabilities can be thought of as the context for each potential jargon word. We then use this probability profile to determine the entropy of the word. If that entropy is low then we consider it to be a candidate jargon word.
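Continuing the sketch, the entropy of a co-occurrence profile follows directly from the probabilities, and terms falling below some tuned cut-off become jargon candidates. The threshold value here is purely illustrative; in practice it would be tuned against the corpus:

```python
import math

def entropy(profile):
    """Shannon entropy (in bits) of a co-occurrence probability profile."""
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)

def jargon_candidates(profiles, threshold=4.0):
    """Keep terms whose context distribution has low entropy.
    `profiles` maps each term to its co-occurrence profile."""
    return [t for t, prof in profiles.items() if entropy(prof) < threshold]
```

A term like “Threadripper”, whose neighbours are concentrated on a few gaming-hardware words, scores far below a term like “computer”, whose neighbours are spread over many contexts.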
Having identified the low entropy jargon words, we then use their probability distributions as attractors for documents within the corpus. In this way (as seen in the diagram below) we create a set of document clusters where each cluster relates semantically to a jargon term. (Note: in the interest of clarity, clusters are labelled here using human-chosen phrases rather than the jargon words themselves, as these convey the meaning of each cluster more understandably.)
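One plausible way to realise the attractor idea (the article does not specify the exact measure used) is to assign each document to the jargon term whose context profile is closest to the document's own word distribution, for instance under KL divergence. Everything below, including the smoothing constant, is an illustrative assumption:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) in bits, with a small floor for words absent from q."""
    return sum(pv * math.log2(pv / q.get(w, eps)) for w, pv in p.items() if pv > 0)

def word_distribution(doc):
    """Normalised word frequencies of a single document."""
    tokens = doc.lower().split()
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    return {w: c / len(tokens) for w, c in counts.items()}

def cluster_by_attractor(documents, attractors):
    """attractors: {jargon_term: context_profile}. Each document is
    pulled to the attractor minimising divergence from its own words."""
    clusters = {t: [] for t in attractors}
    for doc in documents:
        dist = word_distribution(doc)
        best = min(attractors, key=lambda t: kl_divergence(dist, attractors[t]))
        clusters[best].append(doc)
    return clusters
```

Each resulting cluster then corresponds semantically to one jargon term, as in the diagram.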
We then build a graph within each cluster that connects documents based on how similar they are in terms of meaning. We identify ‘neighbourhoods’ within these graphs that relate to areas of intense similarity. For example, a cluster may be about “cardiovascular fitness” whereas a neighbourhood may be more specifically about “High Intensity Training”, or “rowing” or “cycling”, etc.
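The graph-and-neighbourhood step might be sketched as follows: connect documents whose pairwise similarity exceeds a threshold and treat the connected components as 'neighbourhoods'. Both the cosine similarity measure and the threshold are our illustrative choices, not necessarily those used in the platform:

```python
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two {word: count} vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def neighbourhoods(vectors, threshold=0.5):
    """Link documents with similarity above `threshold`, then return
    the connected components of the resulting graph."""
    n = len(vectors)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if cosine(vectors[i], vectors[j]) > threshold:
            adj[i].add(j)
            adj[j].add(i)
    seen, comps = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], set()   # depth-first traversal of one component
        while stack:
            k = stack.pop()
            if k in comp:
                continue
            comp.add(k)
            stack.extend(adj[k] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps
```

In the fitness example, documents about “High Intensity Training” would end up densely interconnected and separate from those about “rowing” or “cycling”.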
These neighbourhoods can be thought of as sub-topics within the overall cluster topic. Within each sub-topic we can then extract important meaning-based phrases that precisely describe what it is about, e.g. “HIIT”, “anaerobic high-intensity period”, “cardio session”, etc.
In this way we create meaning-based structure from completely unstructured content. Documents from the same cluster relate to the same discourse community. Documents from the same cluster that share similar important terms or phrases can be regarded as relating to the same sub-topic. If two clusters share a large number of important phrases then this represents a dialogue between two discourse communities; if multiple important phrases are shared among many clusters then this represents a dialogue among multiple communities.
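The dialogue-detection idea reduces to set intersection over each cluster's important phrases. The minimum-overlap threshold and the cluster names below are illustrative assumptions:

```python
from itertools import combinations

def community_dialogues(cluster_phrases, min_shared=3):
    """cluster_phrases: {cluster_name: set of important phrases}.
    Report cluster pairs sharing at least `min_shared` phrases, as a
    rough proxy for a dialogue between two discourse communities."""
    dialogues = []
    for a, b in combinations(sorted(cluster_phrases), 2):
        shared = cluster_phrases[a] & cluster_phrases[b]
        if len(shared) >= min_shared:
            dialogues.append((a, b, shared))
    return dialogues
```

Extending the same pairwise check to phrases shared across many clusters would surface dialogues among multiple communities.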
So, having described a little about the algorithms themselves, how do they help us understand the correct meaning behind a user’s command? Given this contextual partitioning of the data into discourses based on jargon terms, we can disambiguate among the many different meanings a term can have. For example, if the user were to say ‘open the window’, we understand that there is a meaning (discourse) relating both to buildings and to software; but if the user were to say ‘minimize the window’, we understand that this could only have a software meaning and context. Fully understanding the nuances behind a user’s command is, of course, much more complicated than what we have just described, and we utilise a number of AI/ML approaches within our platform (including deep learning in places), but the goal here is to give a high-level overview of the technology at the core of our approach.
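As a toy illustration of this disambiguation (the production system combines several AI/ML models, as noted above), a command can be matched against the term sets of each discourse: ‘open the window’ fits two discourses while ‘minimize the window’ fits only one. The discourse names and term sets here are invented for the example:

```python
def candidate_discourses(command, discourse_terms):
    """Return the discourses whose vocabulary covers every word of
    the command. discourse_terms: {discourse: set of terms}."""
    words = set(command.lower().split())
    return [d for d, terms in sorted(discourse_terms.items())
            if words <= terms]

# Invented toy vocabularies for two discourses.
discourses = {
    "buildings": {"open", "close", "the", "window", "door"},
    "software":  {"open", "close", "minimize", "the", "window", "app"},
}
```

Here ‘open the window’ is ambiguous between the two discourses, while ‘minimize the window’ resolves to the software discourse alone.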