
Natural Language Processing for Social Media

Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about Natural Language Processing and how it is used in social media analytics.

By Kevin Gray and Anna Farzindar


Kevin Gray: What is Natural Language Processing (NLP)? Can you give us a non-technical definition of NLP and a short history of how it came about?

Anna Farzindar: Natural Language Processing is a sub-field of Artificial Intelligence, perhaps the most famous one. It is the art of human communication with machines, including texts and conversational content. We refer to “natural language” because we aim to communicate with a computer or a smart device using human languages such as French or Korean, not with programming languages like Java or Python. The term NLP is sometimes used interchangeably with “Computational Linguistics” when the emphasis is on the interdisciplinary fields of linguistics, computer science, and psychology.

Automatic translation is one of the early applications of NLP. This effort goes back to the 1950s when governments were interested in translating Russian into English. But today, we use NLP in our daily lives even if we don’t know the names of the methods or techniques: searching in Google, using translation in Instagram comments, asking Siri a question on an iPhone, commanding Alexa on an Amazon device in order to play music or to control your smart home, or even listening to an AI news presenter!

Can you give us some brief examples of how NLP techniques are used?

In many applications we can break complicated tasks into subtasks or modules. For a specific application, we can integrate independent NLP modules into a pipeline where, in this chain, the output of each module becomes the input for the next one.

For example, real-time speech-to-speech translation is a complex task: conversational phrases spoken in a source language (e.g., Japanese) are automatically translated and spoken aloud in the target language (e.g., English). In this application, three distinct modules are necessary:

  • Voice recognition transcribes oral speech from the source language to text (speech to text)
  • Machine translation translates the transcription into the target language in textual format (text to text)
  • A speech synthesizer converts the written translation into speech (text to speech)
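
The chain described above can be sketched in a few lines of Python. This is only an illustration of the pipeline idea, not a working translator: the three functions, the `TOY_DICTIONARY` lookup table, and the dictionary-shaped "audio" input are all hypothetical stand-ins for real speech recognition, machine translation, and speech synthesis systems.

```python
from typing import Any, Callable, Sequence

# Toy lookup table standing in for a real machine translation model (assumption).
TOY_DICTIONARY = {"こんにちは": "hello"}


def recognize_speech(audio: dict) -> str:
    """Speech to text: a stub that reads a transcript attached to the fake audio."""
    return audio["transcript"]


def translate(text: str) -> str:
    """Text to text: a stub that looks the phrase up in the toy dictionary."""
    return TOY_DICTIONARY.get(text, text)


def synthesize_speech(text: str) -> dict:
    """Text to speech: a stub that wraps the text as a fake audio object."""
    return {"language": "en", "spoken_text": text}


def run_pipeline(modules: Sequence[Callable[[Any], Any]], data: Any) -> Any:
    """Feed the output of each module into the next one, in order."""
    for module in modules:
        data = module(data)
    return data


speech_to_speech = [recognize_speech, translate, synthesize_speech]
result = run_pipeline(speech_to_speech, {"language": "ja", "transcript": "こんにちは"})
print(result)
```

Because each module only depends on the output of the one before it, any stage could in principle be swapped for a better component without touching the others.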

How does NLP for social media differ from NLP for other implementations? Are there special challenges for NLP when applied to social media?

Social media data is different from traditional documents such as newspaper articles.

These new types of data are open-source information that can be obtained publicly. They are social, real-time, and often geo-spatially coded; they carry emotion, contain neologisms, and raise questions of credibility, including rumors. These unstructured texts come in many formats and are written in everyday language by different people in many languages and styles. Moreover, the authors are not professional writers and come from thousands of places.

It is a scientific challenge to develop powerful methods and algorithms which extract relevant information from a large volume of data in different languages. Conventional NLP methods in information extraction, automatic categorization and clustering, automatic summarization and machine translation need to be adapted to a new kind of data.

Are there popular misunderstandings about NLP you have come across?

Sometimes people underestimate NLP when comparing it with other computer science fields such as image processing. They only think of simple methods such as Term Frequency-Inverse Document Frequency (TF-IDF), which is used to find the most important words in an article. However, when analyzing text or conversations, we are not only dealing with words, grammar, and syntax but must also consider semantics and meaning. For these reasons, NLP algorithms are complicated. Thanks to advanced methods such as Deep Learning, NLP methods are becoming increasingly language-independent.
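
To show how simple TF-IDF is compared with methods that model meaning, here is a minimal sketch in Python. It assumes one common variant of the formula (term count divided by document length, multiplied by the natural log of the number of documents over the number of documents containing the term) and naive whitespace tokenization; the function name `tf_idf` and the three sample sentences are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
scores = tf_idf(docs)
print(scores[2])
```

A word such as "the", which appears in every document, scores zero, while a word that is frequent in one document but rare elsewhere scores higher. Note that nothing here touches grammar or meaning, which is the limitation pointed out above.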

How do concerns regarding privacy and personal data affect scholarly research on social media analytics?

Some information available on social media is public and some of it is private. There are several concerns about privacy in social media with respect to user information and how this massive volume of publicly available information can be used as open intelligence to help the general public, such as preventing online victimization and cyberbullying in schools. It is essential to consider the ethics of information in technology and business when using social media data. However, there is little guidance or research on how to protect this information.

What thoughts do you have about the future of NLP? What sorts of things will it be able to do that it now cannot?

Currently, NLP methods are widely used in many fields such as health care, finance, predicting voting intentions, entertainment, marketing, and security and defence applications.

In the future, the rapid advancement of technology will change the way humans and machines operate. The rise of wearable technologies, such as glasses, smart watches, healthcare devices, fitness trackers, sleep monitors, and other devices, will influence social media and communication. For example, health care applications are among the focal areas of wearable technologies. Microsoft, Google, and Apple have released their own health platforms, with which doctors and other health care professionals can monitor data, text, and voice collected via the patient's wearable technology. It seems that NLP techniques and applications, integrated with multimedia processing techniques, will become increasingly necessary for analyzing data in the future.

Thank you, Anna!

Kevin Gray is President of Cannon Gray, a marketing science and analytics consultancy. He has more than 30 years’ experience in marketing research with Nielsen, Kantar, McCann and TIAA. Kevin also co-hosts the audio podcast series MR Realities.

Dr. Anna Farzindar is a faculty member of the Department of Computer Science at the University of Southern California. She was CEO and co-founder of NLP Technologies Inc., a company specializing in Natural Language Processing, and was Adjunct Professor at the University of Montreal and Honorary Research Fellow at the Research Group in Computational Linguistics at the University of Wolverhampton, UK. She received her PhD in Computer Science from the University of Montreal and her Doctorate in linguistics, mathematics and logic from Paris-Sorbonne University. Dr. Farzindar is co-author of the book Natural Language Processing for Social Media (2015 and 2nd Edition 2018) and has published numerous papers on NLP and its applications.

