Mastering NLP Job Interviews
What is NLP, and what types of NLP questions can you expect at NLP-related job interviews?
Not all data scientists work with NLP or are required to know it. Whether you are depends on the company interviewing you for a data science position. You’re not interested in NLP? Well, you’ll still have to know what it is so you can avoid it in your career, if nothing else.
In case you’re intrigued by NLP and willing to learn more, you will benefit from knowing what interview questions you could expect.
What is NLP?
No, it’s not that pseudoscientific psychological approach that gained popularity recently. Neuro-linguistic programming, they call it.
“Our” NLP is also getting increasingly popular, but it refers to Natural Language Processing.
As Wikipedia nicely puts it, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation.
The key word in the above definition is ‘human’. In NLP, there’s an additional keyword: computer. Put the two together and you get the definition: NLP deals with teaching computers how to understand natural language. For a computer, this understanding means processing and analyzing natural language data stored in different data formats.
To do that, NLP combines knowledge from artificial intelligence, computer science, and linguistics.
What is NLP’s Use?
NLP is becoming a feature of our everyday lives. As I was writing the previous sentence, Google’s Smart Compose suggested the phrase ‘everyday lives’. I accepted. Because that’s what I intended to write.
So this is one of its uses: autocorrect, autocomplete, and spell checkers. The NLP software scans the text for grammatical and spelling errors, corrects them, or gives correction suggestions. There are also spell checkers that can ‘understand’ the whole sentence's syntax, context, and meaning. Based on that, they suggest corrections or better-phrased sentences in line with the goal you’re trying to achieve with your text.
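To give a feel for how a correction suggestion can be produced, here is a minimal sketch that picks the closest dictionary word by edit distance. It is an illustration only, not how any particular product works; the tiny `VOCABULARY` list and the `suggest` helper are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical toy vocabulary; a real checker would use a full dictionary.
VOCABULARY = ["language", "natural", "processing", "computer", "sentence"]


def suggest(word: str, vocabulary=VOCABULARY) -> str:
    """Return the vocabulary word with the smallest edit distance to `word`."""
    return min(vocabulary, key=lambda candidate: edit_distance(word, candidate))


print(suggest("langauge"))
print(suggest("procesing"))
```

Context-aware checkers go well beyond this, but edit distance is a common starting point and a frequent interview topic in its own right.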
Language translation is another use of NLP. Whenever you’re in a foreign country, you probably use a translation tool such as Google Translate. Translation features are also increasingly common on social media platforms such as Facebook, Instagram, and YouTube.
Recognizing and generating speech is also one of the NLP uses. Think of Google Assistant, Windows Speech Recognition, Dragon, Siri, Alexa, or Cortana; they all seem to understand you (more or less) when you talk. Based on what you tell them, they will perform a certain action, such as browsing the internet, typing your words, or playing your favorite song. Some of these tools can even talk back to you, i.e., generate speech.
NLP can also decipher the ‘feel’ of the text. In other words, it can detect the sentiment behind the text, not only the literal meaning. This means understanding emotions (happy, angry, disturbed, neutral…), sarcasm, double meaning, metaphors, and expressions within a context. This is called sentiment analysis. Think of scanning social media comments and removing those that break the terms of service, or gauging customer satisfaction by analyzing comments and reviews.
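The simplest form of sentiment analysis can be sketched with a word list. The example below is a rough illustration under heavy assumptions: the `POSITIVE`, `NEGATIVE`, and `NEGATIONS` sets are invented toy lexicons, and a word-counting approach like this cannot handle sarcasm or metaphors, which is why production systems usually rely on trained models.

```python
import re

# Hypothetical toy lexicons; real ones contain thousands of scored words.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "angry"}
NEGATIONS = {"not", "never", "no"}


def sentiment_score(text: str) -> int:
    """Add +1/-1 per sentiment word, flipping the word right after a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        score += -polarity if negate else polarity
        negate = False  # a negation only affects the very next token
    return score


def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


for review in ["I love this product", "This is not good", "It arrived on Monday"]:
    print(review, "->", label(review))
```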
NLP is heavily used in online marketing. The keywords you search are matched with the keywords of the companies, their products, and their ads. So when you start seeing ads for a product you just Googled, don’t worry. You’re not crazy; it’s NLP and targeted advertising at work.
What Does NLP Have to Do With Data Science?
Data scientists might not be interested in natural languages per se. Add computer processing, which turns natural language into data, and you might draw data scientists’ attention.
Maybe that’s not enough for data scientists’ eyes to light up, but this could change once they learn that machine learning (ML) overlaps with NLP and is often used in it.
Behind all the above uses of NLP usually lies ML. And ML is undeniably a field that is deeply immersed in data science.
When talking about ML, there’s usually a distinction between supervised and unsupervised ML.
The supervised ML models most commonly used in NLP are:
- Support-Vector Machines (SVMs)
- Bayesian Networks
- Maximum Entropy
- Conditional Random Fields
- Neural Networks
Unsupervised learning is less common in NLP, but some of its techniques are still used:
- Latent Semantic Indexing (LSI)
- Matrix Factorization
Behind every ML model and algorithm, there are underlying statistics concepts.
These two areas, ML and statistics, are heavily tested at all serious companies looking for data scientists. The same goes for companies dealing with NLP.
What can be specific to NLP is certain terminology, which you will be expected to know.
Use everything I mentioned here to build your interview preparation around three major topics.
The Questions for an NLP Interview
All the previous talk smoothly leads to the categories of NLP interview questions:
- General & NLP Terminology Questions
- Statistics Questions
- Modeling Questions
I won’t be covering coding questions in this article. It’s common knowledge that data scientists generally have to be skillful coders, especially in SQL and Python. The same is true for data scientists working in NLP, so you should be ready for the coding part of the interview.
1. General & NLP Terminology Interview Questions
These NLP interview questions deal with your knowledge of what NLP is, how it works, and the technical concepts specific to NLP.
This is the least ‘transferable’ data science knowledge. In other words, if you haven’t already worked with NLP, your previous data science knowledge won’t help you much here. So if you have no working experience with NLP, take these questions very seriously and prepare for them meticulously.
Some of the question examples are:
- What are the stages in the lifecycle of a natural language processing (NLP) project?
- What are some of the common NLP tasks?
- What is the difference between stemming and lemmatization?
- What is information extraction?
- What is sentiment analysis in NLP?
- List some open-source libraries for NLP.
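To make the stemming vs. lemmatization question concrete, here is a deliberately crude sketch. The suffix list and the lemma table are invented for illustration, and this is not the Porter algorithm or any library’s implementation. The point is only the contrast: stemming chops suffixes by rule and may produce non-words, while lemmatization maps a word to its dictionary form.

```python
# Hypothetical suffix list and lemma table, for illustration only.
SUFFIXES = ("ing", "ies", "ed", "es", "s")
LEMMAS = {
    "studies": "study",
    "studying": "study",
    "better": "good",
    "was": "be",
    "mice": "mouse",
}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def naive_lemmatize(word: str) -> str:
    """Look up the dictionary form; fall back to the word itself if unknown."""
    return LEMMAS.get(word, word)


for word in ["studies", "studying", "better", "mice"]:
    print(f"{word}: stem={naive_stem(word)}, lemma={naive_lemmatize(word)}")
```

Notice that the stemmer turns ‘studies’ into the non-word ‘stud’ and cannot relate ‘better’ to ‘good’, while the lookup-based lemmatizer can, at the cost of needing a dictionary.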
2. Statistics Interview Questions
The statistics questions test your knowledge of the statistical concepts you will regularly use as a data scientist in general and when working on NLP projects.
Here are some examples:
- Bayesian vs. Frequentist Statistics: What is the difference between Bayesian and frequentist statistics?
- What are hidden Markov random fields?
- Pearson's Correlation Coefficient: Prove why Pearson's correlation coefficient is between -1 and 1.
- What do you mean by perplexity in NLP?
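For the perplexity question, a small worked example can help. Perplexity is the exponential of the average negative log-probability a language model assigns to each token, so lower is better. The sketch below assumes a unigram model with add-one smoothing and a toy corpus I made up; it also assumes every test token is in the vocabulary.

```python
import math
from collections import Counter


def train_unigram(tokens, vocabulary):
    """Return add-one smoothed unigram probabilities over `vocabulary`."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocabulary)
    return {word: (counts[word] + 1) / total for word in vocabulary}


def perplexity(model, tokens):
    """exp of the average negative log-probability per token."""
    log_prob = sum(math.log(model[token]) for token in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical toy corpus; test tokens are assumed to be in the vocabulary.
train = "the cat sat on the mat".split()
test = "the cat sat".split()
model = train_unigram(train, set(train))
print(perplexity(model, test))

# Sanity check: a uniform model over V words has perplexity V.
uniform = {word: 1 / 4 for word in "abcd"}
print(perplexity(uniform, list("abca")))
```

The sanity check is a handy way to explain the intuition in an interview: a perplexity of 4 means the model is as uncertain as if it were choosing uniformly among four words at each step.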
3. Modeling Interview Questions
The third category of NLP interview questions deals with ML and models in general. This could refer to the ML algorithms most commonly used in NLP (as mentioned above) and to some other techniques and methods specific to NLP.
Below are some examples:
- What are the differences between GPT and GPT-2?
- Do you prefer feature extraction or fine-tuning? How do you decide? Would you use BERT as a feature extractor or fine-tune it?
- What do you mean by masked language modeling?
- PCA and LDA/QDA: What is the relationship between PCA and LDA/QDA?
- Naive Bayes Classifier: What is "naive" about a Naive Bayes classifier?
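The Naive Bayes question is easier to answer with code in mind. Below is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing. The `NaiveBayes` class and the four training documents are invented for illustration. The ‘naive’ part is the marked line: the per-word log-likelihoods are simply added up, as if words were independent of each other given the class.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over bags of words."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                if token in self.vocabulary:  # skip words never seen in training
                    # The "naive" step: treat words as independent given the class.
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
docs = [["free", "prize", "now"], ["win", "free", "cash"],
        ["meeting", "at", "noon"], ["lunch", "meeting", "today"]]
labels = ["spam", "spam", "ham", "ham"]
classifier = NaiveBayes().fit(docs, labels)
print(classifier.predict(["free", "cash"]))
print(classifier.predict(["meeting", "today"]))
```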
Natural language processing is a field that is increasingly used in everyday life. Current uses include spell checkers, autocomplete tools, translators, and speech recognition and generation software. NLP is also heavily used in social media monitoring and online marketing.
NLP overlaps with machine learning, so plenty of ML knowledge applies to NLP, too. But don’t get too complacent! NLP is a vast and specific field that requires knowing its own terminology, techniques, and commonly used methods.
Generally, the interview question types can be divided into general NLP questions, statistics questions, and modeling questions.
The examples and resources I gave you above are just a start. But even they are enough to make sure you go to an NLP job interview without fear.
Nate Rosidi is a data scientist who also works in product strategy. He's also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Connect with him on Twitter: StrataScratch or LinkedIn.