Surveys generate terabytes of data, and most of it is unstructured: the kind where respondents type their views into an open text box. These questions come mixed in with standard structured survey questions such as “On a scale of 1 to 5, how happy …”. Structured or numerical responses like these are well understood, and analyzing them is an established process (cleaning them from a data science perspective is another matter entirely!). The analysis of unstructured responses, however, can be improved with the latest advances in natural language processing tools.
Unstructured responses are called verbatim responses, and currently they are analyzed manually by exporting the data into Excel. Considering that most smartphones today pack far more punch when it comes to analyzing text, this can be improved. Most of us will recognize the word suggestions that a smartphone makes when you are typing a text message, for example. Behind them lies a powerful predictive analytics engine based on Markov models, which looks at the context (your last few words or phrases) and predicts the 3 most likely next words that you may want to type.
We built this “next word predictor” (NWP) as an enhancement for existing survey analysis toolkits. Here is an interesting use case that explains how it works.
The survey analyst wants to understand what a survey reveals about issues manufacturing companies face in adopting new technology. Currently the user needs to either select all questions that apply to new technology adoption and then explore each verbatim response manually, or search for specific keywords and then explore each verbatim response manually.
With the NWP enhancement, the user simply types in a word such as “design”, and the system predicts the most likely next word that survey respondents have used. In this case it is “software”. By following this predictive word trail, the user will very soon see the most common problem with design: “design software struggles” or “design software hard”, which seems to be a very common complaint when it comes to adopting new CAD software in companies that do not have adequate employee skills. Getting to this insight without such predictive models would require much more effort, potentially parsing through hundreds of verbatim responses.
With NWP, there is no need to filter and export the data to Excel before doing the analysis. The insights from unstructured data are captured in at most 2-3 steps within the existing toolkit framework.
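To make the word trail concrete, here is a minimal sketch in Python. It is not the actual NWP implementation: the responses are invented for illustration, and the helper names (`build_bigram_counts`, `suggest`, `follow_trail`) are hypothetical. It uses plain bigram counts, the simplest form of a Markov model, to suggest next words and follow the top suggestion.

```python
from collections import Counter, defaultdict

# Hypothetical verbatim responses, invented purely for illustration.
RESPONSES = [
    "design software struggles with large assemblies",
    "our design software struggles on older machines",
    "design software hard to learn",
    "training budget is too small",
]

def build_bigram_counts(responses):
    """Count how often each word directly follows each other word."""
    counts = defaultdict(Counter)
    for text in responses:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def suggest(counts, word, k=3):
    """Return up to k of the most frequent words seen after `word`."""
    followers = counts.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]

def follow_trail(counts, start, steps=2):
    """Greedily follow the top suggestion for up to `steps` words."""
    trail = [start]
    for _ in range(steps):
        options = suggest(counts, trail[-1], k=1)
        if not options:
            break  # no respondent continued past this word
        trail.append(options[0])
    return trail

counts = build_bigram_counts(RESPONSES)
print(suggest(counts, "design"))                  # suggestions after "design"
print(" ".join(follow_trail(counts, "design")))   # the trail an analyst would follow
```

In a real toolkit the analyst would pick among the suggestions at each step rather than always taking the top one; the greedy trail here just shows how a few steps can surface a recurring phrase.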
Most survey tools today allow two types of unstructured data exploration:
- Pull all available complete verbatim responses that contain a user-specified keyword
- Pull all available complete verbatim responses associated with a standard survey question
These results can then be filtered by other attributes to provide additional drill down.
Next Word Predictor (NWP) enhances this ability by providing dynamic drill down into verbatim responses: it predicts the most likely next word associated with a user-specified keyword across all questions.
NWP moves beyond basic frequency counts by employing Markov models with back-off: the next word is predicted from the previous word(s), and when a longer context has not been seen before, the model falls back to a shorter one.
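The back-off idea can be sketched as follows. This is an illustrative simplification, not the NWP code: it tries the longest available context first and simply drops to a shorter context when nothing matches, without the probability discounting that fuller back-off schemes apply. The sample responses and the names `train` and `predict` are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical verbatim responses, invented purely for illustration.
RESPONSES = [
    "design software struggles with large files",
    "design software hard to learn",
    "new design software struggles",
    "software licences cost too much",
]

def train(responses, max_order=3):
    """Map each context (0 to max_order-1 preceding words) to next-word counts."""
    model = defaultdict(Counter)
    for text in responses:
        words = text.lower().split()
        for i, word in enumerate(words):
            for n in range(max_order):
                if i - n < 0:
                    break  # not enough preceding words for this context length
                context = tuple(words[i - n:i])
                model[context][word] += 1
    return model

def predict(model, typed, k=3, max_order=3):
    """Predict up to k next words, backing off to shorter contexts as needed."""
    words = typed.lower().split()
    longest = min(max_order - 1, len(words))
    for n in range(longest, -1, -1):
        context = tuple(words[len(words) - n:])
        followers = model.get(context)
        if followers:
            return [w for w, _ in followers.most_common(k)]
    return []

model = train(RESPONSES)
print(predict(model, "design software"))  # two-word context has been seen
print(predict(model, "our software"))     # backs off to the one-word context "software"
print(predict(model, "budget"))           # backs off to overall word frequencies
```

The design choice worth noting is the order of lookups: a longer context is more specific, so it is preferred whenever the responses contain it, and the shorter contexts only serve as a safety net.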
Other benefits of NWP:
- NWP accuracy can be rigorously quantified
- NWP can be extended to perform sentiment analysis automatically on new survey responses
All of these benefits come from a straightforward implementation of predictive analytics on text.
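One common way to quantify the accuracy of a next word predictor is top-k accuracy on held-out responses: the share of word positions where the word a respondent actually typed appears among the k suggestions. The sketch below assumes a simple bigram predictor and invented data; the function names and the choice of metric are illustrative assumptions, not a description of how NWP is evaluated.

```python
from collections import Counter, defaultdict

def train_bigrams(responses):
    """Count how often each word directly follows each other word."""
    counts = defaultdict(Counter)
    for text in responses:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def top_k_accuracy(counts, held_out, k=3):
    """Fraction of held-out positions whose actual next word is in the top k."""
    hits = 0
    total = 0
    for text in held_out:
        words = text.lower().split()
        for prev, actual in zip(words, words[1:]):
            followers = counts.get(prev, Counter())
            predicted = [w for w, _ in followers.most_common(k)]
            hits += actual in predicted
            total += 1
    return hits / total if total else 0.0

# Hypothetical training and held-out responses, invented for illustration.
TRAIN = [
    "design software struggles",
    "design software hard to learn",
    "training is hard to schedule",
]
HELD_OUT = ["design software hard to use"]

counts = train_bigrams(TRAIN)
accuracy = top_k_accuracy(counts, HELD_OUT, k=3)
print(f"top-3 accuracy: {accuracy:.2f}")
```

Holding out responses that the model never saw during training is what makes the number meaningful; scoring on the training responses themselves would overstate accuracy.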
Image courtesy: https://blog.swiftkey.com/neural-networks-a-meaningful-leap-for-mobile-typing/