Data science shows surveys may assess language more than attitudes

Breakthrough research suggests not only that current data science approaches may supplant traditional surveys (as seen through the Facebook experiments), but also that the last 70 years of foundations for survey science require re-examination.

Elizabeth Lock, University of Colorado, Oct 2014.

Scientists who study patterns in survey results might be dealing with data on language rather than what they’re really after -- attitudes -- according to an international study involving the University of Colorado Boulder.

The study, published in the journal PLOS ONE, found that people naturally responded to surveys by selecting answer options that were similar in language to each other as they navigated from one question to another, even when the similarities were subtle.

Figure 1. Direct and “mediated” semantic relationships between transformational leadership, intrinsic motivation and organizational outcomes (direct semantic relationships from transformational leadership to outcomes in brackets).

For the study, researchers looked specifically at surveys on organizational behavior, such as leadership, motivation and job satisfaction.

“The findings suggest many survey participants likely fit the first question into their language understanding and, when they get to the next question, move in their language network to figure out how close it is to the previous question in order to respond,” said Kai Larsen, information scientist and associate professor of management and entrepreneurship at CU-Boulder’s Leeds School of Business. Larsen is a co-author of the paper.

The findings also raise questions about the way scientists design and analyze surveys, which may inadvertently focus attention on respondents’ shared understanding of language, said Larsen.

“The methods used for surveys are making it difficult to get at what’s unique about an organization rather than what’s embedded in general language,” he said.

Often when social scientists conduct surveys with human participants, they look at more than just average scoring. In the results they detect -- and measure -- patterns. To find common relationships, they quantify, for example, how strongly a popular answer to one question predicts a popular answer to another.

The measurements help form statistics like, “people who highly rate their manager’s leadership style are more likely to stay longer at their jobs.”

In the case of the current study, researchers measured the degree of similarity in survey language instead of human response patterns. When they compared the measurements to measurements of human response patterns, the two sets of numbers were nearly identical, indicating the measurement of language similarities and people’s selection of survey answer options were practically the same thing.
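The comparison described above can be illustrated with a minimal sketch. It is not the study's actual procedure: the response data, the semantic similarity scores and the function names are all made up for illustration, and plain Pearson correlation is assumed as the measure of agreement.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def item_correlations(responses):
    """Correlate every pair of survey items across respondents.

    `responses` holds one list of item scores per respondent.
    Returns {(i, j): correlation} for each item pair i < j.
    """
    n_items = len(responses[0])
    columns = [[row[i] for row in responses] for i in range(n_items)]
    return {
        (i, j): pearson(columns[i], columns[j])
        for i, j in combinations(range(n_items), 2)
    }


# Hypothetical answers from five respondents to three survey items (1-5 scale).
responses = [
    [5, 4, 3],
    [4, 4, 1],
    [2, 3, 2],
    [1, 2, 3],
    [3, 3, 4],
]

# Hypothetical language-similarity scores for the same item pairs,
# standing in for the output of a sentence-similarity algorithm.
semantic_similarity = {(0, 1): 0.8, (0, 2): 0.2, (1, 2): 0.3}

observed = item_correlations(responses)
pairs = sorted(observed)

# How closely do the language-based numbers track the human response patterns?
agreement = pearson(
    [semantic_similarity[p] for p in pairs],
    [observed[p] for p in pairs],
)
print(f"agreement between semantic and observed patterns: {agreement:.2f}")
```

A value of `agreement` close to 1 would mean the wording of the items alone tracks how respondents' answers move together, which is the kind of match the researchers report.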

For the study, the researchers applied two algorithms, or complex computer-operated calculations, each using a radically different approach, to measure sentence similarities.

The first algorithm drew on about 100,000 newspaper articles to evaluate similarities among the words used within them. The second algorithm relied on an online database created by linguists that shows the relationships between tens of thousands of words.
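To give a rough sense of how two such different approaches can each score sentence similarity, here is a toy sketch. It does not reproduce the algorithms used in the study: the three-document corpus, the hand-written word relatedness table, the stopword list and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical stopword list; common words carry little meaning on their own.
STOPWORDS = {"the", "a", "and", "my", "is", "in", "over"}


def tokenize(text):
    """Lowercase, strip punctuation and drop stopwords."""
    words = (raw.strip(".,;:!?").lower() for raw in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Approach 1 (corpus-based): words are similar if they occur in the same texts.
def word_vectors(corpus):
    """Map each word to a count of how often it appears in each document."""
    vectors = {}
    for doc_id, doc in enumerate(corpus):
        for word in tokenize(doc):
            vectors.setdefault(word, Counter())[doc_id] += 1
    return vectors


def corpus_similarity(s1, s2, vectors):
    """Compare sentences by summing the document profiles of their words."""
    def sentence_vector(sentence):
        total = Counter()
        for word in tokenize(sentence):
            total.update(vectors.get(word, {}))
        return total
    return cosine(sentence_vector(s1), sentence_vector(s2))


# Approach 2 (lexicon-based): words are similar if a curated table says so.
def lexicon_similarity(s1, s2, relatedness):
    """Average, in both directions, each word's best match in the other sentence."""
    def word_sim(a, b):
        return 1.0 if a == b else relatedness.get(frozenset((a, b)), 0.0)

    def directed(src, dst):
        return sum(max(word_sim(a, b) for b in dst) for a in src) / len(src)

    t1, t2 = tokenize(s1), tokenize(s2)
    if not t1 or not t2:
        return 0.0
    return (directed(t1, t2) + directed(t2, t1)) / 2


# Toy stand-in for a newspaper corpus.
corpus = [
    "The manager praised the team and the leader inspired the staff",
    "A good leader motivates staff and a good manager supports the team",
    "The weather forecast predicts rain and wind over the weekend",
]
# Toy stand-in for a linguist-built word database.
relatedness = {
    frozenset(("manager", "leader")): 0.8,
    frozenset(("inspires", "motivates")): 0.7,
    frozenset(("team", "staff")): 0.6,
}

item_a = "My manager inspires the team"
item_b = "My leader motivates the staff"
item_c = "Rain is in the forecast"

vectors = word_vectors(corpus)
for other in (item_b, item_c):
    print(
        other,
        round(corpus_similarity(item_a, other, vectors), 2),
        round(lexicon_similarity(item_a, other, relatedness), 2),
    )
```

In this toy example both approaches rate the two leadership items as far closer to each other than either is to the weather sentence, even though one relies only on where words appear and the other only on a curated word table.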

The surveys used in the study were already published and taken by anonymous respondents in a variety of fields from finance and government to engineering and the military. The respondents also included business students.

One type of survey that was not found to be language-based in the study was personality testing.

The study also highlights the growing prowess of data science.

“Semantic algorithms are becoming new tools for the social sciences and are broadening perspectives on survey responses that other longtime theories cannot explain,” said Jan Ketil Arnulf of BI Norwegian Business School, the paper’s lead author. “This represents a study of how the relatively young data sciences can address problems not approachable with traditional methods.”

To see the complete study visit