Data Scientist Survey: What Is An Interesting Result?

This survey asks data scientists what they consider an interesting result. It is anonymous, has only a single mandatory question, and takes only 5 minutes.

By M. Houssem Hachmaoui and Alexandre Termier, INRIA, France.


Recently on KDnuggets, an article presented "What questions can data science answer?" However, in the context of "exploratory data mining," the questions are often not so clear-cut. Promising approaches are interactive and keep the user in the loop (such as OneClick [PDF]): the user acts as an oracle, stating whether the current results are "interesting" or not, which drives the system toward better and better results for this user.

We would like to enhance this process by allowing the user to express "lightweight" properties of interest for data mining results, properties that are easier to state than a full-fledged query. Ideally, we imagine a set of small, composable properties that make it possible to express relatively complex interestingness goals.

Examples of such properties include: "the results must incorporate the time component of the data," "the results must be characteristic of a given part of the data," and "the results must be accompanied by an easy to understand probability (>0.9 or <0.1)."
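To make the idea of composable properties more concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions, not an existing system: the `Result` record, its fields, and the helper names (`incorporates_time`, `characteristic_of`, `clear_probability`, `all_of`) are all hypothetical. Each property is a small predicate over a result, and properties are combined into a larger interestingness goal.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical mined result (e.g. a pattern or a rule)."""
    description: str
    uses_time: bool              # does it involve the time component of the data?
    data_subset: Optional[str]   # part of the data it characterizes, if any
    probability: float           # probability attached to the result


# A "lightweight" property is simply a predicate over a result.
Property = Callable[[Result], bool]


def incorporates_time() -> Property:
    """The result must use the time component of the data."""
    return lambda r: r.uses_time


def characteristic_of(subset: str) -> Property:
    """The result must be characteristic of the given part of the data."""
    return lambda r: r.data_subset == subset


def clear_probability(low: float = 0.1, high: float = 0.9) -> Property:
    """The result must come with a probability below `low` or above `high`."""
    return lambda r: r.probability < low or r.probability > high


def all_of(*props: Property) -> Property:
    """Compose properties: the result must satisfy every one of them."""
    return lambda r: all(p(r) for p in props)


# Made-up results, for illustration only.
results = [
    Result("sales spike every Monday", True, "weekday", 0.95),
    Result("customers buy A together with B", False, None, 0.55),
    Result("churn rises after a price change", True, "weekend", 0.93),
]

# A goal built from two small properties.
goal = all_of(incorporates_time(), clear_probability())
interesting = [r.description for r in results if goal(r)]

# Goals are themselves properties, so they compose further.
weekday_goal = all_of(goal, characteristic_of("weekday"))
weekday_interesting = [r.description for r in results if weekday_goal(r)]
```

Because a composed goal is itself a property, a user could start from one or two simple constraints and refine them step by step instead of writing a full query up front.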

[Image: data science word cloud]

We turn to you, data scientists, for help: as you analyze data for a living, you must have a pretty good idea of what an interesting result is, at least in a given context.

We would be delighted if you took 5 minutes to answer our survey. There is only 1 mandatory question, and no name is required.

We expect to gather a first set of properties from your answers, and we will present the results in a follow-up article on KDnuggets.

Thanks for your help!