Web Search Click Data workshop and competition (on Yandex data)

Workshop participants will receive the Yandex click log challenge dataset, and monetary prizes will be awarded to the winners. Of particular interest is the relation between click data and editorial relevance judgments.

The WSCD 2012 chairs invite you to submit articles to the Second Workshop on Web Search Click Data. The workshop will be held in conjunction with WSDM 2012 on February 12, 2012, in Seattle, WA, USA. Alongside the workshop, a Challenge will be organized using the same dataset provided to all participants, and monetary prizes will be awarded to the winners.


Web search click data has caught the interest of a growing community of professionals during the last five years. Unlike a group of relevance judges or the tags provided on collaborative sites, it provides a snapshot of the typical information access patterns of the user population. Records of issued queries and clicked documents have the potential to give an accurate view of millions of people's daily interests, how these interests evolve, and how they are related. Yet this information is not easily extracted: in spite of its abundance the data is sparse, most queries are repeated only a few times if at all, and clicks cannot be interpreted directly as document relevance. Moreover, user clicks are strongly biased by the search engine ranking, which means that many documents are never seen in spite of being relevant.

Of particular interest is the relation between click data and editorial relevance judgments. A large body of literature exists on how to use editorial judgments to evaluate and train machine-learned ranking functions. This has been, and continues to be, a largely successful technology that serves millions of users every day, yet relying on editorial judgments as the gold standard against which to learn has its limits: 1) the metrics used to evaluate rankings are heuristics and are hard to relate to user behavior; 2) as search technologies are extended to new areas, it is ever harder for editors to provide accurate judgments. This applies to most verticals, like "local search", and to newer research areas like "diversity", where the question is how to introduce diversity in document ranking, or "personalization", where the goal is to adapt the search results to what is known about the user who issued the query.

In this workshop, we will attempt to address these issues, and we will also explore novel applications and uses of these data for enhancing the user search experience.

Research on incorporating click data into information retrieval systems, and on using it to understand user search behavior, has been hampered by a lack of shared datasets. This workshop provides a common click dataset and a forum for presenting new results and analysis in the area. The dataset includes user sessions extracted from Yandex logs, with queries, URL rankings and clicks. Unlike previous click datasets, it also includes relevance judgments for the ranked URLs, for the purpose of training relevance prediction models. To allay privacy concerns, the user data is fully anonymized: only meaningless numeric IDs of queries, sessions, and URLs are released. The queries are grouped only by session, and no user IDs are provided. More details are available at imat-relpred.yandex.ru/
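As a rough illustration of the kind of analysis such a dataset allows, the Python sketch below computes the click-through rate at each ranking position. The record layout, field names, and sample values are hypothetical and chosen only for this example; the actual file format is described on the Challenge site, not here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEvent:
    """One query within a session (hypothetical layout, numeric IDs only)."""
    session_id: int
    query_id: int
    ranked_url_ids: tuple        # URL IDs in the order they were shown
    clicked_url_ids: frozenset   # subset of ranked_url_ids that was clicked


def click_through_by_rank(events):
    """Return {rank: fraction of impressions at that rank that were clicked}."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for event in events:
        for rank, url_id in enumerate(event.ranked_url_ids, start=1):
            shown[rank] += 1
            if url_id in event.clicked_url_ids:
                clicked[rank] += 1
    return {rank: clicked[rank] / shown[rank] for rank in shown}


# Made-up sample data, for illustration only.
events = [
    QueryEvent(1, 10, (100, 101, 102), frozenset({100})),
    QueryEvent(1, 11, (103, 100, 104), frozenset({103, 104})),
    QueryEvent(2, 10, (100, 101, 102), frozenset()),
]
ctr = click_through_by_rank(events)
for rank in sorted(ctr):
    print(f"rank {rank}: CTR = {ctr[rank]:.2f}")
```

A per-rank click-through rate of this kind is one simple way to observe the ranking bias mentioned earlier; it is not a relevance estimate in itself.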

Submissions should present original results and new ideas. They can be up to 8 pages in length, but shorter works are encouraged. Papers should properly place the work within the field, cite related work, and clearly indicate the innovative aspects of the work and its contribution to the field, supported for instance by proper evaluation methods. We strongly encourage evaluations that are repeatable and make use of the provided dataset.

Submissions should not be under review at, or already accepted by, a journal or another conference.

All papers will be peer-reviewed by at least three reviewers from an international program committee. Promising papers will then be discussed in a meeting of the PC chairs, where the final selections will be made. Accepted papers will appear in the conference online proceedings published in the ACM Digital Library and on the conference web site. Authors of accepted papers will retain proprietary rights to their work, but will be required to sign a copyright release form.

The submission site will come online one month before the abstract due date.

Important Dates

Start of Challenge: October 15, 2011
Papers due: December 5, 2011
End of Challenge: December 15, 2011
Notification of Acceptance: January 10, 2012
Camera-Ready: January 17, 2012
Workshop: February 12, 2012

The workshop website is research.microsoft.com/en-us/um/people/nickcr/wscd2012.

See imat-relpred.yandex.ru/ for more information on the Challenge.
