This year's KDDCUP is sponsored by Tencent Inc., China's largest Internet company in terms of active users (over 700 million as of January 2012). Tencent owns a full portfolio of popular products in China, including instant messaging, email, a news portal, a search engine, online games, blogging, and micro-blogging, which offers a rich opportunity to build user models for highly effective user-intent prediction and result recommendation. This year's KDDCUP consists of two separate tasks.
Task 1. Social Network Mining on Microblogs (Weibo)
Tencent Weibo (t.qq.com/) offers a wealth of social-networking information. For the 2012 KDDCUP, the released data is a sampled snapshot of Tencent Weibo users' preferences for various items, covering both the recommendations made to users and their follow-relation history. Items are tied together within a hierarchy: each person, organization, or group belongs to specific categories, and each category belongs to higher-level categories. Both users and items (persons, organizations, and groups) are represented as anonymized, meaningless numbers, so no identifying information is revealed. The data covers 10 million users and 50,000 items, with over 300 million recommendation records and about three million social-networking "following" actions. The privacy-protected user information is also very rich, and user activities carry timestamps. The task is to predict whom users will follow among all potential users.
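The item/category hierarchy described above can be pictured as parent links that are walked upward. The sketch below is a hypothetical illustration, not the dataset's actual file format: the category and item IDs, the `CATEGORY_PARENT` and `ITEM_CATEGORY` dictionaries, and the helper names are all made up. It shows one way to recover an item's category path and to find the most specific category two items share, for example as a similarity signal between items.

```python
from typing import Dict, List, Optional

# Hypothetical parent links: each category ID maps to its higher-level
# category, or None at the top of the hierarchy. IDs are invented here,
# mirroring the anonymized numeric IDs used in the dataset.
CATEGORY_PARENT: Dict[int, Optional[int]] = {
    1: None,   # top-level category
    11: 1,
    111: 11,
    112: 11,
    12: 1,
}

# Hypothetical mapping from an item to its most specific category.
ITEM_CATEGORY: Dict[int, int] = {
    50001: 111,
    50002: 112,
    50003: 12,
}


def category_path(item_id: int) -> List[int]:
    """Return the item's categories from most specific to top level."""
    path: List[int] = []
    cat: Optional[int] = ITEM_CATEGORY[item_id]
    while cat is not None:
        path.append(cat)
        cat = CATEGORY_PARENT[cat]
    return path


def lowest_shared_category(a: int, b: int) -> Optional[int]:
    """Most specific category that both items fall under, if any."""
    ancestors_b = set(category_path(b))
    for cat in category_path(a):
        if cat in ancestors_b:
            return cat
    return None
```

Here `category_path(50001)` walks 111, 11, 1, and items 50001 and 50002 share category 11, while 50001 and 50003 only share the top-level category 1.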
Task 2. User Click Modeling based on Search Engine Log Data
Online advertising has been the financial backbone of the Internet industry for years. Three successful kinds of computational ad systems are search ad, contextual ad, and social-network ad systems. Search ad systems retrieve and rank ads for a given query and display the resulting ads alongside the search engine's results. When a user clicks an ad, the advertiser pays the search engine for the promotion. Ads are ranked to maximize user satisfaction, advertisers' return on investment, and the search engine's revenue. Contextual ad systems involve an additional party, the publishers, who own Internet properties such as websites, forums, or mobile apps. Programs embedded in these properties request ads from the ad system, which finds ads that semantically match the content of the properties. Recently, a third kind, social-network ad systems, has gained a lot of attention; here the ad system ranks ads taking users' social relationships into account.
In all of the aforementioned systems, a key algorithmic component is predicting the click-through rate of ads (pCTR, the predicted CTR). This is because all such systems optimize monetization under economic rules such as the Generalized Second Price (GSP) auction behind Google AdWords and others, and these rules require each ad's pCTR both to rank ads and to price clicks. The closer pCTR is to the true CTR, the more effective monetization will be. User information, including demographics and historical behavior on search engines, e-business platforms, social networks, and micro-blogs, is likely valuable for improving pCTR accuracy in all of these systems.
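To make the dependence on pCTR concrete, here is a minimal sketch of one common GSP variant that ranks ads by bid × pCTR and charges each winner the smallest cost per click that would still keep it above the next-ranked ad. It is an illustration under that assumption, not AdWords' actual pricing logic (which adds quality factors, reserve prices, and more); the `Ad` class, the `gsp_rank_and_price` function, and the bids are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ad:
    name: str
    bid: float   # advertiser's maximum cost per click
    pctr: float  # predicted click-through rate; assumed > 0


def gsp_rank_and_price(ads: List[Ad], slots: int) -> List[Tuple[str, float]]:
    """Rank ads by bid * pCTR and charge each winner the minimum
    cost per click that keeps its score at least equal to the next ad's."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.pctr, reverse=True)
    results: List[Tuple[str, float]] = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # Solve price * ad.pctr == nxt.bid * nxt.pctr for price.
            # Since ad's score >= nxt's score, price never exceeds ad.bid.
            price = nxt.bid * nxt.pctr / ad.pctr
        else:
            # No competitor below; a real system would apply a reserve price.
            price = 0.0
        results.append((ad.name, price))
    return results


ads = [Ad("A", 2.0, 0.05), Ad("B", 4.0, 0.02), Ad("C", 1.0, 0.04)]
for name, price in gsp_rank_and_price(ads, slots=2):
    print(name, round(price, 2))
```

With these made-up ads competing for two slots, A ranks first and pays about 1.6 per click, and B ranks second and pays about 2.0. Because pCTR appears in both the ranking score and the price formula, an error in pCTR changes both who wins and what they pay.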
The aim of Task 2 is to accurately predict the click-through rate of ads in online computational ad systems.
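One common baseline for this kind of prediction, though not necessarily what Tencent or participants used, is logistic regression over sparse categorical features of the ad, the query, and the user. The sketch below assumes a hypothetical feature naming scheme (`ad=1`, `user_age=20-29`) and a made-up toy log, and trains with plain stochastic gradient descent on log loss; production systems typically add regularization, feature hashing, and far larger data.

```python
import math
from typing import Dict, List, Tuple

# One impression: hypothetical categorical features plus a click label (1/0).
Impression = Tuple[List[str], int]

# Made-up toy log: ad 1 shown to 20-29-year-olds gets clicked, nothing else does.
TOY_LOG: List[Impression] = [
    (["ad=1", "user_age=20-29"], 1),
    (["ad=1", "user_age=20-29"], 1),
    (["ad=1", "user_age=40-49"], 0),
    (["ad=2", "user_age=20-29"], 0),
    (["ad=2", "user_age=40-49"], 0),
    (["ad=2", "user_age=40-49"], 0),
]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(data: List[Impression], epochs: int = 200,
             lr: float = 0.1) -> Dict[str, float]:
    """Fit logistic regression on binary features with plain SGD."""
    w: Dict[str, float] = {"bias": 0.0}
    for _ in range(epochs):
        for feats, clicked in data:
            active = ["bias"] + feats
            p = sigmoid(sum(w.get(f, 0.0) for f in active))
            grad = p - clicked  # d(log loss)/d(logit)
            for f in active:
                w[f] = w.get(f, 0.0) - lr * grad
    return w


def predict_ctr(w: Dict[str, float], feats: List[str]) -> float:
    """pCTR for one impression; unseen features contribute nothing."""
    return sigmoid(w["bias"] + sum(w.get(f, 0.0) for f in feats))


weights = train_lr(TOY_LOG)
print(predict_ctr(weights, ["ad=1", "user_age=20-29"]))
print(predict_ctr(weights, ["ad=2", "user_age=40-49"]))
```

On this toy log the model assigns a much higher pCTR to `["ad=1", "user_age=20-29"]` than to `["ad=2", "user_age=40-49"]`, illustrating how user features such as demographics can shift the prediction for the same ad.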
- Feb 20, 2012: Competition announcement linked to KDD official site
- Mar 1, 2012: Registration opens (dataset ready for the public)
- Mar 15, 2012: Competition begins
- Jun 1, 2012: Competition ends (submission deadline)
- Jun 5, 2012: Results compiled
- Jun 8, 2012: Winners notified
- Aug 12, 2012: Workshop
- Dr. Gordon Sun, Chief Scientist, Tencent Inc.
- Dr. Yading Aden Yue, Expert Researcher, Tencent Inc.