Mendeley's DataTEL Data Set

Mendeley has taken up the DataTEL challenge in order to provide recommendation system researchers with valuable data on users and their relationship with scientific literature.

Mendeley has built, and continues to build, a strong user community of researchers who benefit from both its desktop and web-based software. In building this community, Mendeley has recorded a considerable amount of data that can be analyzed to help researchers do better research.

One key way to help researchers is to recommend research articles that they have not yet encountered but would find interesting. Recommendation system research, while well studied in some domains such as motion pictures, lacks the kind of scientific data sets that Mendeley has been building.

This particular data set includes usage data on more than 4.8M research papers, drawn from a sample of 50K active users. It is described in full in the accompanying paper, "Mendeley's Reply to the DataTEL Challenge".

The data set has been anonymized to protect user privacy and may be used only for non-commercial scientific purposes.

See dev.mendeley.com/datachallenge/

See also Mendeley Data vs. Netflix Data, by Andre Vellino

I was gratified to note that this is almost exactly the user-item ratio (1:100) that I indicated in my poster at ASIS&T 2010 was typically the cause of the data sparsity problem for recommenders in digital libraries. If we measure the sparseness of a dataset by the number of edges in the bipartite user-item graph divided by the total number of possible edges, Mendeley gives 2.66E-05. Compared with the sparsity of Netflix - 1.18E-02 - that's a difference of roughly 3 orders of magnitude!
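
The sparseness measure described above (observed user-item edges divided by all possible user-item edges) can be sketched in a few lines of Python. This is only an illustration: the function name `density` and the toy counts are assumptions made for the example, and the only real figures used are the ones quoted in the text (50K users, 4.8M papers, and the two sparsity values). The actual edge counts of either data set are not given here, so they are not reconstructed.

```python
import math


def density(n_edges: int, n_users: int, n_items: int) -> float:
    """Fraction of possible edges present in a bipartite user-item graph.

    Hypothetical helper for illustration; not part of any Mendeley tool.
    """
    if n_users <= 0 or n_items <= 0:
        raise ValueError("n_users and n_items must be positive")
    return n_edges / (n_users * n_items)


# Toy example (made-up numbers): 3 users, 4 items, 6 observed edges.
toy_density = density(n_edges=6, n_users=3, n_items=4)

# Figures quoted in the text.
mendeley_users = 50_000
mendeley_papers = 4_800_000
items_per_user = mendeley_papers / mendeley_users  # user-item ratio of 1:96

mendeley_density = 2.66e-5
netflix_density = 1.18e-2

# How many times denser Netflix is, and the same gap in orders of magnitude.
ratio = netflix_density / mendeley_density
orders_of_magnitude = math.log10(ratio)

print(f"toy density:         {toy_density:.2f}")
print(f"items per user:      {items_per_user:.0f}")
print(f"Netflix / Mendeley:  {ratio:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

With the quoted values, the Netflix data come out a few hundred times denser than the Mendeley data, which is a little under three orders of magnitude, and the 50K users to 4.8M papers works out to a user-item ratio of 1:96, close to the 1:100 mentioned above.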