
New dataset released: SMS Spam Collection v.1


 
  
A public dataset of 5,574 SMS (text) messages collected for mobile phone spam research, each tagged as legitimate or spam.


Date: Jun 9, 2011

The SMS Spam Collection v.1 is a public set of labeled SMS (text) messages collected for mobile phone spam research. It consists of a single dataset of 5,574 real, non-encoded English messages, each tagged as legitimate (ham) or spam.

The collection is free for all purposes, and it is publicly available at:
www.dt.fee.unicamp.br/~tiago/smsspamcollection/
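
As a rough sketch of how the ham/spam tags might be consumed, the Python snippet below parses labeled messages and counts each class. It assumes a plain-text layout with one message per line in the form label, tab, message text. That layout, the function names, and the sample messages are illustrative assumptions and are not specified in this announcement, so check the downloaded files before relying on it.

```python
import io
from collections import Counter


def parse_sms_collection(lines):
    """Parse 'label<TAB>message' lines into (label, message) pairs.

    The tab-separated layout is an assumption about the file format.
    """
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        label, sep, text = line.partition("\t")
        if not sep or label not in ("ham", "spam"):
            raise ValueError(f"unexpected line format: {line!r}")
        records.append((label, text))
    return records


def class_counts(records):
    """Count how many messages carry each label."""
    return Counter(label for label, _ in records)


# Made-up sample lines standing in for the real file (hypothetical data).
sample = io.StringIO(
    "ham\tSee you at lunch?\n"
    "spam\tYou have won a prize, call now\n"
    "ham\tRunning late, sorry\n"
)
records = parse_sms_collection(sample)
counts = class_counts(records)
print(counts["ham"], counts["spam"])
```

To read a downloaded copy instead of the in-memory sample, pass an open file object (for example one opened with encoding="utf-8") to parse_sms_collection in place of the StringIO buffer.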

This corpus was collected from sources on the Internet that are either free or free for research use, including the Grumbletext Web site, the NUS SMS Corpus, Caroline Tagg's PhD thesis, and a smaller previous collection, the SMS Spam Corpus v.0.1, which remains available for historical comparison at:
www.esp.uem.es/jmgomez/smsspamcorpus/

A comprehensive study of this corpus can be found in the following paper, which offers a number of statistics, studies and baseline results for several machine learning methods:

Almeida, T.A., Gómez Hidalgo, J.M., Yamakami, A. Contributions to the study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (ACM DOCENG'11), Mountain View, CA, USA, 2011. (Accepted)


 
