Authors: Cormack, Gordon V.; Gómez Hidalgo, José María; Puertas Sanz, Enrique
Record date: 2016-08-08
Date issued: 2007
Citation: Cormack, G. V., Gómez Hidalgo, J. M., & Puertas Sanz, E. (2007). Spam filtering for short messages. In Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM 2007) (pp. 313-320). New York, USA: ACM.
ISBN: 9781595938039
URI: http://hdl.handle.net/11268/5576
Abstract: We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.
Language: eng
Title: Spam filtering for short messages
Type: conference output
DOI: 10.1145/1321440.1321486
Access: restricted access
Subject: Email