Spam filtering for short messages
Publisher
ACM
Abstract
We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.
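The abstract does not say how the compression-model filters work internally. The following is only a rough illustration of the general idea behind compression-based classification, not a reproduction of the filters evaluated in the paper. It assigns a message to whichever class's training text lets a compressor encode the message with fewer extra bytes. Python's standard `zlib` (DEFLATE) is a swapped-in stand-in for the adaptive statistical compression models such filters typically use, and the function names and toy messages are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(training_text: str, message: str) -> int:
    """Approximate cost of encoding message once the compressor has seen training_text.

    A message that resembles the training text reuses its substrings
    (LZ77 back-references), so it adds fewer bytes to the output.
    """
    return compressed_size(training_text + "\n" + message) - compressed_size(training_text)


def classify(message: str, spam_corpus: list[str], ham_corpus: list[str]) -> str:
    """Label message 'spam' if the spam corpus compresses it strictly better, else 'ham'."""
    spam_cost = extra_bytes("\n".join(spam_corpus), message)
    ham_cost = extra_bytes("\n".join(ham_corpus), message)
    return "spam" if spam_cost < ham_cost else "ham"


# Hypothetical toy training data, for illustration only.
SPAM = [
    "WINNER! Claim your FREE prize now, text WIN to 80086",
    "URGENT! You have won a FREE cash prize, call now",
]
HAM = [
    "Are we still meeting for lunch tomorrow?",
    "Sorry, running late, see you at the station",
]

if __name__ == "__main__":
    print(classify("WINNER! Claim your FREE prize now", SPAM, HAM))
    print(classify("Are we still meeting for lunch today?", SPAM, HAM))
```

DEFLATE only looks back 32 KB, so with a realistically sized corpus only the tail of each concatenated training text influences the score; an incrementally trained statistical model avoids that limitation and also gives finer-grained costs than whole compressed bytes, which matters for messages only a few words long.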
Bibliographic reference
Cormack, G. V., Gómez Hidalgo, J. M., & Puertas Sanz, E. (2007). Spam filtering for short messages. In Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM 2007) (pp. 313-320). New York, USA: ACM.