Spam filtering for short messages

Publisher

ACM

Abstract

We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.
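The abstract contrasts bag-of-words filters with compression-model filters. The general idea behind compression-based classification is that a message is assigned to the class whose training text it compresses best against. The sketch below illustrates that idea under stated assumptions. It uses Python's standard `zlib` (DEFLATE) as a stand-in compressor, which is not necessarily the compression model the authors used. The function names `compressed_size`, `encoding_cost`, and `classify` are hypothetical helpers introduced only for this illustration.

```python
import zlib
from typing import Sequence


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after DEFLATE compression at level 9."""
    return len(zlib.compress(data, 9))


def encoding_cost(corpus: bytes, message: bytes) -> int:
    """Extra compressed bytes needed for `message` once `corpus` has been seen.

    Assumption: DEFLATE stands in for an adaptive compression model. It can
    only back-reference the previous 32 KiB, so with a large corpus only the
    tail of the corpus actually influences the cost.
    """
    return compressed_size(corpus + message) - compressed_size(corpus)


def classify(message: str, spam: Sequence[str], ham: Sequence[str]) -> str:
    """Hypothetical helper: label `message` with the class whose training
    text it compresses best against (ties go to 'ham')."""
    msg = ("\n" + message).encode("utf-8")
    spam_text = "\n".join(spam).encode("utf-8")
    ham_text = "\n".join(ham).encode("utf-8")
    if encoding_cost(spam_text, msg) < encoding_cost(ham_text, msg):
        return "spam"
    return "ham"
```

Because this approach works on raw bytes rather than word tokens, it does not depend on a message containing many words. That property is relevant to the short-message setting the abstract describes. A practical filter would use a properly adaptive model and a real training corpus, not toy strings.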

Bibliographic reference

Cormack, G. V., Gómez Hidalgo, J. M., & Puertas Sanz, E. (2007). Spam filtering for short messages. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM 2007) (pp. 313-320). New York, USA: ACM.
