Spam filtering for short messages

dc.contributor.author: Cormack, Gordon V.
dc.contributor.author: Gómez Hidalgo, José María
dc.contributor.author: Puertas Sanz, Enrique
dc.date.accessioned: 2016-08-08T12:15:18Z
dc.date.available: 2016-08-08T12:15:18Z
dc.date.issued: 2007
dc.description.abstract: We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective. [spa]
dc.description.filiation: UEM [spa]
dc.description.impact: 0.400 SJR (2007) Q1, 52/235 Business, Management and accounting (miscellaneous) [spa]
dc.description.sponsorship: No funding [spa]
dc.identifier.citation: Cormack, G. V., Gómez Hidalgo, J. M., & Puertas Sanz, E. (2007). Spam filtering for short messages. In Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM 2007) (pp. 313-320). New York, USA: ACM. [spa]
dc.identifier.doi: 10.1145/1321440.1321486
dc.identifier.isbn: 9781595938039
dc.identifier.uri: http://hdl.handle.net/11268/5576
dc.language.iso: eng [spa]
dc.peerreviewed: Yes [spa]
dc.publisher: ACM [spa]
dc.rights.accessRights: restricted access [en]
dc.subject.uem: Email [spa]
dc.subject.unesco: Email [spa]
dc.title: Spam filtering for short messages [spa]
dc.type: conference output [spa]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 76a395e8-090d-4187-9a3c-420063e1f44f
relation.isAuthorOfPublication: 001b7f40-b837-4929-82ca-df26041a995a
relation.isAuthorOfPublication.latestForDiscovery: 76a395e8-090d-4187-9a3c-420063e1f44f
