Spam filtering for short messages
| dc.contributor.author | Cormack, Gordon V. | |
| dc.contributor.author | Gómez Hidalgo, José María | |
| dc.contributor.author | Puertas Sanz, Enrique | |
| dc.date.accessioned | 2016-08-08T12:15:18Z | |
| dc.date.available | 2016-08-08T12:15:18Z | |
| dc.date.issued | 2007 | |
| dc.description.abstract | We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective. | spa |
| dc.description.filiation | UEM | spa |
| dc.description.impact | 0.400 SJR (2007) Q1, 52/235 Business, Management and accounting (miscellaneous) | spa |
| dc.description.sponsorship | No funding | spa |
| dc.identifier.citation | Cormack, G. V., Gómez Hidalgo, J. M., & Puertas Sanz, E. (2007). Spam filtering for short messages. In Proceedings of the sixteenth ACM conference on information and knowledge management (CIKM 2007) (pp. 313-320). New York, USA: ACM. | spa |
| dc.identifier.doi | 10.1145/1321440.1321486 | |
| dc.identifier.isbn | 9781595938039 | |
| dc.identifier.uri | http://hdl.handle.net/11268/5576 | |
| dc.language.iso | eng | spa |
| dc.peerreviewed | Yes | spa |
| dc.publisher | ACM | spa |
| dc.rights.accessRights | restricted access | en |
| dc.subject.uem | Email | spa |
| dc.subject.unesco | Email | spa |
| dc.title | Spam filtering for short messages | spa |
| dc.type | conference output | spa |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 76a395e8-090d-4187-9a3c-420063e1f44f | |
| relation.isAuthorOfPublication | 001b7f40-b837-4929-82ca-df26041a995a | |
| relation.isAuthorOfPublication.latestForDiscovery | 76a395e8-090d-4187-9a3c-420063e1f44f | |

