TY  - JOUR
A1  - Ureña-López, L. Alfonso
AU  - De Buenaga Rodríguez, Manuel
AU  - Gómez, José Manuel
T1  - Integrating linguistic resources in TC through WSD
Y1  - 2001
SN  - 00104817
UR  - http://hdl.handle.net/11268/5790
AB  - Information access methods must be improved to overcome the information overload that most professionals face today. Text classification tasks, such as Text Categorization (TC), help users access the large amount of text they find on the Internet and in their organizations. TC is the classification of documents into a predefined set of categories. Most approaches to automatic TC rely on a training collection, which is a set of manually classified documents. Other emerging linguistic resources, such as lexical databases, can also be used for classification tasks. This article describes an approach to TC based on the integration of a training collection (Reuters-21578) and a lexical database (WORDNET 1.6) as knowledge sources. Lexical databases accumulate information on the lexical items of one or several languages. This information must be filtered in order to make effective use of it in our model of TC. This filtering process is a Word Sense Disambiguation (WSD) task. WSD is the identification of the sense of words in context. This task is an intermediate process in many natural language processing tasks, such as machine translation or multilingual information retrieval. We present the use of WSD as an aid for TC. Our approach to WSD is also based on the integration of two linguistic resources: a training collection (SEMCOR and Reuters-21578) and a lexical database (WORDNET 1.6). We have carried out a series of experiments showing that TC and WSD based on the integration of linguistic resources are very effective, and that WSD is necessary to effectively integrate linguistic resources in TC.
KW  - Artificial intelligence
KW  - Computer systems
KW  - Computer languages
KW  - Computer science
KW  - Programming language
LA  - eng
ER  - 