Manuel Arranz
This paper focuses on the general problem of the lexical bottleneck and, in particular, on the issues of semantic clustering and disambiguation by means of word usage cues obtained from sublanguage-specific corpora. Our approach combines numerical techniques with symbolic modules. Our numerical tool, Dynamic Context Matching, is supported by three symbolic modules and a numerical one, which help considerably to reduce any remaining ambiguity. The development of a Unix-specific ontological knowledge hierarchy is also detailed. This ontology consists of a series of function categories, which reflect the different meaning aspects of each word as well as the relationships that can be established among these aspects and among the words holding them. This hierarchy can therefore be seen both as a semantic knowledge repository, where all the semantic information extracted from the corpus is stored, and as an evaluation standard for our module, given that it contains all the information required to evaluate the clusters automatically acquired by the system.