Lexical Disambiguation in Machine Translation with Latent Semantic Analysis
Author
Davis, Elizabeth Elwyn
Subject
Washington and Lee University -- Honors in Computer Science
Latent semantic indexing
Semantics -- Data processing
Discourse analysis -- Data processing
Machine translating
Description
This paper examines a possible solution to the problem of disambiguating polysemous nouns
in machine translation. Latent Semantic Analysis (LSA), a statistical method of finding and
representing word sense, is used to distinguish among the meanings of ambiguous
words according to the given context. A collection of training texts is sorted according
to polysemous word and meaning. A word-by-text matrix is created from this data and
transformed by the LSA method, producing a vector for each text that defines it in terms of the
(non-polysemous) words that appear in it. These representations of textual meaning are
compared to the context of an ambiguous word to determine the most similar meaning.
The viability of this LSA model is compared with a simple Bayesian probability model.
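
The LSA pipeline summarized above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy training texts, the sense labels, the choice of k = 2 dimensions, the fold-in projection of the context, and all function names are assumptions made for illustration, using only NumPy's singular value decomposition.

```python
import numpy as np

# Hypothetical training texts, sorted by sense of the polysemous noun "bank".
# The polysemous word itself is left out, so texts are described only by
# the non-polysemous words that appear in them.
TRAINING = [
    ("bank/finance", "money loan deposit account"),
    ("bank/finance", "money interest loan teller"),
    ("bank/river", "river water shore fishing"),
    ("bank/river", "river mud water boat"),
]


def build_matrix(texts):
    """Return a word-by-text count matrix and the word -> row index map."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for col, text in enumerate(texts):
        for word in text.split():
            matrix[index[word], col] += 1
    return matrix, index


def reduce_rank(matrix, k):
    """Truncated SVD: keep the k largest singular values and their vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :].T  # rows of the last item are text vectors


def fold_in(context, index, u_k, s_k):
    """Project a context into the reduced space; unknown words are ignored."""
    counts = np.zeros(u_k.shape[0])
    for word in context.split():
        if word in index:
            counts[index[word]] += 1
    return (counts @ u_k) / s_k


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def disambiguate(context, training=TRAINING, k=2):
    """Return the sense label of the training text most similar to the context."""
    labels = [label for label, _ in training]
    matrix, index = build_matrix([text for _, text in training])
    u_k, s_k, text_vectors = reduce_rank(matrix, k)
    query = fold_in(context, index, u_k, s_k)
    scores = [cosine(query, vec) for vec in text_vectors]
    return labels[int(np.argmax(scores))]
```

Calling disambiguate("deposit money at the bank teller") selects the finance sense on this toy data, because the context shares words only with the finance texts. A real system would need a far larger corpus, term weighting, and a tuned number of dimensions.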