Lexical Disambiguation in Machine Translation with Latent Semantic Analysis

Davis, Elizabeth Elwyn

WLURG038_Davis_thesis_2006.pdf (20.46Mb)

Author

Davis, Elizabeth Elwyn

Subject

Washington and Lee University -- Honors in Computer Science

Latent semantic indexing

Semantics -- Data processing

Discourse analysis -- Data processing

Machine translating

Description

This paper examines a possible solution to the problem of disambiguating polysemous nouns in machine translation. Latent Semantic Analysis (LSA), a statistical method for finding and representing word sense, is used to distinguish the meanings of an ambiguous word according to its context. A collection of training texts is sorted by polysemous word and meaning. A word-by-text matrix is built from this data and transformed by the LSA method, producing a vector for each text that describes it in terms of the (non-polysemous) words that appear in it. These representations of textual meaning are then compared with the context of an ambiguous word to determine the most similar meaning. The viability of this LSA model is compared with that of a simple Bayesian probability model.
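The pipeline described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy training texts for the noun "bank", the sense labels, the use of raw word counts, the rank K = 2, and the helper names (`tokenize`, `count_vector`, `disambiguate`) are all assumptions made for illustration. The sketch builds a word-by-text matrix, reduces it with a truncated singular value decomposition (the usual core of LSA), folds a new context into the reduced space, and returns the sense of the most similar training text by cosine similarity.

```python
import numpy as np

# Hypothetical toy data: training texts sorted by sense of the noun "bank".
# Stopwords are assumed to have been removed already.
TRAINING = [
    ("finance", "bank money loan deposit account interest"),
    ("finance", "bank money money account teller cash loan"),
    ("river", "bank river water shore fishing boat"),
    ("river", "bank water river mud shore grass"),
]
AMBIGUOUS = {"bank"}  # polysemous words are excluded from the vocabulary


def tokenize(text):
    """Lowercase, split on whitespace, and drop the polysemous word."""
    return [w for w in text.lower().split() if w not in AMBIGUOUS]


vocab = sorted({w for _, text in TRAINING for w in tokenize(text)})
index = {w: i for i, w in enumerate(vocab)}


def count_vector(text):
    """Raw count vector over the vocabulary; unknown words are ignored."""
    vec = np.zeros(len(vocab))
    for w in tokenize(text):
        if w in index:
            vec[index[w]] += 1
    return vec


# Word-by-text matrix: one row per word, one column per training text.
A = np.column_stack([count_vector(text) for _, text in TRAINING])

# Truncated SVD: keep only the K largest singular values (assumed K = 2).
K = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :K], s[:K]
text_vectors = Vt[:K, :].T  # one K-dimensional row per training text


def disambiguate(context):
    """Return the sense label of the training text closest to the context."""
    q = count_vector(context)
    q_hat = (q @ U_k) / s_k  # fold the context into the reduced space
    q_norm = np.linalg.norm(q_hat)
    if q_norm == 0:
        return None  # no usable overlap with the training vocabulary
    norms = np.linalg.norm(text_vectors, axis=1) * q_norm
    sims = (text_vectors @ q_hat) / norms  # cosine similarities
    return TRAINING[int(np.argmax(sims))][0]


print(disambiguate("she opened an account to deposit money"))
print(disambiguate("they went fishing from a boat on the water"))
```

Picking the single most similar text is only one way to choose a sense; averaging similarities per sense, or applying a term weighting before the decomposition, would be equally reasonable variations on this sketch.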

URI

https://dspace.wlu.edu/handle/11021/36365

Collections

W & L Historic Theses