Distributional semantic word representations allow Natural Language Processing (NLP) systems to extract and model an immense amount of information about a language. This technique maps words into a high-dimensional continuous space using a single-layer neural network, and it has enabled advances in many NLP research areas and tasks. These representation models are evaluated with analogy tests: questions of the form "a is to a' as b is to what?" are answered by composing multiple word vectors and searching the vector space. During neural network training, each word is examined as a member of its context. Generally, a word's context is taken to be the elements adjacent to it within a sentence. Although some work has examined the effect of expanding this definition, the area remains largely unexplored. Further, no inquiry has been conducted into the specific linguistic competencies of these models, or into whether modifying their contexts changes the information they extract. In this paper we propose a thorough analysis of the lexical and grammatical competencies of distributional semantic models. We aim to leverage analogy tests to evaluate the most advanced distributional model across 14 different types of linguistic relationships. With this information we will then be able to investigate whether modifying the training context yields any difference in quality in any of these categories. Ideally, we will identify training approaches that increase precision in specific linguistic categories, which will allow us to investigate whether these improvements can be combined, by joining the information used in different training approaches, to build a single, improved model.
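
To make the analogy test concrete, the following is a minimal sketch of one common way such a question can be answered: the vectors for a' and b are added, the vector for a is subtracted, and the vocabulary is searched for the word whose vector is closest to the result by cosine similarity. The two-dimensional embeddings, the names `EMBEDDINGS`, `cosine`, and `solve_analogy`, and the choice of cosine similarity are all assumptions made for illustration; they are not the trained model or the exact procedure evaluated in this work.

```python
import math

# Toy 2-D embeddings, invented purely for illustration (not trained vectors).
EMBEDDINGS = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [0.0, 3.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, a_prime, b, embeddings):
    """Answer 'a is to a_prime as b is to ?' with the vector-offset method."""
    # Compose the query vector: a_prime - a + b.
    target = [
        ap - x + y
        for x, ap, y in zip(embeddings[a], embeddings[a_prime], embeddings[b])
    ]
    # Search the vocabulary, excluding the three question words themselves.
    candidates = (w for w in embeddings if w not in (a, a_prime, b))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


print(solve_analogy("man", "woman", "king", EMBEDDINGS))
```

With these toy vectors the composed query is exactly the vector assigned to "queen", so that word is returned. Excluding the question words from the search is a design choice in this sketch: without it, the nearest neighbor of the composed vector is often one of the inputs.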
Computer Science / Emory University
BS / Spring 2016
Jinho D. Choi, Computer Science and QTM, Emory University (Chair)
Valerie Summet, Computer Science, Emory University
Phillip Wolff, Psychology, Emory University