MS Thesis 2020 - Zhexiong Liu

Hierarchical Entity Extraction and Ranking with Unsupervised Graph Convolutions

Zhexiong Liu


Abstract

Entity extraction problems have been extensively studied in terms of investigating the capability of extracting entities from text using natural language processing (NLP). Most research involves training learnable models on large corpora to extract entities and determine their salience. Typically, these systems aim to retrieve an array of ranked entities from a set of documents given queries, and they mainly measure the relevance between queries and entities. In contrast, this thesis leverages semantic and syntactic information within the documents to perform both entity extraction and entity ranking. In particular, given a document corpus, constituency parsing trees are constructed to extract entity mentions (phrases) for each article. Meanwhile, dependency parsing trees and entity coreference clusters are employed to build a relation graph, whose nodes denote entity mentions and whose edges denote mention relations. Moreover, graph convolution is performed on the relation graph to normalize the mention representations with respect to mention embeddings. Hierarchical density-based clustering and a ranking mechanism are then applied to compute entity priors. To evaluate this work, three models are proposed and evaluated on 60 annotated articles. Preliminary results illustrate that the use of parsing trees, together with entity coreference relations, improves the effectiveness of entity extraction and ranking. The hierarchical trees for entity extraction, the principles for graph construction, and the system architecture are the main contributions of this thesis.
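The pipeline described above (mention extraction, relation-graph construction, unsupervised graph convolution, and density-based clustering with ranking) can be sketched end to end. The sketch below is illustrative only: it assumes spaCy noun chunks as a stand-in for constituency-derived mentions, dependency head links in place of full coreference clusters, networkx for the relation graph, and the HDBSCAN library for clustering; none of these choices are confirmed as the thesis's actual implementation.

```python
# Hedged sketch of the abstract's pipeline; all library choices and formulas
# here are assumptions for illustration, not the thesis's implementation.
import numpy as np
import networkx as nx
import spacy
import hdbscan

# The medium English model is assumed because it ships with word vectors.
nlp = spacy.load("en_core_web_md")

def extract_and_rank(text):
    doc = nlp(text)

    # 1. Entity mentions: noun chunks stand in for constituency-derived phrases.
    mentions = list(doc.noun_chunks)

    # 2. Relation graph: nodes are mentions; connect two mentions whose head
    #    tokens are directly linked in the dependency tree (coreference edges
    #    would be added the same way if a coreference resolver were plugged in).
    G = nx.Graph()
    G.add_nodes_from(range(len(mentions)))
    for i, mi in enumerate(mentions):
        for j, mj in enumerate(mentions):
            if i < j and (mi.root.head == mj.root or mj.root.head == mi.root):
                G.add_edge(i, j)

    # 3. One unsupervised graph-convolution step over mention embeddings:
    #    X' = D^{-1/2} (A + I) D^{-1/2} X
    X = np.stack([m.vector for m in mentions])
    A = nx.to_numpy_array(G) + np.eye(len(mentions))
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    X_conv = d_inv_sqrt @ A @ d_inv_sqrt @ X

    # 4. Hierarchical density-based clustering on the convolved embeddings.
    labels = hdbscan.HDBSCAN(min_cluster_size=2).fit_predict(X_conv)

    # 5. Rank mentions; graph degree is used here as a simple stand-in for
    #    the thesis's entity-prior computation.
    ranked = sorted(
        ((mentions[i].text, int(labels[i]), G.degree[i]) for i in range(len(mentions))),
        key=lambda t: -t[2],
    )
    return ranked

if __name__ == "__main__":
    sample = ("Emory University is a private research university in Atlanta. "
              "The university was founded in 1836.")
    for mention, cluster, degree in extract_and_rank(sample):
        print(mention, cluster, degree)
```

The output is a list of mention strings with their cluster labels and a degree-based rank score; a faithful reimplementation would replace the noun-chunk and degree heuristics with the constituency-based mention extraction and entity-prior ranking the thesis proposes.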

Department / School

Computer Science / Emory University

Degree / Year

MS / Spring 2020

Committee

Jinho D. Choi, Computer Science and QTM, Emory University (Chair)
Shun Yan Cheung, Computer Science, Emory University
Michelangelo Grigni, Computer Science, Emory University

Links

Anthology | Paper | Presentation