Honors Thesis 2022 - Angela Cao

An Analysis of Causal Language Constructions in Diverse Discourse Data

Angela Cao

Highest Honor in Linguistics


Creating datasets of manually annotated texts for relationships such as causality has been of interest to computational linguists. This thesis introduces the annotated Constructions of CAUSE, ENABLE, and PREVENT (CCEP) corpus, which contributes to the field by systematizing the nuanced CAUSE, ENABLE, and PREVENT roles and enabling annotation of a wide variety of causal construction types. The corpus takes constructions as the basic unit of causal language, following the linguistic paradigm of Construction Grammar (CxG) as realized in the surface construction labeling (SCL) approach. In this project, I adapt a pre-identified bank of causal connectives (the Constructicon) from Dunietz (2018), whose entries serve as triggers for annotation instances. Based on the high inter-annotator agreement achieved on a corpus of 150 documents doubly annotated under the CCEP guidelines, I (1) support the psychological reality of the causal aspectualization proposed by Wolff et al. (2005), since annotators reliably distinguish its categories, (2) build upon previous annotation work that aims to embed this model of causation, and (3) provide a high-quality dataset for understanding textual causality.

Department / School

Linguistics / Emory University

Degree / Year

BS / Spring 2022


Jinho D. Choi, Computer Science and QTM, Emory University (Chair)
Marjorie Pak, Linguistics, Emory University
Yun Kim, Linguistics, Emory University
David Zureick-Brown, Mathematics, Emory University



Angela Cao, Jinho Choi, Gregor Williamson, David Zureick-Brown, Marjorie Pak, Yun Kim