Detecting the temporal relations among events in a text is a complex natural language understanding task, yet recovering the timeline of events is key to improving machine comprehension. Previous work has proposed approaches for identifying events in text, defining appropriate temporal relations, and ordering events with respect to one another. However, the vast majority of existing temporal dependency annotation has been carried out on simple narrative text or news sources, and these annotation schemes are not always applicable to noisy, highly variable social media text such as Reddit posts. We devise a more generalized and robust scheme that supports annotation of a broader range of text. In this research, we aim to 1) improve existing annotation guidelines to cover more complex sentence structures, 2) evaluate annotation performance among student annotators, with the goal of achieving competitive inter-annotator agreement scores, 3) quantify the characteristics unique to Reddit text and provide a statistical analysis of the difficulties encountered when annotating Reddit data, and 4) compare and contrast the effectiveness of our temporal annotation scheme across three diverse sources: children’s stories, social media texts, and news articles. The results show that our annotation scheme identifies events with high inter-annotator agreement, but there is still room for improvement in identifying the timelines of events. Our results also highlight the challenges of devising a unified temporal relation scheme for different types of text. These challenges motivate a discussion of how to evaluate the effectiveness of temporal relation schemes.
Quantitative Theory and Methods (Linguistics) / Emory University
BS / Spring 2022
Jinho D. Choi, Computer Science and QTM, Emory University (Chair)
Marjorie Pak, Linguistics, Emory University
Jason McLarty, Linguistics, Emory University
Yingying Chen, Jinho Choi, Marjorie Pak, Jason McLarty