Cross-genre Document Retrieval: Matching between Conversational and Formal Writings

Tomasz Jurczyk, Jinho D. Choi


This paper tackles a cross-genre document retrieval task in which the queries are in formal writing and the target documents are in conversational writing. In this task, a query is a sentence extracted from either a summary or a plot of an episode of a TV show, and the target document consists of the transcripts of the corresponding episode. To establish a strong baseline, we employ a current state-of-the-art search engine to perform document retrieval on the dataset collected for this work. We then introduce a structure reranking approach that improves the initial ranking by utilizing syntactic and semantic structures generated by NLP tools. Our evaluation shows an improvement of more than 4% when the structure reranking is applied, which is very promising.
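The abstract describes a two-stage pipeline: an initial ranking from a search engine, followed by a reranking step that uses linguistic structure. The following is a minimal Python sketch of that general retrieve-then-rerank pattern. It rests on several stated assumptions. Okapi BM25 stands in for the search engine, which the abstract does not name. Adjacent word pairs (bigrams) are a crude stand-in for the syntactic and semantic structures the paper obtains from NLP tools. The interpolation weight `alpha`, the cutoff `top_k`, the coverage score, and all function names are hypothetical. It is an illustration of the pattern, not the authors' method.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization that strips surrounding punctuation.
    tokens = (t.strip(".,!?;:\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Stage 1 (stand-in for the search engine): Okapi BM25 over all documents."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    df = Counter()  # number of documents containing each term
    for toks in tokenized.values():
        df.update(set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores[doc_id] = score
    return scores


def bigram_structure(text):
    # HYPOTHETICAL stand-in for parser/semantic output: adjacent token pairs.
    toks = tokenize(text)
    return set(zip(toks, toks[1:]))


def rerank(query, docs, top_k=10, alpha=0.5, structure_of=bigram_structure):
    """Stage 2: rerank the top_k BM25 hits by mixing in structure coverage."""
    if not docs:
        return []
    initial = bm25_scores(query, docs)
    candidates = sorted(initial, key=initial.get, reverse=True)[:top_k]
    max_score = max(initial[d] for d in candidates) or 1.0  # avoid dividing by 0
    q_struct = structure_of(query)
    reranked = []
    for doc_id in candidates:
        d_struct = structure_of(docs[doc_id])
        # Fraction of the query's structures that also appear in the document.
        coverage = len(q_struct & d_struct) / len(q_struct) if q_struct else 0.0
        score = alpha * initial[doc_id] / max_score + (1 - alpha) * coverage
        reranked.append((doc_id, score))
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in reranked]


# Toy, made-up transcripts and query, for illustration only.
docs = {
    "episode_a": "Ross: I got married again. Rachel: You what? Joey: He got married in Vegas!",
    "episode_b": "Monica: The chef is coming to dinner tonight. Chandler: Could this BE any fancier?",
}
query = "Ross tells his friends that he got married in Vegas."
print(rerank(query, docs, top_k=2))
```

In a real system, `structure_of` would be swapped for a function that returns relation tuples from a dependency parser or semantic role labeler; keeping it a parameter lets the reranker stay independent of which NLP tool produces the structures.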

Venue / Year

Proceedings of the EMNLP Workshop on Building Linguistically Generalizable NLP Systems (BLGNLP) / 2017

