Date: 2022-11-11 / 4:00 ~ 5:00 PM
Location: MSC E306 (https://emory.zoom.us/j/99364825782)
Context in dialogues is a critical element in empowering conversational agents to conduct fluent, consistent, and enriching conversations. With awareness of the context of a conversation, a conversational agent can capture, understand, and utilize relevant information, such as named entity mentions, topics of interest under discussion, topic transitions, and signals that the conversation is ending, to converse with human users naturally. However, incorporating contextual information is a challenging task. First, contextual information takes miscellaneous forms, such as topics, previous utterances, user intents, and emotion semantics; deciding which information contributes, and how to handle arbitrary forms of context, is always difficult. Second, how to properly and effectively fuse context into models is critical. To address these challenges, my proposal explores and experiments with leveraging different contextual information in the embedding space across different models and tasks. In addition, I develop models that overcome two general limitations of state-of-the-art language models: the maximum number of tokens they can encode and their inability to fuse arbitrary forms of contextual information. Furthermore, in processing conversational data, diarization methods are explored to resolve speaker ID errors in the transcriptions, which is crucial to the quality of the training data.
In the proposal, models are developed to tackle the challenges of integrating context into dialogue models in both retrieval-based and generation-based dialogue systems. In retrieval-based systems, one response is usually selected and returned by ranking all responses from different components. A contextualized conversational ranking model is proposed and evaluated on MSDialog, a popular conversational corpus. Three types of contextual information are leveraged and incorporated into the ranking model: 1) previous conversation utterances from both speakers; 2) response candidates that are semantically similar; 3) domain information for each candidate response. The contextual response ranking model exceeds the performance of state-of-the-art models from previous research and offers promising insight into incorporating various forms of context into modeling. In generation-based systems, a response produced by a generative model is returned to the other conversing party. A generative model is built on top of a state-of-the-art model, Blenderbot, overcoming its limitations, to integrate two types of contextual information: 1) previous conversation utterances from both conversing parties; 2) identified key utterances that provide important references. The model is trained on an interview dataset and evaluated on an annotated test set as well as by professional interviewers and students in real conversations. The average satisfaction score from professional interviewers and students is 3.5 out of 5, which shows promise for future applications.
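To illustrate the retrieval-based setting, the sketch below scores candidate responses by fusing the current query embedding with pooled embeddings of previous utterances before ranking. The mean-pooling and weighted-sum fusion shown here are illustrative assumptions for exposition, not the proposal's actual architecture:

```python
import numpy as np

def rank_responses(query_emb, candidate_embs, context_embs, weights=(0.6, 0.4)):
    """Rank candidate responses against a context-fused query embedding.

    query_emb:      (d,) embedding of the current utterance
    candidate_embs: (n, d) embeddings of candidate responses
    context_embs:   (m, d) embeddings of previous utterances from both speakers
    weights:        illustrative mixing weights for query vs. pooled context
    """
    # Mean-pool the conversation history into a single context vector.
    context = np.mean(context_embs, axis=0)
    # Fuse query and context; normalize for cosine similarity.
    fused = weights[0] * query_emb + weights[1] * context
    fused = fused / np.linalg.norm(fused)
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = cands @ fused
    # Indices of candidates from best to worst.
    return np.argsort(-scores), scores
```

In a full system, the learned ranking model would replace this fixed weighted sum, but the overall flow (encode context, fuse, score, rank) is the same.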
Furthermore, to improve the performance of the current generative model, two models are proposed to better integrate contextual information. I propose a generative model that incorporates the flow of "topics" in a conversation: based on the current utterance and other contextual information, a "topic" embedding representation is generated and fed into the generation of the next utterance.