Assembling Multi-Turn Dialogues from Reddit Data

Mack Hutsell

Date: 2022-02-18 / 4:00 ~ 5:00 PM
Location: MSC E306


Recent advances such as BlenderBot 2.0 have been powered by a new kind of dataset, one built through intelligent use of computational processes. The BlenderBot data, for example, was built by extracting millions of two-turn conversations from Reddit posts, along with “persona profiles” for the associated speakers. This work opened the door to more sophisticated computational approaches to dataset creation. We tested several model variants, drawing on BlenderBot, BERT Next Sentence Prediction, and Reddit’s structure, to identify a strong-performing model for multi-turn dialogue assembly.
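
To illustrate what using Reddit's structure for multi-turn assembly could look like, here is a minimal sketch that follows reply links in a comment tree to build candidate dialogues. It is not the method presented in the talk. The function name `assemble_dialogues`, the `min_turns` parameter, and the simplified record fields (`id`, `parent_id`, `body`) are assumptions made for this example, and real Reddit data would need its own field mapping.

```python
from collections import defaultdict


def assemble_dialogues(comments, min_turns=3):
    """Return every root-to-leaf reply chain with at least `min_turns` turns.

    `comments` is a list of dicts with hypothetical keys:
    "id", "parent_id" (None for a top-level post), and "body".
    """
    # Index comments by the id of the item they reply to.
    children = defaultdict(list)
    for comment in comments:
        children[comment["parent_id"]].append(comment)

    dialogues = []

    def walk(node, path):
        # Extend the current chain with this comment's text.
        path = path + [node["body"]]
        replies = children.get(node["id"], [])
        if not replies:
            # Leaf reached: keep the chain only if it is long enough.
            if len(path) >= min_turns:
                dialogues.append(path)
            return
        for reply in replies:
            walk(reply, path)

    # Start one traversal from every top-level post.
    for root in children.get(None, []):
        walk(root, [])
    return dialogues


if __name__ == "__main__":
    thread = [
        {"id": "a", "parent_id": None, "body": "Any good sci-fi recommendations?"},
        {"id": "b", "parent_id": "a", "body": "Try Dune."},
        {"id": "c", "parent_id": "b", "body": "Is it a long read?"},
        {"id": "d", "parent_id": "a", "body": "Neuromancer."},
    ]
    for dialogue in assemble_dialogues(thread):
        print(" | ".join(dialogue))
```

In this sketch every branch of the tree becomes a separate candidate, and chains shorter than `min_turns` are dropped. A coherence scorer, for instance one based on next-sentence prediction, could then rank or filter the candidates, though that step is not shown here.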