Date: 2025-04-11 / 2:00 - 3:00 PM
Location: White Hall 100
Abstract: Evaluating generative models with open-ended generation is challenging due to inconsistencies in response formats. Multiple-choice (MC) evaluation helps mitigate this issue, but generating high-quality distractors is time-consuming and labor-intensive. We introduce D-GEN, the first open-source distractor generator model that transforms open-ended data into an MC format. To evaluate distractor quality, we propose two novel methods: (1) ranking alignment, which ensures that the generated distractors retain the discriminatory power of ground-truth distractors, and (2) entropy analysis, which compares model confidence distributions. Our results show that D-GEN preserves ranking consistency, achieving a Spearman’s rho of 0.99 and Kendall’s tau of 0.94, and closely matches the entropy distribution of ground-truth distractors. Human evaluation further confirms the fluency, coherence, distractiveness, and incorrectness of the generated distractors. Our work advances robust and efficient distractor generation with automated evaluation, setting a new standard for MC evaluation.
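To make the two evaluation ideas concrete, here is a minimal sketch of how ranking alignment and entropy analysis could be computed. The model names, accuracies, and confidence values below are hypothetical placeholders, not figures from the D-GEN paper; the sketch only illustrates the kind of rank-correlation and entropy comparison the abstract describes.

```python
# Illustrative sketch of the two evaluation checks described in the abstract.
# All models, accuracies, and probabilities are hypothetical examples.
import numpy as np
from scipy.stats import spearmanr, kendalltau, entropy

# (1) Ranking alignment: do generated distractors preserve the model ranking
# induced by ground-truth distractors?
acc_ground_truth = {"model_a": 0.81, "model_b": 0.74, "model_c": 0.62}
acc_generated    = {"model_a": 0.79, "model_b": 0.73, "model_c": 0.64}

models = sorted(acc_ground_truth)
gt_scores  = [acc_ground_truth[m] for m in models]
gen_scores = [acc_generated[m] for m in models]

rho, _ = spearmanr(gt_scores, gen_scores)   # rank correlation of model scores
tau, _ = kendalltau(gt_scores, gen_scores)
print(f"Spearman's rho: {rho:.2f}, Kendall's tau: {tau:.2f}")

# (2) Entropy analysis: compare a model's confidence distribution over the
# answer options under ground-truth vs. generated distractors.
probs_gt  = np.array([0.70, 0.15, 0.10, 0.05])  # softmax over 4 options
probs_gen = np.array([0.65, 0.18, 0.10, 0.07])
print(f"Entropy (ground-truth distractors): {entropy(probs_gt):.3f}")
print(f"Entropy (generated distractors):    {entropy(probs_gen):.3f}")
```

In this framing, a rank correlation near 1.0 indicates the generated distractors rank models the same way the ground-truth distractors do, and similar entropies indicate the model is about equally "tempted" by both distractor sets.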