![[2025F] Noah Reicin (BA)](https://files.cdn-files-a.com/uploads/4165719/2000_68bef7fb13213.jpg)
Date: 2025-10-17 / 3:00 - 4:00 PM
Location: White Hall 100
Instruction following is central to large language models (LLMs), yet their reliability under non-linear or structurally disrupted prompts remains poorly understood. We introduce a controlled evaluation framework that isolates prompt topology, independent of content complexity, as a variable in instruction adherence. Using rephrased Jeopardy! question–answer pairs held constant across conditions, we compare performance on linear prompts, which follow sequential order, and jumping prompts, which require non-sequential traversal guided by explicit directional cues. Across 600 evaluations spanning seven state-of-the-art open-source models, accuracy dropped by up to 83% under jumping conditions, revealing a strong dependence on positional continuity. Error analysis shows that most failures stem from instruction-order violations and semantic drift, indicating that current architectures internalize instruction following as a sequential pattern rather than as an abstract reasoning skill. These results highlight structural sensitivity as a fundamental limitation in LLM alignment and reasoning and establish a reproducible benchmark for testing robustness to discontinuous procedural execution.
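To make the linear/jumping distinction concrete, here is a minimal sketch of how such a prompt transformation might be constructed. It assumes a "jump to label" cue format; the function names, label scheme, and step wording are illustrative assumptions, not the authors' actual protocol.

```python
import random

def linear_prompt(steps: list[str]) -> str:
    """Render instruction steps in their natural sequential order."""
    return "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))

def jumping_prompt(steps: list[str], seed: int = 0) -> str:
    """Shuffle the steps' physical order on the page, then attach explicit
    directional cues so the logical execution order is still recoverable."""
    n = len(steps)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # physical layout != logical order
    lines = ["Begin at the step labeled A1."]
    for step_idx in order:
        nxt = step_idx + 1  # logical successor of this step
        cue = (f" Then jump to the step labeled A{nxt + 1}."
               if nxt < n else " Then stop and report your final answer.")
        lines.append(f"[A{step_idx + 1}] {steps[step_idx]}{cue}")
    return "\n".join(lines)

# Hypothetical example: same content under both topologies.
steps = [
    "Read the following Jeopardy!-style clue.",
    "Rephrase the clue as a direct question.",
    "Answer the question in one or two words.",
]
print(linear_prompt(steps))
print()
print(jumping_prompt(steps, seed=42))
```

Because both renderings carry identical step content, any accuracy gap between them can be attributed to prompt topology rather than to task difficulty, which is the isolation the framework relies on.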