Beyond the Final Answer: Evaluating the Reasoning . . . The authors argue that merely matching the final answer fails to assess crucial aspects of the problem-solving process, such as efficiency, hallucination, and adaptivity. To address this, the authors introduce TRACE (Trajectory-based Reasoning Assessment and Comprehensive Evaluation).
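To illustrate the gap between outcome-only and trajectory-level scoring, here is a minimal sketch. It is not TRACE's method. The `Step` record, the `known_facts` set used as a crude hallucination proxy, and the `step_budget` used as an efficiency proxy are all hypothetical choices made for this example. Adaptivity is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a hypothetical reasoning trajectory."""
    text: str
    cited_facts: list[str] = field(default_factory=list)


def final_answer_match(predicted: str, gold: str) -> bool:
    # Outcome-only scoring: compare answers after trimming and lowercasing.
    return predicted.strip().lower() == gold.strip().lower()


def trajectory_report(steps: list[Step], predicted: str, gold: str,
                      known_facts: set[str], step_budget: int) -> dict:
    # Hallucination proxy (hypothetical): facts a step relies on that are not in known_facts.
    unsupported = [f for s in steps for f in s.cited_facts if f not in known_facts]
    return {
        "correct": final_answer_match(predicted, gold),
        "num_steps": len(steps),
        "within_budget": len(steps) <= step_budget,  # efficiency proxy (hypothetical)
        "unsupported_facts": unsupported,
    }


# Two trajectories that reach the same correct final answer.
facts = {"paris is the capital of france"}
direct = [Step("Recall the capital.", ["paris is the capital of france"])]
wandering = [
    Step("Guess a city.", ["lyon is the capital of france"]),
    Step("Reconsider.", ["paris is the capital of france"]),
    Step("Double-check."),
]
r1 = trajectory_report(direct, "Paris", "paris", facts, step_budget=2)
r2 = trajectory_report(wandering, "Paris", "paris", facts, step_budget=2)
```

Final-answer matching scores both runs as correct. The trajectory report separates them, because the second run exceeds the step budget and relies on an unsupported claim along the way.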
ACL ARR 2026 January | OpenReview: Final-turn-only Replay as Context Ablation Evaluation for Multi-Turn Automated Red Teaming (Submission9189, 06 Jan 2026; modified 20 Mar 2026).
AceSearcher: Bootstrapping Reasoning and Search for LLMs via. . . AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, eliminating the need for intermediate annotations.
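Optimizing for final answer accuracy means the reinforcement signal can be computed from the final answer alone, which is why no intermediate annotations are required. Below is a minimal sketch of such an outcome-only reward, assuming a hypothetical output convention in which the model ends its completion with a line `Answer: ...`. The SQuAD-style normalization is also an assumption and is not taken from AceSearcher.

```python
import re
import string
from typing import Optional


def normalize(text: str) -> str:
    # SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def extract_final_answer(completion: str) -> Optional[str]:
    # Assumed convention: the last line starting with "Answer:" holds the final answer.
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1] if matches else None


def outcome_reward(completion: str, gold_answers: list[str]) -> float:
    # Only the final answer is scored. Intermediate search and reasoning steps
    # are ignored, so no step-level labels are needed.
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    gold = {normalize(g) for g in gold_answers}
    return 1.0 if normalize(answer) in gold else 0.0


rollout = "Search: capital of France\nResult: Paris is the capital.\nAnswer: The Paris"
reward = outcome_reward(rollout, ["Paris"])
```

Here `reward` is `1.0`. The intermediate `Search:` and `Result:` lines never enter the computation.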
Adaptive Critic-Guided Hybrid Agentic RAG for Improving . . . Furthermore, the framework dynamically adjusts retrieval depth according to query complexity and employs critic-guided verification before generating the final answer, improving factual reliability and response safety.
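The two mechanisms described above, complexity-dependent retrieval depth and a verification gate before the final answer, can be sketched as follows. Every component here is a hypothetical stand-in rather than the paper's design: the complexity heuristic, the word-overlap retriever, the substring-based "critic", and the abstain fallback. `generate` is any caller-supplied answer function.

```python
from typing import Callable


def estimate_complexity(query: str) -> int:
    # Hypothetical heuristic: longer queries and multi-hop connectives suggest more hops.
    words = query.lower().split()
    connectives = sum(w in {"and", "which", "whose", "before", "after"} for w in words)
    return len(words) // 8 + connectives


def retrieval_depth(query: str, base_k: int = 2, max_k: int = 8) -> int:
    # Map complexity to top-k, capped at max_k.
    return min(max_k, base_k + 2 * estimate_complexity(query))


def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    # Toy lexical retriever: rank documents by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]


def critic_supports(answer: str, passages: list[str]) -> bool:
    # Toy critic: accept only if every answer word occurs (as a substring) in the passages.
    text = " ".join(passages).lower()
    return all(word in text for word in answer.lower().split())


ABSTAIN = "I could not verify an answer from the retrieved sources."


def answer_with_verification(query: str, corpus: list[str],
                             generate: Callable[[str, list[str]], str]) -> str:
    passages = retrieve(query, corpus, retrieval_depth(query))
    draft = generate(query, passages)
    # Verification gate: return the draft only if the critic finds it supported.
    return draft if critic_supports(draft, passages) else ABSTAIN


corpus = ["Hamlet was written by William Shakespeare.", "The Thames flows through London."]
verified = answer_with_verification("Who wrote Hamlet?", corpus, lambda q, ps: "Shakespeare")
rejected = answer_with_verification("Who wrote Hamlet?", corpus, lambda q, ps: "Marlowe")
```

With these stand-ins, a short single-hop query gets `k = 2`, and a long query with several connectives is pushed to the cap of 8. An answer absent from the retrieved text triggers the abstain fallback instead of being returned.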
LaMPlace: Learning to Optimize Cross-Stage Metrics in Macro . . . However, existing methods primarily focus on online optimization of intermediate surrogate metrics that are available at the current placement stage, rather than directly targeting the cross-stage metrics (such as timing performance) that measure final chip quality.