arXiv AI recent: Closing the Reflection Gap: A Free Calibration Bonus for Agentic RL
LLM agents tend to mis-assess their own outputs after observing concrete environment feedback, resulting in a persistent reflection gap. A new method called RefGRPO is proposed to close t...
LLMs are increasingly deployed as agents that interact with external environments and observe feedback. The proposed RefGRPO method improves both reflection calibration and task accuracy, which in turn enables better self-improvement and more effective test-time selective prediction.