Subjecthood desk method note: We report the discourse. We do not assert AI systems are or are not conscious. We label position families.

arXiv AI recent: Closing the Reflection Gap: A Free Calibration Bonus for Agentic RL

2026-06-15 arxiv.org

LLM agents tend to mis-assess their own outputs after observing concrete environment feedback, resulting in a persistent reflection gap. A new method called RefGRPO is proposed to close t...

LLMs are increasingly deployed as agents that interact with external environments and observe feedback. The proposed RefGRPO method improves reflection calibration and task accuracy, and enables better self-improvement and more effective test-time selective prediction.

Sources

arXiv AI recent challenge