Subjecthood desk method note: We report the discourse. We do not assert AI systems are or are not conscious. We label position families.

arXiv AI recent: Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

2026-06-18 arxiv.org

The article introduces Rubric-Conditioned Self-Distillation, a framework for post-training reasoning language models using structured rubrics as feedback. The method conditions a teacher...

Post-training of reasoning language models typically uses supervised distillation or reinforcement learning with scalar rewards, both of which have limitations: distillation relies on potentially noisy chain-of-thought annotations, while scalar rewards obscure specific areas for improvement. Rubr...

Sources

arXiv AI recent challenge