arXiv AI recent: Who Drifted: the System or the Judge? Anytime-Valid Attribution in LLM Evaluation Pipelines
Researchers propose a method to resolve an ambiguity in continuous evaluation of LLM products: whether an observed drift comes from the system or from the judge. The method uses a fixed, human-labeled anchor set and a second betting e-process to attribute...
The proposed method uses a fixed, human-labeled anchor set that the current judge re-scores at a steady interleave, and a second betting e-process on the judge-versus-human gap. The method returns a verdict in {none, system, judge} and has been proven to have anytime-validity, one-way identificat...
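
To make the idea of a betting e-process and a three-way verdict concrete, here is a minimal Python sketch. It is not the paper's algorithm: the summary above is truncated and gives neither the betting strategy nor the decision rule. The sketch assumes observations are gaps scaled to [-1, 1], uses a fixed bet size, splits the error budget evenly between two e-processes, and checks the judge process first. The names `BettingEProcess`, `attribute`, `system_gaps`, and `judge_gaps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BettingEProcess:
    """Two-sided betting e-process for H0: E[x_t | past] = 0, with x_t in [-1, 1].

    Assumption: a fixed bet size `lam`; the paper's betting strategy is not
    given in the summary and is probably adaptive.
    """
    lam: float = 0.5          # bet fraction, must lie in (0, 1)
    wealth_up: float = 1.0    # wealth from betting that the mean is positive
    wealth_down: float = 1.0  # wealth from betting that the mean is negative
    peak: float = 1.0         # running maximum of the combined wealth

    def update(self, x: float) -> float:
        if not -1.0 <= x <= 1.0:
            raise ValueError("observations must be scaled to [-1, 1]")
        # Each factor is nonnegative and has expectation 1 under H0, so each
        # wealth is a nonnegative martingale, and so is their average.
        self.wealth_up *= 1.0 + self.lam * x
        self.wealth_down *= 1.0 - self.lam * x
        value = 0.5 * (self.wealth_up + self.wealth_down)
        self.peak = max(self.peak, value)
        return value


def attribute(system_gaps, judge_gaps, alpha=0.05):
    """Return 'none', 'system', or 'judge' (hypothetical decision rule).

    system_gaps: judged metric minus its baseline on live traffic (assumed).
    judge_gaps:  judge score minus human label on the anchor set (assumed).
    """
    system_e, judge_e = BettingEProcess(), BettingEProcess()
    for x in system_gaps:
        system_e.update(x)
    for x in judge_gaps:
        judge_e.update(x)
    # Ville's inequality: under H0 the wealth ever reaches 1/a with
    # probability at most a. Each process is tested at level alpha / 2.
    threshold = 2.0 / alpha
    if judge_e.peak >= threshold:
        return "judge"
    if system_e.peak >= threshold:
        return "system"
    return "none"
```

With these assumptions, `attribute([0.8] * 20, [0.0] * 20)` returns `"system"`, because only the traffic process accumulates wealth, while a persistent judge-versus-human gap such as `attribute([0.8] * 20, [0.8] * 20)` returns `"judge"`. Tracking the running peak rather than the final wealth is what allows the check to be made at any stopping time.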