arXiv AI recent: Benchmarking Agentic Review Systems
A new class of agentic review systems is being evaluated as a potential remedy for the pressure that AI-assisted research places on peer review. The evaluation involves assessing the...
The evaluation involves four systems: OpenAIReview, coarse, Reviewer3, and a zero-shot baseline, which are tested across six large language models. The systems are evaluated on their ability to track paper quality as approximated by external signals such as citations and acceptance decisions, and...
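To make "tracking paper quality" concrete, here is a minimal sketch of how a review system's scores could be compared against external signals. The summary does not say which metrics the benchmark uses, so the choices below are assumptions: Spearman rank correlation for citations and AUC for accept/reject decisions. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

def auc(scores, accepted):
    """Chance that a random accepted paper outscores a random rejected one."""
    pos = [s for s, a in zip(scores, accepted) if a]
    neg = [s for s, a in zip(scores, accepted) if not a]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data, invented for illustration only (not results from the paper).
review_scores = [3.0, 5.5, 4.0, 7.0, 6.0]
citations = [2, 40, 5, 120, 15]
accepted = [False, True, False, True, True]

print("Spearman vs. citations:", spearman(review_scores, citations))
print("AUC vs. acceptance:", auc(review_scores, accepted))
```

Rank-based measures are a natural fit for this kind of comparison because citation counts are heavily skewed, and a reviewer's scores only need to order papers sensibly rather than predict raw counts.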