arXiv AI recent: CombEval: A Framework for Evaluating Combinatorial Counting in Large Language Models
Researchers presented CombEval, a dynamic benchmark for evaluating combinatorial counting in large language models. CombEval was used to evaluate 11 large language models under direct and...
CombEval represents each problem as a typed Cofola specification over entities, combinatorial objects, object dependencies, and constraints, enabling controlled generation of natural-language counting problems with exact solver-verified answers. The code and generated benchmark suites for CombEva...
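
To make the idea of a structured specification with an exactly verified answer concrete, here is a minimal Python sketch. It is not Cofola syntax and not CombEval's solver: the `SubsetProblem` class, its fields, and the committee example are hypothetical names chosen for illustration, and the "solver" is plain brute-force enumeration, which only works for tiny instances.

```python
from dataclasses import dataclass
from itertools import combinations
from math import comb
from typing import Callable, FrozenSet, Tuple

# Hypothetical type: a constraint is a predicate over a candidate subset.
Constraint = Callable[[FrozenSet[str]], bool]


@dataclass(frozen=True)
class SubsetProblem:
    """Illustrative spec: choose `size` entities subject to constraints."""
    entities: Tuple[str, ...]
    size: int
    constraints: Tuple[Constraint, ...]

    def count(self) -> int:
        # Brute-force "solver": enumerate every subset of the given size
        # and keep those that satisfy all constraints.
        return sum(
            1
            for subset in combinations(self.entities, self.size)
            if all(check(frozenset(subset)) for check in self.constraints)
        )


def not_both(a: str, b: str) -> Constraint:
    """Constraint: entities a and b may not both be selected."""
    return lambda chosen: not (a in chosen and b in chosen)


# Assumed example: 3-person committees from 5 people,
# where Alice and Bob refuse to serve together.
problem = SubsetProblem(
    entities=("Alice", "Bob", "Carol", "Dave", "Eve"),
    size=3,
    constraints=(not_both("Alice", "Bob"),),
)

answer = problem.count()
# Cross-check against the closed form: all committees minus those
# containing both Alice and Bob (one remaining seat, three candidates).
closed_form = comb(5, 3) - comb(3, 1)
print(answer, closed_form)
```

Because the answer comes from the specification rather than from the wording, the same structure could be rendered into many natural-language phrasings while the reference count stays fixed.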