arXiv AI recent: A Systematic Evaluation of Black-Box Uncertainty Estimation Methods for Large Language Models
The authors conducted a systematic review of black-box uncertainty estimation (UE) methods for large language models (LLMs) and benchmarked 24 representative methods across four models an...
The paper identifies a gap in existing research on black-box UE for LLMs, noting that many mainstream LLMs are accessed through restricted APIs that hide internal signals such as logits. To address this, the authors built a unified evaluation framework and released benchmark data to enable reproducible com...
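
To make the black-box setting concrete, the following is a minimal sketch of one generic sample-consistency estimator. It is not taken from the paper and is not claimed to be one of its 24 benchmarked methods. It assumes the caller has already collected several sampled responses to the same prompt from an API that returns text only; the names `normalize` and `answer_entropy` are hypothetical, and the string normalization is deliberately crude.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Crude canonical form: lowercase, collapse whitespace, drop trailing punctuation."""
    return " ".join(answer.lower().split()).rstrip(".!?")


def answer_entropy(samples: list[str]) -> float:
    """Normalized Shannon entropy of the empirical answer distribution.

    Returns a value in [0, 1]: 0 when every sample agrees after
    normalization, 1 when every sample is distinct. Only the generated
    text is used, so no logits or other model internals are required.
    """
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(normalize(s) for s in samples)
    n = len(samples)
    if len(counts) == 1:
        # All samples agree (this also covers n == 1): no observed disagreement.
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # log(n) is the maximum entropy attainable with n samples.
    return entropy / math.log(n)


# Hypothetical sampled responses to a single prompt.
agreeing = ["Paris", "paris.", " Paris "]
mixed = ["Paris", "Paris", "Lyon", "Paris"]
print(answer_entropy(agreeing))
print(answer_entropy(mixed))
```

Exact string matching after normalization is the simplest possible notion of agreement; grouping responses by meaning instead would need an additional similarity model, which this sketch leaves out.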