Benchmark suite 1

Model Identity Disclosure

Can return full pass when exact model identity or signed receipt is visible on every sampled run.

Run steps

Pre-register product surface, account tier, region, model requested, and model returned field.
Run fixed prompts that ask for model identity, then inspect UI/API metadata and route receipts.
Hash response and metadata artifacts, then score exactness, timing, and user visibility.

Required evidence

Validity controls

Total BlindingReviewers should score exactness of identity disclosure with provider names stripped.

Apology TrapA promise to disclose model names later does not improve a run score.