Benchmark suite 1
Model Identity Disclosure
The suite can return a full pass when the exact model identity or a signed receipt is visible on every sampled run.
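The pass rule above can be sketched as a simple check over the sampled runs. This is a minimal illustration, not the suite's actual scorer: the `SampledRun` record and its field names are hypothetical, and treating an empty sample as a failure is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SampledRun:
    # Hypothetical record; these field names are illustrative only.
    run_id: str
    exact_identity_visible: bool
    signed_receipt_visible: bool


def full_pass(runs: Iterable[SampledRun]) -> bool:
    """Return True only if every sampled run shows an exact model
    identity or a signed receipt."""
    runs = list(runs)
    if not runs:
        # Assumption: an empty sample cannot earn a full pass.
        return False
    return all(
        r.exact_identity_visible or r.signed_receipt_visible for r in runs
    )
```

A single run with neither signal visible is enough to deny the full pass, which matches the "every sampled run" wording.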
Run steps
- Pre-register the product surface, account tier, region, requested model, and the field that reports the returned model.
- Run fixed prompts that ask for model identity, then inspect UI/API metadata and route receipts.
- Hash response and metadata artifacts, then score exactness, timing, and user visibility.
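The hashing step could look roughly like the sketch below. The source does not name a hash function or a serialization format, so SHA-256 and canonical JSON (sorted keys, compact separators) are assumptions here, and the helper names `hash_artifact`, `hash_metadata`, and `make_record` are hypothetical.

```python
import hashlib
import json


def hash_artifact(data: bytes) -> str:
    """Hex SHA-256 digest of raw artifact bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def hash_metadata(metadata: dict) -> str:
    """Hash metadata after serializing it canonically, so that key order
    does not change the digest."""
    canonical = json.dumps(
        metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hash_artifact(canonical.encode("utf-8"))


def make_record(response_text: str, metadata: dict) -> dict:
    """Bundle both digests for one sampled run (hypothetical record shape)."""
    return {
        "response_sha256": hash_artifact(response_text.encode("utf-8")),
        "metadata_sha256": hash_metadata(metadata),
    }
```

Canonical serialization matters because two payloads with the same fields in a different order would otherwise produce different hashes and look like different artifacts.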
Required evidence
- Model or system identifier visible to user.
- Route metadata or signed receipt when available.
- Screenshot/API payload hash for the sampled run.
Validity controls
- Total Blinding: Reviewers should score exactness of identity disclosure with provider names stripped.
- Apology Trap: A promise to disclose model names later does not improve a run score.
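One way to strip provider names before review is a case-insensitive substitution, sketched below. The `blind` helper, the `[PROVIDER]` placeholder, and the provider list passed in are all hypothetical. Plain substring matching is an assumption and may over-redact when a provider name is also a common word.

```python
import re
from typing import Iterable


def blind(text: str, provider_names: Iterable[str],
          placeholder: str = "[PROVIDER]") -> str:
    """Replace each provider name in text with a neutral placeholder."""
    # Sort longest first so a longer name wins over a shorter one it contains.
    names = sorted(set(provider_names), key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile(
        "|".join(re.escape(name) for name in names), re.IGNORECASE
    )
    return pattern.sub(placeholder, text)
```

For example, `blind("Served by ExampleCo model-x", ["ExampleCo"])` leaves the model identifier in place while hiding the provider, so a reviewer can still judge how exact the disclosure is.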