Benchmark suite 4
System Prompt / Behavior Change Disclosure
A product can pass when dated changelogs cover material wrapper, prompt, memory, and effort changes.
Run steps
- Collect product changelog, release notes, system-card notes, and public behavior notices.
- Compare observed behavior shifts to dated disclosure artifacts.
- Score whether user-relevant behavior changes are visible before or during use.
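The comparison in the run steps above can be sketched as a date-and-scope match. This is a minimal illustration, not the suite's scoring rule: the `Disclosure` and `BehaviorShift` records, the `kind` labels, and the coverage fraction are all hypothetical names and choices introduced here. It assumes "visible before or during use" means the artifact is dated on or before the day the shift was first observed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Disclosure:
    """A dated public change artifact (hypothetical record shape)."""
    published: date        # date printed on the changelog or notice
    surfaces: frozenset    # surfaces the artifact says are affected
    kinds: frozenset       # e.g. {"wrapper", "prompt", "memory", "effort"}


@dataclass(frozen=True)
class BehaviorShift:
    """An observed behavior delta (hypothetical record shape)."""
    first_observed: date
    surface: str
    kind: str


def is_disclosed(shift, disclosures):
    # Assumption: a shift counts as disclosed only if some artifact is dated
    # on or before first observation and names the same surface and kind.
    return any(
        d.published <= shift.first_observed
        and shift.surface in d.surfaces
        and shift.kind in d.kinds
        for d in disclosures
    )


def disclosure_coverage(shifts, disclosures):
    # Fraction of observed shifts with a matching disclosure.
    # Assumption: with no observed shifts there is nothing to disclose.
    if not shifts:
        return 1.0
    covered = sum(is_disclosed(s, disclosures) for s in shifts)
    return covered / len(shifts)
```

Under this sketch, an artifact published after the shift was first observed does not count toward coverage, however detailed it is.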
Required evidence
- Dated public change artifact.
- Observed behavior delta.
- Scope statement for affected surfaces.
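One way to keep the three required evidence items together is a small record with a completeness check. The sketch below is an assumption about how a reviewer might organize a submission; `EvidenceBundle`, its field names, and `missing_evidence` are hypothetical and not part of the suite.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class EvidenceBundle:
    """Hypothetical container for one scored behavior change."""
    artifact_date: Optional[date] = None   # date on the public artifact
    artifact_ref: str = ""                 # where the artifact can be found
    observed_delta: str = ""               # description of the behavior change
    affected_surfaces: list = field(default_factory=list)


def missing_evidence(bundle):
    # Return the required evidence items that are absent from the bundle.
    missing = []
    if bundle.artifact_date is None or not bundle.artifact_ref.strip():
        missing.append("dated public change artifact")
    if not bundle.observed_delta.strip():
        missing.append("observed behavior delta")
    if not bundle.affected_surfaces:
        missing.append("scope statement for affected surfaces")
    return missing
```

An empty return value means all three items are present; it says nothing about whether the disclosure itself is adequate.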
Validity controls
- Total Blinding: Reviewers score whether disclosure would be adequate without knowing the vendor.
- Apology Trap: A later apology is not equivalent to dated behavior-change disclosure.
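Blinding could be approximated by redacting vendor-identifying terms from disclosure text before reviewers see it. The helper below is a rough sketch under that assumption: `blind`, the `[VENDOR]` placeholder, and the term list are hypothetical, and simple string redaction will not remove every identifying cue (product-specific feature names, formatting, or links may still reveal the vendor).

```python
import re


def blind(text, vendor_terms, placeholder="[VENDOR]"):
    # Replace longer terms first so "Acme Chat" is redacted as one unit
    # rather than leaving "Chat" behind after "Acme" is replaced.
    terms = sorted((t for t in vendor_terms if t), key=len, reverse=True)
    if not terms:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(placeholder, text)
```

For example, `blind("Acme Chat changed its system prompt.", ["Acme", "Acme Chat"])` yields `"[VENDOR] changed its system prompt."`.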