Benchmark suite 4

System Prompt / Behavior Change Disclosure

This suite can pass when dated changelogs cover material wrapper, prompt, memory, and effort changes.

Confidence: low. Scoring: 0-5 disclosure completeness score. This suite can return no problem found.

Run steps

  1. Collect product changelog, release notes, system-card notes, and public behavior notices.
  2. Compare observed behavior shifts to dated disclosure artifacts.
  3. Score whether user-relevant behavior changes are visible before or during use.
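
The run steps above can be sketched as a small scoring routine. This is a minimal illustration, not the suite's actual rubric: the `Disclosure` and `BehaviorDelta` records, the matching rule (same surface and category, published on or before first observation), and the floor-based mapping to the 0-5 scale are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Disclosure:
    """A dated public change artifact (hypothetical record shape)."""
    published: date
    surface: str   # affected product surface, e.g. "chat" or "api"
    category: str  # e.g. "wrapper", "prompt", "memory", "effort"


@dataclass(frozen=True)
class BehaviorDelta:
    """An observed behavior shift (hypothetical record shape)."""
    first_observed: date
    surface: str
    category: str


def is_disclosed(delta: BehaviorDelta, disclosures: list[Disclosure]) -> bool:
    # Assumption: a delta counts as disclosed only if a matching artifact
    # was published on or before the date the shift was first observed.
    return any(
        d.surface == delta.surface
        and d.category == delta.category
        and d.published <= delta.first_observed
        for d in disclosures
    )


def disclosure_score(deltas: list[BehaviorDelta],
                     disclosures: list[Disclosure]) -> int:
    # Assumption: with no observed deltas there is nothing to disclose,
    # so the suite reports full marks ("no problem found").
    if not deltas:
        return 5
    covered = sum(is_disclosed(d, disclosures) for d in deltas)
    # Floor division keeps the score conservative on the 0-5 scale.
    return (5 * covered) // len(deltas)


# Usage: one of two observed shifts has a timely matching changelog entry.
changelog = [Disclosure(date(2025, 3, 1), "chat", "prompt")]
observed = [
    BehaviorDelta(date(2025, 3, 2), "chat", "prompt"),
    BehaviorDelta(date(2025, 3, 5), "chat", "memory"),
]
score = disclosure_score(observed, changelog)
```

Requiring a surface match reflects the scope-statement evidence item: a changelog entry that names the wrong surface does not cover the observed shift under this sketch.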

Required evidence

  • Dated public change artifact.
  • Observed behavior delta.
  • Scope statement for affected surfaces.

Validity controls

Total Blinding: Reviewers score whether disclosure would be adequate without knowing the vendor.
Apology Trap: A later apology is not equivalent to dated behavior-change disclosure.
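
Both controls can be approximated mechanically before human review. The sketch below is an assumption-laden illustration: `blind_vendor`, `is_timely`, the `[VENDOR]` placeholder, and the vendor name "Acme" are hypothetical, and simple name redaction will not remove every identifying detail from a real artifact.

```python
import re
from datetime import date


def blind_vendor(text: str, vendor_names: list[str]) -> str:
    """Replace known vendor names with a neutral placeholder."""
    if not vendor_names:
        return text
    # Longest names first so multi-word names win over their prefixes.
    ordered = sorted(vendor_names, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b"
    return re.sub(pattern, "[VENDOR]", text, flags=re.IGNORECASE)


def is_timely(artifact_date: date, change_observed: date) -> bool:
    """Apology Trap check: an artifact dated after the change does not count."""
    return artifact_date <= change_observed


# Usage with a hypothetical vendor name.
blinded = blind_vendor("Acme AI updated the acme prompt.", ["Acme"])
apology_counts = is_timely(date(2025, 4, 1), date(2025, 3, 5))
```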