Benchmark suite 3

Silent Capability Drift Monitor

The suite can return "no degradation" when sentinel tasks remain within their pre-registered tolerance.

Confidence: low. Scoring: drift magnitude minus disclosure quality. This suite can return "no problem found".

Run steps

  1. Register sentinel tasks, expected tolerance, and sampling cadence before running.
  2. Run the same tasks across dated intervals and compare outputs, refusal rates, tool use, and latency.
  3. Score only material drift that is not disclosed through changelog, receipt, or user notice.
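
The run steps above can be sketched roughly as follows. This is a minimal illustration, not the suite's actual scorer: the Sentinel class, its field names, and the functions material_drift and score_interval are hypothetical, and disclosure is simplified to a yes/no set, whereas the suite's scoring line describes a graded disclosure quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentinel:
    """A pre-registered sentinel task (all field names are hypothetical)."""
    name: str
    baseline: float   # metric value recorded at registration
    tolerance: float  # largest absolute deviation treated as non-material


def material_drift(sentinel: Sentinel, observed: float) -> float:
    """Return the drift beyond tolerance, or 0.0 when within tolerance."""
    excess = abs(observed - sentinel.baseline) - sentinel.tolerance
    return max(0.0, excess)


def score_interval(sentinels, observations, disclosed):
    """Sum material drift over sentinels whose drift was not disclosed.

    observations: dict mapping sentinel name to the observed metric.
    disclosed: set of sentinel names covered by a changelog, receipt,
               or user notice (assumed binary here for simplicity).
    """
    total = 0.0
    for s in sentinels:
        drift = material_drift(s, observations[s.name])
        if drift > 0.0 and s.name not in disclosed:
            total += drift
    return total


# Illustrative values only; none of these numbers come from the suite.
sentinels = [
    Sentinel("refusal_rate", baseline=0.25, tolerance=0.125),
    Sentinel("latency_ms", baseline=100.0, tolerance=10.0),
    Sentinel("tool_calls", baseline=2.0, tolerance=0.5),
]
observations = {"refusal_rate": 0.5, "latency_ms": 105.0, "tool_calls": 4.0}
disclosed = {"tool_calls"}
score = score_interval(sentinels, observations, disclosed)
```

A score of 0.0 in this sketch corresponds to the "no degradation" outcome: every sentinel either stayed within tolerance or had its drift disclosed.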

Required evidence

  • Baseline and follow-up hashes.
  • Tolerance definition.
  • Dated disclosure or absence of disclosure.
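
One way to produce the baseline and follow-up hashes is sketched below, assuming each sentinel output is available as text and that SHA-256 is an acceptable digest; the suite does not name a hash function. The functions evidence_record and outputs_changed and the record layout are hypothetical.

```python
import hashlib


def evidence_record(task_name: str, output_text: str, run_date: str) -> dict:
    """Build a dated evidence record holding a SHA-256 digest of the output."""
    digest = hashlib.sha256(output_text.encode("utf-8")).hexdigest()
    return {"task": task_name, "date": run_date, "sha256": digest}


def outputs_changed(baseline: dict, follow_up: dict) -> bool:
    """True when the follow-up output differs byte-for-byte from the baseline."""
    return baseline["sha256"] != follow_up["sha256"]


# Illustrative task name, outputs, and dates only.
base = evidence_record("sentinel-01", "The answer is 4.", "2024-01-01")
later = evidence_record("sentinel-01", "I cannot help with that.", "2024-02-01")
changed = outputs_changed(base, later)
```

A hash mismatch only shows that the output changed; whether the change is material still depends on the tolerance definition.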

Validity controls

  • Total Blinding: Provider names are stripped; reviewers judge only the relationship between drift and disclosure.
  • Apology Trap: PR language after a drift event does not erase the historical silent period.
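
Both controls can be illustrated with a short sketch. It assumes provider names are known in advance as plain strings and that drift and disclosure events carry calendar dates; the functions blind and silent_days, the "Provider-N" placeholder format, and the example provider name are all hypothetical.

```python
import re
from datetime import date
from typing import Optional


def blind(text: str, provider_names: list) -> str:
    """Replace each known provider name with a neutral placeholder."""
    for i, name in enumerate(provider_names, start=1):
        text = re.sub(re.escape(name), f"Provider-{i}", text, flags=re.IGNORECASE)
    return text


def silent_days(drift_detected: date, disclosed: Optional[date], as_of: date) -> int:
    """Count the days drift went undisclosed.

    A later disclosure ends the silent period but does not reset it to zero.
    """
    end = as_of if disclosed is None else min(disclosed, as_of)
    return max(0, (end - drift_detected).days)


# Illustrative values only; "Acme AI" is a made-up provider name.
blinded = blind("Acme AI changed refusal behavior.", ["Acme AI"])
late_notice = silent_days(date(2024, 1, 1), date(2024, 1, 15), date(2024, 2, 1))
no_notice = silent_days(date(2024, 1, 1), None, date(2024, 2, 1))
```

In this sketch a notice published two weeks after the drift still leaves a 14-day silent period on record, which is the behavior the Apology Trap control describes.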