Benchmark suite 3

Silent Capability Drift Monitor

The monitor can return "no degradation" when all sentinel tasks remain within their pre-registered tolerance.

Run steps

Register sentinel tasks, expected tolerance, and sampling cadence before running.
Run the same tasks across dated intervals and compare outputs, refusal rates, tool use, and latency.
Score only material drift that is not disclosed through changelog, receipt, or user notice.
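
The run steps above can be sketched roughly as follows. This is a minimal illustration, not the suite's actual harness: the `Sentinel` and `Observation` types, the use of a single scalar metric per task, the absolute-difference tolerance check, and the rule that a disclosure counts if it is dated on or before the run are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sentinel:
    """A pre-registered sentinel task (hypothetical structure)."""
    task_id: str
    baseline: float   # expected metric value, e.g. a refusal rate
    tolerance: float  # allowed absolute deviation from the baseline


@dataclass(frozen=True)
class Observation:
    """One dated measurement of a sentinel task (hypothetical structure)."""
    task_id: str
    run_date: date
    value: float


def score_silent_drift(sentinels, observations, disclosed_dates):
    """Return observations that exceed tolerance with no prior disclosure.

    disclosed_dates is assumed to hold the dates of changelog entries,
    receipts, or user notices that cover the drift in question.
    """
    registry = {s.task_id: s for s in sentinels}
    flagged = []
    for obs in observations:
        sentinel = registry[obs.task_id]
        drifted = abs(obs.value - sentinel.baseline) > sentinel.tolerance
        disclosed = any(d <= obs.run_date for d in disclosed_dates)
        if drifted and not disclosed:
            flagged.append(obs)
    return flagged
```

Under these assumptions, an empty result corresponds to the "no degradation" outcome: every sentinel stayed within tolerance, or any drift was already disclosed when it was observed.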

Required evidence

Validity controls

Total Blinding: Provider names are stripped; reviewers judge drift and disclosure relationship only.

Apology Trap: PR language after a drift event does not erase the historical silent period.
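
One way to make the Apology Trap control concrete is to measure the silent period from dates alone, so that nothing said after the first disclosure can shorten it. The sketch below assumes the drift onset and first disclosure dates are known; the function name `silent_period_days` and its parameters are hypothetical and not part of the suite.

```python
from datetime import date
from typing import Optional


def silent_period_days(
    drift_onset: date,
    first_disclosure: Optional[date],
    as_of: date,
) -> int:
    """Count the days during which drift was present but undisclosed.

    Later statements or apologies are deliberately not inputs: only the
    onset date and the first disclosure date (if any) affect the result.
    """
    if first_disclosure is None:
        end = as_of  # still undisclosed at scoring time
    else:
        end = min(first_disclosure, as_of)
    return max((end - drift_onset).days, 0)
```

For example, drift that began on 2024-01-01 and was first disclosed on 2024-01-31 yields a 30-day silent period regardless of how the event is described afterwards.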