Benchmark suite 3
Silent Capability Drift Monitor
The monitor can return "no degradation" when sentinel tasks remain within the pre-registered tolerance.
Run steps
- Register sentinel tasks, expected tolerance, and sampling cadence before running.
- Run the same tasks across dated intervals and compare outputs, refusal rates, tool use, and latency.
- Score only material drift that is not disclosed through changelog, receipt, or user notice.
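
The run steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the `SentinelTask` and `Observation` types, the metric names, the absolute-difference tolerance rule, and the "disclosed on or before the follow-up date" rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class SentinelTask:
    # Hypothetical registration record, fixed before any run.
    task_id: str
    tolerance: Dict[str, float]  # metric name -> max allowed absolute change
    cadence_days: int


@dataclass(frozen=True)
class Observation:
    # Hypothetical per-interval measurement for one sentinel task.
    task_id: str
    observed_on: date
    metrics: Dict[str, float]  # e.g. refusal_rate, tool_calls, latency_s


def material_drift(
    task: SentinelTask, baseline: Observation, follow_up: Observation
) -> Dict[str, float]:
    """Return the metrics whose change exceeds the registered tolerance."""
    drifted = {}
    for name, limit in task.tolerance.items():
        delta = follow_up.metrics[name] - baseline.metrics[name]
        if abs(delta) > limit:
            drifted[name] = delta
    return drifted


def score_interval(
    task: SentinelTask,
    baseline: Observation,
    follow_up: Observation,
    disclosed_on: Optional[date],
) -> str:
    """Classify one interval; only undisclosed material drift is scored."""
    if not material_drift(task, baseline, follow_up):
        return "no degradation"
    # Assumption: a disclosure counts only if dated on or before the follow-up run.
    if disclosed_on is not None and disclosed_on <= follow_up.observed_on:
        return "disclosed drift"
    return "silent drift"
```

Registering the tolerance per metric keeps the pass/fail rule fixed before any follow-up data exists, which is the point of the pre-registration step.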
Required evidence
- Baseline and follow-up hashes.
- Tolerance definition.
- Dated disclosure or absence of disclosure.
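
One way to assemble the evidence listed above is sketched below. The record layout, the field names, and the choice of SHA-256 over a canonical JSON payload are assumptions for illustration; the benchmark text does not prescribe a hash function or format.

```python
import hashlib
import json
from typing import Dict, Optional


def output_hash(task_id: str, output: str) -> str:
    """SHA-256 over a canonical JSON payload, so equal outputs hash equally."""
    payload = json.dumps(
        {"task_id": task_id, "output": output},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def evidence_record(
    task_id: str,
    baseline_output: str,
    follow_up_output: str,
    tolerance: Dict[str, float],
    disclosed_on: Optional[str],
) -> dict:
    """Bundle the three required evidence items for one sentinel task."""
    return {
        "task_id": task_id,
        "baseline_sha256": output_hash(task_id, baseline_output),
        "follow_up_sha256": output_hash(task_id, follow_up_output),
        "tolerance": tolerance,
        # ISO date string of the disclosure, or None to record its absence.
        "disclosed_on": disclosed_on,
    }
```

Storing `None` explicitly, rather than omitting the field, records that a disclosure was looked for and not found.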
Validity controls
- Total Blinding: Provider names are stripped; reviewers judge drift and disclosure relationship only.
- Apology Trap: PR language after a drift event does not erase the historical silent period.
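
The two validity controls might be sketched as below. Both helpers are hypothetical: `blind_providers` assumes reviewers receive text with provider names replaced by opaque labels, and `silent_days` assumes the silent period runs from the date drift was first detected to the date of disclosure (or to the review date if there was none).

```python
import re
from datetime import date
from typing import Iterable, Optional


def blind_providers(text: str, providers: Iterable[str]) -> str:
    """Replace each provider name with an opaque label such as 'Provider 1'."""
    names = sorted(set(providers))
    if not names:
        return text
    labels = {name.lower(): f"Provider {i + 1}" for i, name in enumerate(names)}
    # Longest names first so "Acme Labs" is not partially matched as "Acme".
    alternation = "|".join(
        re.escape(name) for name in sorted(names, key=len, reverse=True)
    )
    pattern = re.compile(alternation, re.IGNORECASE)
    return pattern.sub(lambda m: labels[m.group(0).lower()], text)


def silent_days(
    drift_detected_on: date, disclosed_on: Optional[date], as_of: date
) -> int:
    """Days the drift went undisclosed.

    A later notice or apology ends the silent period but does not reset it
    to zero; only a disclosure dated on or before the drift does.
    """
    end = disclosed_on if disclosed_on is not None else as_of
    return max((end - drift_detected_on).days, 0)
```

Computing the silent period from dates alone means that wording in a later statement cannot change the score.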