Executable Boundary Contracts for Sound Event Traces
Sound event detection benchmarks compress timed boundary behavior into scalar scores (frame F1, segment F1, event F1) that hide interpretable boundary failures. Union activity scores can mask typed boundary errors, and no existing specification language makes trace obligations executable and auditable.
The paper defines executable boundary contracts for finite sound event traces with two sorts: a frame sort (bounded Boolean fragment embeddable in STL after grid projection) and an event sort (declared interval matching, duration clauses, fragmentation clauses, obligation-restricted vector scoring). The monitor evaluates contracts and returns a typed vector of coordinates rather than a single scalar. Lean 4 is used to check the finite frame core logic. The system is evaluated on controlled Mini LibriSpeech scenes, MAESTRO Real soundscapes, frozen pretrained timing probes, and an official DCASE 2024 Task 4 baseline.
Standard scores and contract coordinates disagree in interpretable ways across all evaluation tracks. The strongest finding is that union activity can hide typed boundary failure. The contract-augmented detector (contract_tcn_aug) achieves 0.889 Frame F1 and 0.829 Boundary F1 versus 0.656 and 0.408 for the best non-contract baseline (dilated_cnn).
| Detector | Frame F1 | Segment F1 | Event F1 | Boundary F1 | Logic |
|---|---|---|---|---|---|
| adaptive_energy | 0.504 | 0.568 | 0.232 | 0.310 | 0.514 |
| spectral_flux | 0.614 | 0.739 | 0.225 | 0.358 | 0.560 |
| temporal_cnn | 0.607 | 0.760 | 0.294 | 0.374 | 0.487 |
| dilated_cnn | 0.656 | 0.789 | 0.367 | 0.408 | 0.522 |
| contract_tcn_aug | 0.889 | 0.927 | 0.705 | 0.829 | 0.802 |
