Rollout Rollback Runbook
This runbook defines rollback actions for stages A-D.
Preconditions
- Keep previous release artifacts available for all modules.
- Keep Prometheus scrape config versioned with release tags.
Rollback strategy
Rollback is stage-local: revert only the modules that belong to the failing stage. Do not roll back engine unless stage A itself fails.
Stage A rollback (core snapshot API)
- Re-deploy previous engine artifact.
- Re-run smoke tests:
  - open index,
  - put/get/delete,
  - close/reopen.
- Confirm that engine runs without any dependency on monitoring modules.
Stage B rollback (monitoring bridge)
- Remove or downgrade:
  - monitoring-micrometer
  - monitoring-prometheus
- Keep engine unchanged.
- Disable scrape target temporarily if exporter endpoint fails.
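Whether the exporter endpoint "fails" can be checked before touching the scrape configuration. The following is a minimal sketch, assuming curl and awk are available and that the exporter serves the Prometheus text format; the function names `has_metrics` and `probe_exporter` are illustrative, and the exporter URL must be supplied by the operator because this runbook does not name it.

```sh
#!/bin/sh
# Sketch only: helper names are illustrative, not part of the project.

# Succeeds if stdin contains at least one sample line, i.e. a non-blank line
# that does not start with "#". In the Prometheus text format, HELP and TYPE
# lines start with "#", so a response with only those carries no samples.
has_metrics() {
    awk '!/^#/ && /[^[:space:]]/ { found = 1 } END { exit !found }'
}

# probe_exporter URL
# Fetches the endpoint and succeeds only if it returns at least one sample.
# curl -f turns HTTP error statuses into a non-zero exit with no body,
# so a failing endpoint yields empty input and has_metrics fails.
probe_exporter() {
    curl -fsS --max-time 5 "$1" | has_metrics
}
```

A possible use is `probe_exporter "$EXPORTER_URL" || echo "exporter failing: disable the scrape target"`, where `EXPORTER_URL` is a placeholder for the real endpoint. How the scrape target is disabled depends on the versioned Prometheus scrape config and is not shown here.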
Stage C rollback (management agent)
- Revert monitoring-rest-json and monitoring-rest-json-api to previous release pair.
- Keep index runtime alive; management endpoints can be temporarily disabled.
- Validate:
  - /api/v1/report
  - secured action paths reject unauthorized requests.
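The two validation points above could be scripted with curl. This is a sketch under stated assumptions, not the project's own check: only /api/v1/report comes from this runbook. The base URL and the secured action path are passed in by the caller, sending the action as a POST is a guess, the report endpoint is assumed to answer 200 without credentials, and "rejected" is assumed to mean HTTP 401 or 403.

```sh
#!/bin/sh
# Sketch only: function names are illustrative; adjust the assumptions
# (POST method, 200 for the report, 401/403 for rejection) to the real agent.

# Prints the HTTP status code of a request ("000" if the connection fails).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$@" || true
}

# Succeeds when a status code means the unauthorized request was rejected.
is_rejected() {
    case "$1" in
        401|403) return 0 ;;
        *) return 1 ;;
    esac
}

# validate_stage_c BASE_URL ACTION_PATH
validate_stage_c() {
    base_url="$1"
    action_path="$2"

    report_status="$(http_status "$base_url/api/v1/report")"
    if [ "$report_status" != "200" ]; then
        echo "report endpoint returned $report_status" >&2
        return 1
    fi

    # No credentials are sent on purpose: the request must be refused.
    action_status="$(http_status -X POST "$base_url$action_path")"
    if ! is_rejected "$action_status"; then
        echo "unauthenticated action returned $action_status" >&2
        return 1
    fi

    echo "stage C validation passed"
}
```

The function would be called as `validate_stage_c BASE_URL ACTION_PATH` with the agent's real address and one of its secured action paths.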
Stage D rollback (direct web console)
- Revert monitoring-console-web to previous version.
- Keep agents running; web console is control-plane and can be redeployed independently.
- Validate dashboard polling and action submission after rollback.
Rollback verification command
After rollback deployment, run:
mvn -pl engine test -Dtest=IntegrationSegmentIndexMetricsSnapshotConcurrencyTest
mvn -pl monitoring-prometheus test -Dtest=HestiaStorePrometheusExporterTest
mvn -pl monitoring-rest-json test -Dtest=ManagementAgentServerTest,ManagementAgentServerSecurityTest
mvn -pl monitoring-console-web test
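The four commands can be chained so that verification stops at the first failing module. The wrapper below is a minimal sketch, assuming it is run from the repository root; the function name `verify_rollback` and the `MVN` override variable are illustrative and not part of the project.

```sh
#!/bin/sh
# Sketch only: runs the verification commands in stage order (A to D)
# and stops at the first failure.
# MVN is a hypothetical override, e.g. MVN=echo prints the commands
# instead of running them.
MVN="${MVN:-mvn}"

# Returns 0 if all four test runs pass; otherwise returns the position
# (1 to 4, matching stages A to D) of the first failing run.
verify_rollback() {
    $MVN -pl engine test -Dtest=IntegrationSegmentIndexMetricsSnapshotConcurrencyTest || return 1
    $MVN -pl monitoring-prometheus test -Dtest=HestiaStorePrometheusExporterTest || return 2
    $MVN -pl monitoring-rest-json test -Dtest=ManagementAgentServerTest,ManagementAgentServerSecurityTest || return 3
    $MVN -pl monitoring-console-web test || return 4
    return 0
}
```

After sourcing the file, calling `verify_rollback` runs the checks; the return code can be recorded under "verification result" in the incident notes.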
Incident notes template
- stage:
- release version:
- failure symptom:
- rollback version:
- verification result:
- follow-up action: