Rollout Rollback Runbook

This runbook defines rollback actions for stages A-D.

Preconditions

  • Keep previous release artifacts available for all modules.
  • Keep Prometheus scrape config versioned with release tags.

Rollback strategy

Rollback is stage-local: revert only the modules of the stage that failed. Do not roll back the engine unless stage A itself fails.

Stage A rollback (core snapshot API)

  1. Re-deploy previous engine artifact.
  2. Re-run smoke tests:
       • open index,
       • put/get/delete,
       • close/reopen.
  3. Confirm that the engine runs without depending on any monitoring module.

Stage B rollback (monitoring bridge)

  1. Remove or downgrade:
       • monitoring-micrometer
       • monitoring-prometheus
  2. Keep engine unchanged.
  3. Disable scrape target temporarily if exporter endpoint fails.

Stage C rollback (management agent)

  1. Revert monitoring-rest-json and monitoring-rest-json-api to previous release pair.
  2. Keep index runtime alive; management endpoints can be temporarily disabled.
  3. Validate:
       • /api/v1/report
       • secured action paths reject unauthorized requests.
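
The validation step above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the base URL and port are placeholders, the action path is passed in by the caller because this runbook does not name one, and treating 200 as success and 401/403 as a rejection (with POST as the action method) is an assumption about how the management agent responds.

```shell
# Sketch of the Stage C validation. BASE_URL is a placeholder, not a value
# taken from this runbook; override it for the environment being checked.
BASE_URL="${BASE_URL:-http://localhost:8080}"

# Print only the HTTP status code of a request (curl prints 000 if the
# connection itself fails).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# Succeed when the status code means the request was rejected as unauthorized.
# Assumption: the agent answers 401 or 403 in that case.
is_unauthorized() {
  case "$1" in
    401|403) return 0 ;;
    *) return 1 ;;
  esac
}

# The report endpoint is assumed to answer 200 once the revert is healthy.
check_report() {
  [ "$(http_status "$BASE_URL/api/v1/report")" = "200" ]
}

# A secured action path called without credentials should be rejected.
# $1 is the action path; it is hypothetical and must be supplied by the caller.
check_rejects_unauthorized() {
  is_unauthorized "$(http_status -X POST "$BASE_URL$1")"
}
```

A run would call `check_report` first and then `check_rejects_unauthorized` once per secured action path, treating any non-zero exit status as a failed validation.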

Stage D rollback (direct web console)

  1. Revert monitoring-console-web to previous version.
  2. Keep agents running; web console is control-plane and can be redeployed independently.
  3. Validate dashboard polling and action submission after rollback.

Rollback verification commands

After rollback deployment, run:

mvn -pl engine test -Dtest=IntegrationSegmentIndexMetricsSnapshotConcurrencyTest
mvn -pl monitoring-prometheus test -Dtest=HestiaStorePrometheusExporterTest
mvn -pl monitoring-rest-json test -Dtest=ManagementAgentServerTest,ManagementAgentServerSecurityTest
mvn -pl monitoring-console-web test
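
If these commands are run often, a small wrapper can execute them in stage order and stop at the first failure. The sketch below is one possible shape, not part of the project: the function name `run_verification` and the `DRY_RUN` switch are hypothetical, and only the four `mvn` lines come from this runbook.

```shell
# Sketch: run the verification commands in stage order (A to D) and stop at
# the first failure. Set DRY_RUN=1 to print the commands without running them.
run_verification() {
  # Read the command list on file descriptor 3 so that mvn cannot consume it
  # through standard input.
  while IFS= read -r cmd <&3; do
    echo "+ $cmd"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      continue
    fi
    # The commands come from the fixed list below, so plain word splitting
    # is enough to turn each line into a command and its arguments.
    $cmd || { echo "FAILED: $cmd" >&2; return 1; }
  done 3<<'EOF'
mvn -pl engine test -Dtest=IntegrationSegmentIndexMetricsSnapshotConcurrencyTest
mvn -pl monitoring-prometheus test -Dtest=HestiaStorePrometheusExporterTest
mvn -pl monitoring-rest-json test -Dtest=ManagementAgentServerTest,ManagementAgentServerSecurityTest
mvn -pl monitoring-console-web test
EOF
}
```

Calling `DRY_RUN=1 run_verification` lists the four commands; calling `run_verification` from the repository root runs them, and its exit status can be copied into the "verification result" field of the incident notes.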

Incident notes template

  • stage:
  • release version:
  • failure symptom:
  • rollback version:
  • verification result:
  • follow-up action: