Operations

Running a supervised TSUNAGI node

TSUNAGI runs under guardiand, a supervisor that captures crash evidence, restarts the node on crash, trips a crash-loop breaker, and recommends rollback, so day-to-day operation is mostly observation.

Live state

The Status dashboard shows everything live. It is truth-first: data that is unavailable is shown as unavailable rather than estimated. On the relay, a cron job publishes status.json every minute, and tsunagi-daily.sh generates a daily summary. For a quick check from the shell:

tsunagi-daily.sh            # canonical count, streak, guardian state, KES days, uptime, incidents
guardianctl guard status   # supervisor state + supervised pid
guardianctl incidents list # any captured crash incidents
tail ~/p6-soak.log         # 5-min health timeline

guardianctl

guardianctl guard status        # running | HALTED, supervised pid, breaker state
guardianctl guard up | down     # start/stop supervision (asks YES)
guardianctl health              # pass/warn checks vs the proven-good baseline
guardianctl diagnose            # scan recent log for known KINTSUGI signatures
guardianctl incidents show ID   # an incident's evidence (meta + panic)
guardianctl versions            # current binary + rollback chain + snapshots

Guardian restarts the node automatically to preserve availability, but it never rolls back automatically: the node holds keys, so a binary swap is always an operator decision.
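
For readers unfamiliar with the term, the idea behind a crash-loop breaker can be sketched in a few lines of shell. This is an illustration only, not guardiand's implementation: the function name breaker_tripped, the sliding-window approach, and the example numbers are all assumptions, and the supervisor's real thresholds are not documented here.

```shell
# Hypothetical sketch of a crash-loop breaker (not guardiand's actual logic).
# Usage: breaker_tripped NOW WINDOW_SECS MAX_CRASHES CRASH_TS...
# Succeeds (exit 0) when at least MAX_CRASHES of the given crash timestamps
# (seconds since epoch) fall within the last WINDOW_SECS before NOW.
breaker_tripped() {
  now="$1"
  window="$2"
  max="$3"
  shift 3
  recent=0
  for ts in "$@"; do
    # Count only crashes that are recent enough to matter.
    if [ $((now - ts)) -le "$window" ]; then
      recent=$((recent + 1))
    fi
  done
  [ "$recent" -ge "$max" ]
}
```

With a tripped breaker, a supervisor would stop restarting and wait for an operator, which matches the HALTED state reported by guardianctl guard status.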

Rollback & recovery

guardianctl versions            # see the sha-verified rollback chain
guardianctl rollback            # interactive, sha-verified rollback (env/keys/state untouched)
# cutover rollback (leave supervision):
guardianctl guard down && ~/start-relay-forge.sh

Every deployed binary is preserved in the rollback chain with its sha256. Rollback never touches environment, keys, or opcert state.
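
To make "sha-verified" concrete, here is a minimal sketch of checking a binary against a recorded sha256 by hand. It assumes GNU coreutils sha256sum is installed (on macOS, shasum -a 256 is the usual equivalent); the function name verify_sha256 is hypothetical, and guardianctl may perform its own check differently.

```shell
# Hypothetical helper: compare a file's sha256 with an expected hex digest.
# Usage: verify_sha256 FILE EXPECTED_HEX
# Prints "ok" and returns 0 on a match, returns 1 on a mismatch,
# and returns 2 if the file does not exist.
verify_sha256() {
  file="$1"
  expected="$2"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 2
  fi
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "ok"
  else
    echo "sha256 mismatch for $file" >&2
    return 1
  fi
}
```

The expected digest would come from the output of guardianctl versions; the path of the preserved binary depends on the installation.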

KES expiry monitoring

When a KES key expires, the node stops forging silently, so the status feed carries a live countdown (kes_days_left) with three alert tiers: ok, warn (≤14 days), and critical (≤5 days). The countdown is surfaced on the dashboard. Renew the KES key and update KES_START_PERIOD before the countdown reaches zero.
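
The tier boundaries above can be expressed as a small shell function, for example for use in a custom alert script. This is a sketch under assumptions: the function name kes_tier is hypothetical, and the "unavailable" result for a missing or non-numeric value is a choice made here to mirror the dashboard's truth-first behaviour, not something the status feed is documented to emit.

```shell
# Hypothetical helper: map a kes_days_left value to the documented alert tier.
# Usage: kes_tier DAYS
# Prints critical (<=5), warn (<=14), or ok; prints "unavailable" and
# returns 1 if DAYS is empty or not an integer.
kes_tier() {
  days="${1:-}"
  digits="${days#-}"   # allow a leading minus (key already expired)
  case "$digits" in
    ''|*[!0-9]*) echo "unavailable"; return 1 ;;
  esac
  if [ "$days" -le 5 ]; then
    echo "critical"
  elif [ "$days" -le 14 ]; then
    echo "warn"
  else
    echo "ok"
  fi
}

# Example (assumes jq is installed and that kes_days_left is a top-level
# field of status.json, which is not confirmed here):
#   kes_tier "$(jq -r '.kes_days_left' status.json)"
```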