Operations

Running a supervised TSUNAGI node

TSUNAGI runs under guardiand, a supervisor that captures crash evidence, restarts the node on crash, trips a crash-loop breaker, and recommends rollback. Day-to-day operation is therefore mostly observation.

Observe

Live state

The Status dashboard shows live state and is truth-first: data that cannot be obtained is shown as unavailable rather than filled in. On the relay, a cron job publishes status.json every minute, and a daily summary is generated. From a shell on the relay:

tsunagi-daily.sh           # canonical count, streak, guardian state, KES days, uptime, incidents
guardianctl guard status   # supervisor state + supervised pid
guardianctl incidents list # any captured crash incidents
tail ~/p6-soak.log         # 5-min health timeline
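Because status.json is republished every minute, its age is a cheap liveness signal for the publisher itself. The following is a minimal POSIX shell sketch under stated assumptions: the function name status_freshness and the two-minute threshold are hypothetical, and the path to status.json is whatever your cron job writes (it is not given here). In the same truth-first spirit as the dashboard, it reports a missing file as unavailable instead of guessing.

```sh
#!/bin/sh
# Hypothetical freshness check for the status.json published by the one-minute cron.
# The 2-minute default threshold is an assumption (roughly two cron intervals).
# Prints fresh / stale / unavailable and returns 0 / 1 / 2.
status_freshness() {
  local file max_age_min
  file=$1
  max_age_min=${2:-2}
  if [ ! -f "$file" ]; then
    echo unavailable        # truth-first: report the gap instead of guessing
    return 2
  fi
  # find prints the path only if it was modified less than max_age_min minutes ago
  if [ -n "$(find "$file" -mmin "-$max_age_min" 2>/dev/null)" ]; then
    echo fresh
    return 0
  fi
  echo stale
  return 1
}
```

A stale result means the publisher has stopped, even if the last status.json it wrote still looks healthy.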

Supervise (YAMORI / TATE)

guardianctl

guardianctl guard status        # running | HALTED, supervised pid, breaker state
guardianctl guard up            # start supervision (asks for YES confirmation)
guardianctl guard down          # stop supervision (asks for YES confirmation)
guardianctl health              # pass/warn checks vs the proven-good baseline
guardianctl diagnose            # scan recent log for known KINTSUGI signatures
guardianctl incidents show ID   # an incident's evidence (meta + panic)
guardianctl versions            # current binary + rollback chain + snapshots
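guardianctl diagnose matches recent log output against known KINTSUGI signatures. The sketch below illustrates that idea in POSIX shell only; it is not the tool's implementation. The function scan_signatures, its NAME|REGEX table, and the 2000-line default are all hypothetical, since the real signatures are not listed here.

```sh
#!/bin/sh
# scan_signatures LOG [LINES]: print the name of every signature found in the
# last LINES lines of LOG; return 0 if any matched, 1 otherwise.
# The signature table in the here-document is hypothetical.
scan_signatures() {
  local log lines found name regex
  log=$1
  lines=${2:-2000}
  found=1
  while IFS='|' read -r name regex; do
    [ -n "$name" ] || continue
    if tail -n "$lines" "$log" | grep -Eq -- "$regex"; then
      echo "$name"
      found=0
    fi
  done <<'EOF'
panic|panicked at
oom|[Oo]ut of memory
disk_full|No space left on device
EOF
  return "$found"
}
```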

Guardian restarts the node automatically to keep it available, but it never rolls back automatically: the node holds keys, so swapping the binary is always an operator decision.
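Restart-on-crash with a crash-loop breaker can be pictured as a loop that counts crashes inside a sliding window. This POSIX shell sketch assumes hypothetical values (3 crashes within 300 seconds, a 1-second delay between restarts) and a hypothetical supervise function; guardiand's real thresholds are not documented here. The loop only restarts or halts; it never swaps a binary.

```sh
#!/bin/sh
# Hypothetical breaker thresholds, not guardiand's configuration.
MAX_CRASHES=${MAX_CRASHES:-3}     # crashes allowed inside the window
WINDOW_SECS=${WINDOW_SECS:-300}   # sliding window length in seconds

# supervise CMD [ARGS...]: rerun CMD after each crash until it exits cleanly
# or the breaker trips. Rollback is left to the operator.
supervise() {
  local crashes recent count now rc t
  crashes=""
  while :; do
    if "$@"; then rc=0; else rc=$?; fi
    if [ "$rc" -eq 0 ]; then
      echo "clean exit"            # treated as an intentional stop in this sketch
      return 0
    fi
    now=$(date +%s)
    # Keep only crash timestamps still inside the window, plus this crash.
    recent=""
    count=0
    for t in $crashes $now; do
      if [ $((now - t)) -lt "$WINDOW_SECS" ]; then
        recent="$recent $t"
        count=$((count + 1))
      fi
    done
    crashes=$recent
    echo "crash rc=$rc ($count/$MAX_CRASHES in ${WINDOW_SECS}s window)"
    if [ "$count" -ge "$MAX_CRASHES" ]; then
      echo "HALTED: crash-loop breaker tripped; rollback is an operator decision"
      return 1
    fi
    sleep "${RESTART_DELAY:-1}"    # hypothetical delay between restarts
  done
}
```

With these defaults, `supervise false` prints three crash lines and then the HALTED line.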

Recover (TATE)

Rollback & recovery

guardianctl versions            # see the sha-verified rollback chain
guardianctl rollback            # interactive, sha-verified rollback (env/keys/state untouched)
# cutover rollback (leave supervision, then start the node from its start script):
guardianctl guard down && ~/start-relay-forge.sh

Every deployed binary is preserved in the rollback chain along with its sha256. Rollback never touches the environment, keys, or opcert state.
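guardianctl rollback performs its own sha verification. For checking a binary by hand, a minimal sketch, assuming you already have the expected sha256 recorded for that binary; the helper names sha256_of and verify_binary are hypothetical.

```sh
#!/bin/sh
# sha256_of FILE: print the file's lowercase hex sha256.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1        # GNU coreutils
  else
    shasum -a 256 "$1" | cut -d' ' -f1    # macOS / Perl fallback
  fi
}

# verify_binary FILE EXPECTED_SHA256: prints ok, MISMATCH: ..., or missing: ...
verify_binary() {
  local file expected actual
  file=$1
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')   # compare hex case-insensitively
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 2
  fi
  actual=$(sha256_of "$file")
  if [ "$actual" = "$expected" ]; then
    echo ok
    return 0
  fi
  echo "MISMATCH: expected $expected, got $actual"
  return 1
}
```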

Predict (KAGAMI)

KES expiry monitoring

An expired KES key stops forging silently, so the status feed carries a live countdown, kes_days_left, which the dashboard surfaces with three alert tiers: ok, warn (14 days or fewer), and critical (5 days or fewer). Renew the KES key and update KES_START_PERIOD before the countdown reaches zero.
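The countdown is simple arithmetic over KES periods. A sketch, assuming Cardano mainnet genesis parameters (1-second slots, 129600 slots per KES period, 62 KES evolutions, so a fresh key lasts 93 days); the function names kes_days_left and kes_tier are hypothetical, and your network's genesis values may differ.

```sh
#!/bin/sh
# Assumed Cardano mainnet genesis values; override for other networks.
SLOTS_PER_KES_PERIOD=${SLOTS_PER_KES_PERIOD:-129600}
MAX_KES_EVOLUTIONS=${MAX_KES_EVOLUTIONS:-62}
SLOT_LENGTH_SECS=${SLOT_LENGTH_SECS:-1}

# kes_days_left CURRENT_SLOT KES_START_PERIOD: whole days until the key expires.
kes_days_left() {
  local expiry_slot secs_left
  # The key is valid for MAX_KES_EVOLUTIONS periods starting at KES_START_PERIOD.
  expiry_slot=$(( ($2 + MAX_KES_EVOLUTIONS) * SLOTS_PER_KES_PERIOD ))
  secs_left=$(( (expiry_slot - $1) * SLOT_LENGTH_SECS ))
  echo $(( secs_left / 86400 ))   # integer division, truncated toward zero
}

# kes_tier DAYS: ok, warn (<= 14 days) or critical (<= 5 days)
kes_tier() {
  if [ "$1" -le 5 ]; then
    echo critical
  elif [ "$1" -le 14 ]; then
    echo warn
  else
    echo ok
  fi
}
```

For example, with KES_START_PERIOD=400 and the node at slot 51840000 (the first slot of period 400), `kes_days_left 51840000 400` prints 93 and `kes_tier 93` prints ok.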