Backups
Cluster state is held in the raft FSM: deployments, replicas, routes,
certs, tokens, the audit log, IPAM allocations, scheduler bookkeeping.
jaco backup writes a consistent snapshot of all of that to a single
tarball. jaco restore primes a fresh node from one.
CLI: jaco backup, jaco restore.
What's in the tarball
- snapshot.bin: the raw raft snapshot bytes from raft.Snapshot().
- meta.json: {cluster_id, snapshot_index, snapshot_term, jaco_version, taken_at, leader_at_snapshot}.
The snapshot is consistent at a single raft commit index: every deployment, audit event, and cert that committed before that index is present; nothing committed after is. The cluster CA cert and key are included — restoring on a fresh host stands up the same cluster identity.
Container state on the original nodes is not in the tarball. After restore, the runtime on the restored cluster re-pulls images and re-creates containers per the desired state. Plan for the pull window.
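To check what a given tarball holds before relying on it, the archive can be inspected without restoring. The following is a minimal sketch, assuming snapshot.bin and meta.json sit at the archive root and that jaco_version is stored as a JSON string. The function names are hypothetical helpers, not part of the jaco CLI.

```shell
# Sketch: inspect a jaco backup tarball without restoring it.
# Assumption: snapshot.bin and meta.json are stored at the archive root.

# List the archive members, one per line.
backup_members() {
  tar -tzf "$1"
}

# Print meta.json to stdout without unpacking anything to disk.
backup_meta() {
  tar -xzOf "$1" meta.json
}

# Extract one string-valued field from meta.json using sed (no jq needed).
# Only handles flat "key": "value" pairs; numeric fields are not matched.
backup_meta_field() {
  backup_meta "$1" |
    sed -n "s/.*\"$2\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}
```

For example, backup_meta_field cluster-2026-05-25.tar.gz jaco_version prints the version the backup was taken at, which is the value to compare against the receiving host before a restore.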
Take a backup
export JACO_TOKEN=<operator_token>
export LEADER=node-1:7000
jaco backup --server $LEADER --output cluster-$(date +%F).tar.gz
# Wrote 41279 bytes to cluster-2026-05-25.tar.gz
Running backups on a schedule from any operator host (cron, systemd timer) is the expected pattern. The RPC streams the snapshot in chunks; the CLI has a 5-minute deadline.
A backup_taken audit event is recorded with the resulting snapshot
index.
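For scheduled runs, a small wrapper can keep a failed or timed-out backup from leaving a truncated file under the final name. This is a sketch only: it assumes JACO_TOKEN and LEADER are set in the environment as in the manual example, and BACKUP_DIR and run_backup are hypothetical names introduced here.

```shell
# Sketch of a wrapper suitable for cron or a systemd timer.
# Assumes JACO_TOKEN and LEADER are already exported.
# BACKUP_DIR is a hypothetical variable used only by this sketch.

run_backup() {
  dir="${BACKUP_DIR:-/var/backups/jaco}"
  final="$dir/cluster-$(date +%F).tar.gz"
  tmp="$dir/.partial-$(basename "$final")"

  mkdir -p "$dir" || return 1

  # Write to a temporary name first, then rename on success, so the
  # final name only ever points at a complete tarball.
  if jaco backup --server "$LEADER" --output "$tmp"; then
    mv "$tmp" "$final" && echo "$final"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

The function prints the path of the finished tarball and returns non-zero on failure, so the scheduler's own failure reporting can pick it up.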
Store backups safely
The tarball includes the cluster CA private key and the SHA-256-hashed operator tokens. Treat it like a credential store:
- Encrypt at rest before uploading anywhere.
- Restrict access to the same humans who hold operator tokens.
- Keep a recent local copy and a remote copy; lose neither.
Plaintext operator tokens are NOT in the backup (only their hashes). After restoring on a fresh cluster, the original tokens still authenticate — they hash to the same values.
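One way to encrypt the tarball before it leaves the operator host is sketched below with openssl (1.1.1 or later for -pbkdf2). The key file path and the function names are assumptions made for illustration. Note that openssl enc with CBC gives confidentiality only, not tamper detection, so a tool with authenticated encryption may be preferable.

```shell
# Sketch: symmetric encryption of a backup tarball with openssl.
# The key file is a hypothetical passphrase file that should be readable
# only by the humans who hold operator tokens.

# usage: encrypt_backup <tarball> <keyfile>   (writes <tarball>.enc)
encrypt_backup() {
  openssl enc -aes-256-cbc -pbkdf2 -salt \
    -in "$1" -out "$1.enc" -pass "file:$2"
}

# usage: decrypt_backup <tarball.enc> <keyfile> <output>
decrypt_backup() {
  openssl enc -d -aes-256-cbc -pbkdf2 \
    -in "$1" -out "$3" -pass "file:$2"
}
```

Upload only the .enc file, and test decryption as part of a restore drill so the key file is known to work.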
Restore on a fresh host
The receiving host MUST have:
- jaco + jacod installed at a version compatible with the backup's taken-at version (same major).
- An empty $JACO_DATA_DIR (default /var/lib/jaco). Restore refuses to overwrite an existing data dir.
- The daemon stopped: sudo systemctl stop jaco.
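Two of the preconditions above can be checked mechanically before running the restore. The sketch below is hedged: the function names are hypothetical, a missing data dir is assumed to count as empty, and the version check is left out because the command for querying the installed version is not specified here.

```shell
# Sketch: preflight checks on the receiving host.
# Assumption: a data dir that does not exist yet is treated as empty.

data_dir_is_empty() {
  dir="${JACO_DATA_DIR:-/var/lib/jaco}"
  [ ! -e "$dir" ] && return 0
  # ls -A lists every entry except . and .. so any output means non-empty.
  [ -d "$dir" ] && [ -z "$(ls -A "$dir")" ]
}

daemon_is_stopped() {
  # systemctl is-active exits 0 only while the unit is active.
  ! systemctl is-active --quiet jaco
}

preflight_restore() {
  data_dir_is_empty || { echo "data dir is not empty" >&2; return 1; }
  daemon_is_stopped || { echo "jaco daemon is still running" >&2; return 1; }
}
```

Running preflight_restore first gives a clear message for the two most likely mistakes instead of a refused restore.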
Then:
sudo systemctl stop jaco
sudo jaco restore --input cluster-2026-05-25.tar.gz --name $(hostname)
sudo systemctl start jaco
jaco cluster status
The receiving node bootstraps the raft store from the snapshot, starts
as a single voter with the same cluster id, and emits a
RESTORE_COMPLETED audit event on first boot.
Rejoin the rest of the cluster
Other nodes rejoin via the usual flow:
# on the restored node
JACO_TOKEN=<operator_token> jaco node issue-join-token
Then, on each other node:
sudo jaco node join --peer <restored-node>:7000 --token <single-use>
Once every node is back as READY, the cluster is fully restored.
Deployments, routes, certs, and IPAM allocations come back exactly as
they were at the snapshot's raft index.
Drill it
The first time you need a restore should not be in production. Drill the round-trip at least once before going live:
- Take a backup of a working cluster.
- On a separate host (fresh VM, container), run the restore + systemctl start flow.
- Confirm jaco status shows the deployments and jaco audit shows the historical events.
- Wipe the drill host.