Status and errors

Every gRPC handler reports errors using a typed envelope:

```proto
message Error {
  string code = 1;
  string message = 2;
  map<string, string> details = 3;
}
```

code is drawn from a closed set. The protobuf field is a string so that new codes can be added in a forward-compatible way, but the list below is the contract: clients SHOULD branch on code and surface message and details verbatim.

The CLI prints these to stderr as Error: <code>: <message> and exits with a non-zero status. The constructors live in internal/controlplane/grpc and internal/runtime/compose.

Error codes (closed set)

code | when
--- | ---
cluster_uninitialized | RPC arrived before Cluster.Init / Cluster.Join completed
cluster_already_initialized | Cluster.Init on a node that already has raft state
node_already_member | Cluster.Join against an already-joined node, or a hostname collision
no_leader | leader unreachable or election in progress; the client may retry
quorum_lost | minority side of a partition; writes rejected
token_invalid | bearer token does not match any known operator token
token_revoked | token matches, but its revoked_at is set
validation_failed | manifest schema violation; details include field
unknown_service | jaco service not present in the compose file
unknown_host | an entry in hosts[*] is not a cluster member
unknown_network | service-level network not declared in the top-level networks: block
reserved_port | a compose service publishes port 80 or 443
legacy_compose_field | compose file uses a v1/v2 spelling dropped from the modern spec; details.field names the offender, details.modern_equivalent names the replacement
env_file_unresolved | daemon received a compose document still carrying env_file: (the CLI failed to fold it client-side, e.g. an old CLI talking to a new daemon)
parse_failed | jaco.yaml failed to unmarshal (YAML syntax errors, etc.); details include the underlying error
replicas_exceed_pinned_hosts | placement.hosts lists too few hosts for the requested replicas
image_pull_failed | the runtime gave up after retries (backoff state is still emitted for each attempt)
cert_failed | ACME issuance failed after exhausting its retry budget
docker_error | the docker daemon refused or errored (disk full, daemon stopped, etc.)
isolation_unavailable | node could not bring up its nftables ruleset; replicas are not scheduled there
isolation_self_test_failed | startup self-test of the nftables ruleset failed
subnet_pool_exhausted | IPAM pool ran out of /24 subnets
port_conflict | two services in one deployment publish the same host port, or a deployment publishes a host port another deployment already owns cluster-wide
deployment_not_found | jaco rollback or jaco delete against a deployment the cluster has no record of
no_previous_revision | jaco rollback against a deployment that has never been rolled forward; nothing to revert to
upgrade_verification_failed | minisign signature or SHA-256 checksum mismatch during self-upgrade
upgrade_failed | post-upgrade health check failed; rollback executed
internal | unrecoverable daemon error that fits no more specific code; details include reason
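Branching on code can be sketched as a single switch with a default arm. This is one possible client policy, not jaco's: only no_leader is documented as retryable, and grouping the input-validation codes into a "fix the request" bucket is an assumption. The action type and classify function are hypothetical. The default arm matters because code is a string, so a newer daemon can send a value this client has never seen.

```go
package main

import "fmt"

// action is a hypothetical client-side reaction to an error code.
type action int

const (
	actRetry      action = iota // transient: resend the same RPC later
	actFixRequest               // the request itself must change before resending
	actReport                   // surface message and details verbatim, then stop
)

// classify branches on the closed set of codes. Only no_leader is documented
// as retryable; the request-error grouping is an assumption of this sketch.
func classify(code string) action {
	switch code {
	case "no_leader":
		return actRetry
	case "validation_failed", "parse_failed", "legacy_compose_field",
		"unknown_service", "unknown_host", "unknown_network",
		"reserved_port", "port_conflict", "replicas_exceed_pinned_hosts":
		return actFixRequest
	default:
		// Every other documented code, plus any code added after this
		// client was built.
		return actReport
	}
}

func main() {
	for _, c := range []string{"no_leader", "reserved_port", "quorum_lost", "some_future_code"} {
		fmt.Printf("%s -> %d\n", c, classify(c))
	}
}
```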

message is human-readable; details is a flat string-to-string map of structured context (e.g. {service: web, field: replicas} for a validation failure, {attempt: 7, next_retry_at: …} for an image-pull backoff).
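Because details is map<string, string>, numeric context such as attempt reaches the client as a string ("7", not 7) and has to be parsed. A minimal sketch, assuming a hypothetical pullAttempt helper; next_retry_at is left alone here because its format is not specified in this section.

```go
package main

import (
	"fmt"
	"strconv"
)

// pullAttempt extracts the retry attempt number from an image-pull backoff's
// details map. Hypothetical helper; details values are always strings.
func pullAttempt(details map[string]string) (int, bool) {
	raw, ok := details["attempt"]
	if !ok {
		return 0, false
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, false
	}
	return n, true
}

func main() {
	// Validation-failure context, shaped like the example above.
	v := map[string]string{"service": "web", "field": "replicas"}
	fmt.Printf("invalid %s on service %s\n", v["field"], v["service"])

	// Image-pull backoff context: attempt arrives as the string "7".
	if n, ok := pullAttempt(map[string]string{"attempt": "7"}); ok {
		fmt.Println("pull attempt", n)
	}
}
```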

Replica states (closed set)

ReplicaObserved.state from proto/jaco/v1/entities.proto:

state | meaning
--- | ---
pending | ReplicaDesired received; image not yet pulled. A failing pull stays in this state with code image_pull_failed (and the error in details.reason) while it retries
pulling | image pull in progress (reported on the first attempt)
running | container up and the first healthy healthcheck observed; services without a healthcheck use the fallback rule below
degraded | State.Health.Status = unhealthy observed
updating | set by the scheduler during a rolling update; the runtime reads it but does not write it
failed | terminal error: image_pull_failed, docker_error, or restart_exhausted (scheduler-driven)
stopped | replica removed from the desired set; container stopped and removed

Fallback for running: a service with no compose healthcheck: block (or with healthcheck: { disable: true }, which v0.3.2 treats as "no healthcheck") flips to running only after the container's docker inspect .State.Status reads running for HealthyConsecutiveCount = 5 consecutive polls, roughly 1 s apart. The poll counter survives reconciler re-dispatches for the same (replica_id, container_id) pair (issue #152). Before v0.3.2, every re-dispatch reset the counter to 0, so healthcheck-less replicas in a stack with depends_on cascades stayed in pending forever and blocked their dependents.
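The fallback counter can be sketched as a map keyed by the (replica_id, container_id) pair and owned by something that outlives a single dispatch, which is the property issue #152 restored. The fallbackHealth type, observe method, and field names are hypothetical; only the threshold of 5 consecutive running polls comes from the text.

```go
package main

import "fmt"

// healthyConsecutiveCount mirrors HealthyConsecutiveCount = 5.
const healthyConsecutiveCount = 5

// pollKey identifies one container of one replica. A re-dispatch for the same
// pair keeps its progress; a replacement container (new ID) starts from zero.
type pollKey struct {
	replicaID   string
	containerID string
}

// fallbackHealth is a hypothetical tracker for services without a healthcheck.
// It must outlive individual reconciler dispatches.
type fallbackHealth struct {
	counts map[pollKey]int
}

func newFallbackHealth() *fallbackHealth {
	return &fallbackHealth{counts: make(map[pollKey]int)}
}

// observe records one poll of docker inspect .State.Status and reports
// whether the replica may flip to running.
func (f *fallbackHealth) observe(k pollKey, status string) bool {
	if status != "running" {
		f.counts[k] = 0 // any non-running poll breaks the consecutive streak
		return false
	}
	f.counts[k]++
	return f.counts[k] >= healthyConsecutiveCount
}

func main() {
	f := newFallbackHealth()
	k := pollKey{replicaID: "r1", containerID: "c1"}
	// A re-dispatch between polls would reuse this same tracker, so the
	// count keeps climbing and the fifth poll flips the replica.
	for poll := 1; poll <= 5; poll++ {
		fmt.Println("poll", poll, "healthy:", f.observe(k, "running"))
	}
}
```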

A failed replica is not retried automatically once its budget of 3 consecutive restarts is spent; recovering it requires a fresh Deploy.Apply, which increments the revision and resets state.
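The restart budget can be sketched as a small state holder. The replica type and its methods are hypothetical, and two details are assumptions of this sketch rather than documented behavior: a healthy observation is what ends a "consecutive" streak, and Deploy.Apply resets the state to pending.

```go
package main

import "fmt"

// restartBudget mirrors the budget of 3 consecutive restarts.
const restartBudget = 3

// replica is a hypothetical, minimal model of a replica's lifecycle.
type replica struct {
	state    string // a ReplicaObserved.state name
	restarts int    // consecutive restarts since the replica was last healthy
	revision int
}

// onExit handles a container exit and reports whether to restart it.
func (r *replica) onExit() bool {
	if r.state == "failed" {
		return false // terminal: only a fresh Deploy.Apply recovers it
	}
	if r.restarts >= restartBudget {
		r.state = "failed" // would be reported as restart_exhausted
		return false
	}
	r.restarts++
	return true
}

// onHealthy ends the streak, because the budget counts consecutive restarts.
func (r *replica) onHealthy() {
	r.state = "running"
	r.restarts = 0
}

// apply models a fresh Deploy.Apply: the revision increments and state resets.
func (r *replica) apply() {
	r.revision++
	r.state = "pending" // assumed reset target
	r.restarts = 0
}

func main() {
	r := &replica{state: "running", revision: 1}
	for i := 1; i <= 4; i++ {
		restart := r.onExit()
		fmt.Printf("exit %d: restart=%v state=%s\n", i, restart, r.state)
	}
	r.apply()
	fmt.Printf("after apply: revision=%d state=%s\n", r.revision, r.state)
}
```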

Deployment status

DeploymentStatus:

status | meaning
--- | ---
pending | scheduling cannot proceed; status_details carries reason and supporting fields
active | every desired replica has converged

Node status

NodeStatus:

status | meaning
--- | ---
joining | Cluster.Join has completed, but isolation is not yet ready
ready | nftables ruleset loaded and self-test passed; eligible for scheduling
isolation_unavailable | nftables ruleset failed; the node refuses container scheduling and other nodes skip it
drain_timeout | jaco node remove aborted after the 5-minute per-replica drain timeout

jaco cluster status and jaco node list render the trimmed names (no NODE_STATUS_ prefix).
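The trimming can be sketched with the standard strings package. This assumes enum value names follow the usual protobuf convention (e.g. NODE_STATUS_READY) and that the CLI lower-cases the remainder to match the names in the table above; displayName is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// displayName turns a protobuf enum value name into the short form the CLI
// prints. The NODE_STATUS_ prefix comes from the text; lower-casing is an
// assumption based on the lowercase names in the node status table.
func displayName(enumName string) string {
	return strings.ToLower(strings.TrimPrefix(enumName, "NODE_STATUS_"))
}

func main() {
	for _, n := range []string{"NODE_STATUS_READY", "NODE_STATUS_DRAIN_TIMEOUT"} {
		fmt.Println(displayName(n))
	}
}
```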

Rollout state

RolloutState (per service undergoing a rolling update):

state | meaning
--- | ---
in_progress | the scheduler is advancing one replica at a time
completed | all steps applied; the new revision is steady-state
aborted | a step timed out; the previous revision continues to serve

Audit event types

AuditEventType is defined in proto/jaco/v1/entities.proto. Its short forms are:

apply, delete, rollback, node_join, node_remove, token_issue, token_revoke, certificate_issued, certificate_renewed, certificate_failed, isolation_ruleset_reconciled, isolation_unavailable, backup_taken, restore_completed, upgrade_succeeded, upgrade_failed, rollout_invariant_hold, registry_credential_upsert, registry_credential_remove, privileged_workload_admitted, rebalance_moved, rebalance_skipped.

Tag 22 (rebalance_dry_run) is reserved. The rebalancer was simplified to be always on, so the type was removed, but its tag stays reserved so that historical audit blobs still decode cleanly. See Scheduling → Pressure-based rebalancing → Observability for the rebalance payload shape.

jaco audit --type <name,…> filters on the short forms. See jaco audit.
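Parsing a comma-separated --type value against the closed set can be sketched as follows. The auditTypes set and parseTypeFilter function are hypothetical, and whether jaco trims whitespace or rejects unknown names is an assumption of this sketch. rebalance_dry_run is deliberately absent from the set, since its tag is reserved.

```go
package main

import (
	"fmt"
	"strings"
)

// auditTypes is the closed set of short forms listed above.
var auditTypes = func() map[string]bool {
	m := make(map[string]bool)
	for _, n := range strings.Fields(`apply delete rollback node_join node_remove
		token_issue token_revoke certificate_issued certificate_renewed
		certificate_failed isolation_ruleset_reconciled isolation_unavailable
		backup_taken restore_completed upgrade_succeeded upgrade_failed
		rollout_invariant_hold registry_credential_upsert
		registry_credential_remove privileged_workload_admitted
		rebalance_moved rebalance_skipped`) {
		m[n] = true
	}
	return m
}()

// parseTypeFilter splits a value such as "apply,rollback" and rejects names
// outside the closed set. Hypothetical helper, not jaco's implementation.
func parseTypeFilter(arg string) ([]string, error) {
	var out []string
	for _, name := range strings.Split(arg, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if !auditTypes[name] {
			return nil, fmt.Errorf("unknown audit event type %q", name)
		}
		out = append(out, name)
	}
	return out, nil
}

func main() {
	types, err := parseTypeFilter("token_issue,token_revoke")
	fmt.Println(types, err)
}
```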

See also