Status and errors
Every gRPC handler returns a typed envelope:
```protobuf
message Error {
  string code = 1;
  string message = 2;
  map<string, string> details = 3;
}
```

`code` is drawn from a closed set. The protobuf field is a `string` so that future codes can be added forward-compatibly, but the documentation below is the contract: clients SHOULD branch on `code` and surface `message` + `details` verbatim.
The CLI renders these as `Error: <code>: <message>` on stderr and exits non-zero. See `internal/controlplane/grpc` and `internal/runtime/compose` for the constructors.
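As a rough illustration of that CLI contract, the sketch below renders an envelope as `Error: <code>: <message>` and exits non-zero. The Go `Error` struct and the helpers `formatError` / `exitOnError` are hypothetical stand-ins, not the generated protobuf types or jaco's actual constructors, and the example message is invented.

```go
package main

import (
	"fmt"
	"os"
)

// Error mirrors the protobuf Error envelope. Hypothetical hand-written type;
// the generated Go code for the message may look different.
type Error struct {
	Code    string
	Message string
	Details map[string]string
}

// formatError builds the one-line CLI rendering "Error: <code>: <message>".
func formatError(e *Error) string {
	return fmt.Sprintf("Error: %s: %s", e.Code, e.Message)
}

// exitOnError prints e to stderr and exits non-zero; a nil error is success.
func exitOnError(e *Error) {
	if e == nil {
		return
	}
	fmt.Fprintln(os.Stderr, formatError(e))
	os.Exit(1)
}

func main() {
	exitOnError(nil) // success path: prints nothing and does not exit
	fmt.Println(formatError(&Error{Code: "no_leader", Message: "election in progress"}))
}
```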
Error codes (closed set)
| code | when |
|---|---|
| `cluster_uninitialized` | RPC arrived before `Cluster.Init` / `Cluster.Join` completed |
| `cluster_already_initialized` | `Cluster.Init` on a node that already has raft state |
| `node_already_member` | `Cluster.Join` against an already-joined node, or a hostname collision |
| `no_leader` | leader unreachable or election in progress; client may retry |
| `quorum_lost` | node is on the minority side of a partition; writes rejected |
| `token_invalid` | bearer token does not match any known operator token |
| `token_revoked` | token matches, but `revoked_at` is set |
| `validation_failed` | manifest schema violation; `details` include the offending field |
| `unknown_service` | jaco service not present in the compose file |
| `unknown_host` | an entry in `hosts[*]` is not a cluster member |
| `unknown_network` | service-level network not declared in top-level `networks:` |
| `reserved_port` | compose service publishes port 80 or 443 |
| `legacy_compose_field` | compose file uses a v1/v2 spelling dropped from the modern spec; `details.field` names the offender, `details.modern_equivalent` names the replacement |
| `env_file_unresolved` | daemon received a compose document still carrying `env_file:` (the CLI failed to fold it in client-side, e.g. an old CLI talking to a new daemon) |
| `parse_failed` | `jaco.yaml` failed to unmarshal (YAML syntax error, etc.); `details` include the underlying error |
| `replicas_exceed_pinned_hosts` | `placement: hosts` lists fewer hosts than the requested `replicas` |
| `image_pull_failed` | runtime gave up after retries (backoff state is still emitted for each attempt) |
| `cert_failed` | ACME issuance failed past the retry budget |
| `docker_error` | Docker daemon refused or errored (disk full, daemon stopped, etc.) |
| `isolation_unavailable` | node could not bring up its nftables ruleset; replicas are not scheduled there |
| `isolation_self_test_failed` | startup self-test of the nftables ruleset failed |
| `subnet_pool_exhausted` | IPAM pool ran out of /24 subnets |
| `port_conflict` | two services in one deployment publish the same host port, or a deployment publishes a host port that another deployment already owns cluster-wide |
| `deployment_not_found` | `jaco rollback` / `jaco delete` against a deployment the cluster has no record of |
| `no_previous_revision` | `jaco rollback` against a deployment that has never been rolled forward; there is nothing to revert to |
| `upgrade_verification_failed` | minisign signature or SHA-256 checksum mismatch during self-upgrade |
| `upgrade_failed` | post-upgrade health check failed; rollback executed |
| `internal` | unrecoverable daemon error not better categorized; `details` include `reason` |
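To show what "branch on `code`" might look like in a client, here is a sketch that maps codes to coarse actions. The `Action` categories and the grouping are assumptions made for illustration; the table above only marks `no_leader` as retryable. The `default` arm is the important part: codes added after the client was built fall through to "surface verbatim", which is what the string-typed field is meant to allow.

```go
package main

import "fmt"

// Action is a hypothetical client-side classification of Error.code values.
type Action int

const (
	Surface  Action = iota // show message + details verbatim (includes unknown/future codes)
	Retry                  // transient; the table says the client may retry
	FixInput               // the manifest or compose file must change
	Reauth                 // operator token problem
)

// classify maps a code to an action. The grouping is an illustrative
// assumption; only no_leader is documented as retryable.
func classify(code string) Action {
	switch code {
	case "no_leader":
		return Retry
	case "validation_failed", "parse_failed", "unknown_service", "unknown_host",
		"unknown_network", "reserved_port", "legacy_compose_field",
		"replicas_exceed_pinned_hosts", "port_conflict":
		return FixInput
	case "token_invalid", "token_revoked":
		return Reauth
	default:
		// Codes added after this client was built land here.
		return Surface
	}
}

func main() {
	for _, c := range []string{"no_leader", "port_conflict", "token_revoked", "some_future_code"} {
		fmt.Println(c, classify(c))
	}
}
```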
`message` is human-readable; `details` is a flat string-to-string map of structured context (e.g. `{service: web, field: replicas}` for a validation failure, `{attempt: 7, next_retry_at: …}` for an image-pull backoff). Because values are strings, numbers such as `attempt` are carried in their string form.
Replica states (closed set)
`ReplicaObserved.state` from
`proto/jaco/v1/entities.proto`:
| state | meaning |
|---|---|
| `pending` | `ReplicaDesired` received; image not yet pulled. A failing pull stays here with `code: image_pull_failed` (and the error in `details.reason`) while it retries |
| `pulling` | image pull in progress (reported on the first attempt) |
| `running` | container up; first healthy healthcheck observed. For a service with no compose `healthcheck:` block (or `healthcheck: { disable: true }`, which v0.3.2 treats as "no healthcheck"), the fallback path requires the container's `docker inspect` `.State.Status` to be `running` for `HealthyConsecutiveCount` = 5 consecutive ~1 s polls before flipping. The poll counter survives reconciler re-dispatches for the same (`replica_id`, `container_id`) pair (issue #152). Before v0.3.2, every re-dispatch reset the counter to 0, so healthcheck-less replicas in a stack with `depends_on` cascades got stuck in `pending` forever and blocked their dependents |
| `degraded` | `State.Health.Status` = `unhealthy` observed |
| `updating` | set by the scheduler during a rolling update; the runtime reads it but does not write it |
| `failed` | terminal error: `image_pull_failed`, `docker_error`, or `restart_exhausted` (scheduler-driven) |
| `stopped` | replica removed from the desired set; container stopped and removed |
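The healthcheck-less fallback for `running` can be pictured as a streak counter keyed by the (`replica_id`, `container_id`) pair rather than by a reconciler dispatch, which is why a re-dispatch no longer resets it. This is a hypothetical sketch (`fallbackTracker`, `pollKey`, and `observe` are invented names), not jaco's runtime code. Resetting the streak on any non-`running` poll is how "consecutive" is interpreted here.

```go
package main

import "fmt"

// HealthyConsecutiveCount is the documented fallback threshold.
const HealthyConsecutiveCount = 5

// pollKey identifies one replica/container pair. Keying the counter on this
// pair, not on a reconciler dispatch, is what lets it survive re-dispatches.
type pollKey struct{ ReplicaID, ContainerID string }

// fallbackTracker is a hypothetical sketch of the healthcheck-less path.
type fallbackTracker struct{ streak map[pollKey]int }

func newFallbackTracker() *fallbackTracker {
	return &fallbackTracker{streak: make(map[pollKey]int)}
}

// observe records one ~1 s poll of `docker inspect` .State.Status and reports
// whether the replica may flip from pending to running.
func (t *fallbackTracker) observe(k pollKey, status string) bool {
	if status != "running" {
		t.streak[k] = 0 // the polls must be consecutive
		return false
	}
	t.streak[k]++
	return t.streak[k] >= HealthyConsecutiveCount
}

func main() {
	t := newFallbackTracker()
	k := pollKey{ReplicaID: "r1", ContainerID: "c1"}
	for poll := 1; poll <= HealthyConsecutiveCount; poll++ {
		// A reconciler re-dispatch between polls does not touch t.streak,
		// so the count keeps accumulating; only poll 5 prints true.
		fmt.Println(poll, t.observe(k, "running"))
	}
}
```

A new container for the same replica gets a fresh key, so it starts its own streak from zero.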
A `failed` replica is not retried automatically beyond the 3-consecutive-restart budget; recovering it requires a fresh `Deploy.Apply` (which increments the revision and resets state).
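One way to read the restart budget is as a counter that allows three restarts, marks the replica `failed` with `restart_exhausted` on the next exit, and is cleared only by a fresh apply. The sketch below assumes that reading. The `replica` type and its methods are hypothetical, and resetting the streak when the replica reaches `running` is an assumed interpretation of "consecutive", not documented behavior.

```go
package main

import "fmt"

// restartBudget is the documented 3-consecutive-restart budget.
const restartBudget = 3

// replica is a hypothetical sketch; field names are illustrative only.
type replica struct {
	State    string
	Reason   string
	restarts int // consecutive restarts since the last reset
}

// onExit handles an unexpected container exit and reports whether another
// restart is allowed. Once the budget is spent the replica becomes failed.
func (r *replica) onExit() bool {
	if r.restarts >= restartBudget {
		r.State, r.Reason = "failed", "restart_exhausted"
		return false
	}
	r.restarts++
	return true
}

// onRunning: assumption: reaching running breaks the streak, which is how
// "consecutive" is read here.
func (r *replica) onRunning() { r.State, r.restarts = "running", 0 }

// onApply models a fresh Deploy.Apply: the only way out of failed.
func (r *replica) onApply() { r.State, r.Reason, r.restarts = "pending", "", 0 }

func main() {
	r := &replica{State: "running"}
	for i := 1; i <= restartBudget+1; i++ {
		ok := r.onExit()
		fmt.Println(i, ok, r.State)
	}
	r.onApply()
	ok := r.onExit()
	fmt.Println(r.State, ok)
}
```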
Deployment status
`DeploymentStatus`:
| status | meaning |
|---|---|
| `pending` | scheduling cannot proceed; `status_details` carries the reason and supporting fields |
| `active` | every desired replica has converged |
Node status
`NodeStatus`:
| status | meaning |
|---|---|
| `joining` | after `Cluster.Join`, before isolation is ready |
| `ready` | nftables ruleset loaded and self-test passed; eligible for scheduling |
| `isolation_unavailable` | nftables ruleset failed; the node refuses container scheduling and other nodes skip it |
| `drain_timeout` | `jaco node remove` aborted after the 5-minute per-replica drain timeout |

`jaco cluster status` and `jaco node list` render the trimmed names (no `NODE_STATUS_` prefix).
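The trimming can be sketched as stripping the enum prefix and lowercasing. This assumes the generated enum value names follow the usual upper-snake protobuf style (e.g. `NODE_STATUS_READY`); `displayName` is a hypothetical helper, not the CLI's actual function.

```go
package main

import (
	"fmt"
	"strings"
)

// displayName converts an enum value name such as NODE_STATUS_READY into the
// short form shown by the CLI ("ready"). Assumes upper-snake enum naming.
func displayName(enumName, prefix string) string {
	return strings.ToLower(strings.TrimPrefix(enumName, prefix))
}

func main() {
	for _, n := range []string{
		"NODE_STATUS_JOINING",
		"NODE_STATUS_READY",
		"NODE_STATUS_ISOLATION_UNAVAILABLE",
		"NODE_STATUS_DRAIN_TIMEOUT",
	} {
		fmt.Println(displayName(n, "NODE_STATUS_"))
	}
}
```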
Rollout state
`RolloutState` (per service undergoing a rolling update):
| state | meaning |
|---|---|
| `in_progress` | the scheduler is advancing one replica at a time |
| `completed` | all steps applied; the new revision is steady-state |
| `aborted` | a step timed out; the previous revision continues to serve |
Audit event types
`AuditEventType` lives in
`proto/jaco/v1/entities.proto`:
`apply`, `delete`, `rollback`, `node_join`, `node_remove`,
`token_issue`, `token_revoke`, `certificate_issued`,
`certificate_renewed`, `certificate_failed`,
`isolation_ruleset_reconciled`, `isolation_unavailable`,
`backup_taken`, `restore_completed`, `upgrade_succeeded`,
`upgrade_failed`, `rollout_invariant_hold`,
`registry_credential_upsert`, `registry_credential_remove`,
`privileged_workload_admitted`, `rebalance_moved`,
`rebalance_skipped`.
Tag 22 (`rebalance_dry_run`) is reserved. The rebalancer was simplified to always-on, so the type is gone, but the tag stays reserved so that historical audit blobs still decode cleanly. See Scheduling → Pressure-based rebalancing → Observability for the rebalance payload shape.
`jaco audit --type <name,…>` filters on the short forms. See `jaco audit`.
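A client-side parser for a comma-separated `--type` value might look like the sketch below. It checks names against the documented short forms and deliberately leaves out the reserved `rebalance_dry_run`. `parseTypeFilter` and `knownTypes` are hypothetical names, and whether the real CLI rejects unknown names (rather than ignoring them) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// knownTypes lists the documented short forms. Tag 22 (rebalance_dry_run)
// is reserved, so it is deliberately absent.
var knownTypes = map[string]bool{
	"apply": true, "delete": true, "rollback": true, "node_join": true,
	"node_remove": true, "token_issue": true, "token_revoke": true,
	"certificate_issued": true, "certificate_renewed": true,
	"certificate_failed": true, "isolation_ruleset_reconciled": true,
	"isolation_unavailable": true, "backup_taken": true,
	"restore_completed": true, "upgrade_succeeded": true,
	"upgrade_failed": true, "rollout_invariant_hold": true,
	"registry_credential_upsert": true, "registry_credential_remove": true,
	"privileged_workload_admitted": true, "rebalance_moved": true,
	"rebalance_skipped": true,
}

// parseTypeFilter splits a value like "apply,rollback" into a set and rejects
// names that are not documented event types (assumed behavior).
func parseTypeFilter(arg string) (map[string]bool, error) {
	set := map[string]bool{}
	for _, name := range strings.Split(arg, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if !knownTypes[name] {
			return nil, fmt.Errorf("unknown audit event type %q", name)
		}
		set[name] = true
	}
	return set, nil
}

func main() {
	types, err := parseTypeFilter("apply, rollback")
	fmt.Println(types, err)
	_, err = parseTypeFilter("rebalance_dry_run")
	fmt.Println(err)
}
```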