Florete

Management Plane

Design of the management plane for C0

Context

C0 has only a management plane — operator-authored, signed rete state. There is no control plane yet: nothing makes dynamic decisions about traffic at runtime. State is declared manually in a git repo and distributed to every node through a lightweight operator workflow: declare → validate → compile → commit (audit) → publish → nodes sync. A control plane lands in B1+ as a separate, parallel artifact stream operating within the bounds set by signed mgmt state (see Decision authority layers below). Both C0 and C1 share this manual foundation; C0 is the reference design, and C1 layers the rete-specific additions on top.

The design must balance three forces:

  • Simple to build — small team, MVP ASAP.
  • Prod-ready for small pilots — 3-5 server nodes, 10-20 users, a handful of services, running a few weeks.
  • Manually manageable by a single operator during pilots.

This spec is also the boundary that the rest of C0 (agent internals, identity, forwarding) must meet, and it should evolve naturally into C1 Manual Mesh (mesh-flor layer added on top via FlorIO, same CA and enrollment reused unchanged), then into B1 Cloud Control (coordination server replaces the file-based source of truth), and beyond into a fully distributed control plane.

Design Highlights

  • Source of truth is a git repo of YAML files, hand-edited. No embedded DB. No CLI mutators that round-trip through YAML. Git diffs are the audit log; YAML comments explain intent.
  • The CLI does a handful of things: CA operations, identity / bundle issuance, validation, per-node compilation, safe apply. It never mutates source YAML. Smallest surface that still gives us safety.
  • SPIFFE identities from day one. Rete has a root CA; principals get CSRs signed into X.509 certs whose SAN holds a SPIFFE URI (spiffe://<rete>/<kind>/<name>). Peers verify by CA signature, not pinned cert. Only ca.crt lives in the repo — individual certs are delivered to their holders and stored locally. Real SPIFFE X.509-SVIDs, no extra cost.
  • Principals are uniform. Users, services, and nodes are principals with rete-scoped identities. Roles attach to any principal. Users and services own workload identities (end-to-end mTLS); each node has a rete-member identity used by its flor-agent workload to reach the rete's own infrastructure services (config-server, metrics). Each principal's cert+key are held by a flor vertex acting on the principal's behalf — the agent included. (C1 adds mesh-vertices as a further, distinct principal kind for hop-by-hop mesh transit.)
  • Access is fully derived in C0. With no multi-hop routing, who-reaches-what is determined by users.yaml + services.yaml + roles.yaml + groups.yaml + service location. No paths.yaml, no label allocation — the compiler walks the ACL matrix directly.
  • Restart-with-rollback, not SIGHUP (for C0). flor agent sync fetches the pre-compiled artifacts from the config-server and restarts the supervised vertex; --commit-timeout rolls back automatically if the operator doesn't confirm within the window. Full hot-reload is hard, but the agent should be built with swappable ACL tables from day one so that a near-term post-C0 ACL-only hot-reload (see Hot reload) can swap permission tables in place. That would cover roughly 80% of day-to-day changes (add/remove user, change role membership) without dropping connections. Structural changes (new service, port change, identity rotation) still need a restart.
  • RBAC, not per-principal ACLs. Permissions attach to roles; principals get roles. Keeps the access control matrix smaller.
  • Atomic full-artifact state delivery, with a monotonic version stamp. Deltas are a later optimization; the version number is a zero-cost forward-compat anchor for them.
  • Mgmt artifact is signed by the operator; agents verify locally. The signing key is a SPIFFE principal in its own right: management-plane/<name> (convention: primary), a sign-only identity issued explicitly via retectl ca sign --kind management-plane and unrelated to any users.yaml entry (see ADR-0005). The operator holds three keys in total: the CA root, their own user/<op> TLS keypair (issued like any other user), and the rete's management-plane/<name> envelope-signing keypair. None of them leaves operator hardware: the rete's coordination server (today the config-server, in B1+ a managed cloud service) only stores and serves; it cannot mint or modify state. C0 is the degenerate case: every artifact is signed mgmt and there are no control-plane decisions yet. The envelope carries a plane: "mgmt" discriminator and a signature field from day one so that a parallel plane: "ctrl" stream (signed by a control-plane/<name> principal) can land in B1+ without schema churn. See Decision authority layers below for the evolution target.
  • Operator-issued bundles, not user-initiated PRs. Users never need git write access or forge accounts. The operator is the only writer of rete state.
  • Stable boundary = the compiled per-node artifacts. C0 produces an agent.json plus one vertices/flor.json per node from YAML; C1 adds a second vertex artifact for the mesh-flor layer with the same envelope; B1 fetches them from a coordination server; distributed control plane later emits them by consensus. The agent and vertex consume the same shapes throughout.
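
The "access is fully derived" and RBAC highlights above can be illustrated with a minimal Python sketch. The dictionary shapes stand in for users.yaml, roles.yaml, and services.yaml; the field names (roles, allow), the sample principals, and the function names are assumptions for illustration, not the actual schema or compiler. Groups and service location are left out for brevity.

```python
from typing import Dict, List, Set


def spiffe_id(rete: str, kind: str, name: str) -> str:
    """Build a spiffe://<rete>/<kind>/<name> URI."""
    return f"spiffe://{rete}/{kind}/{name}"


def derive_allow_lists(
    rete: str,
    users: Dict[str, dict],
    roles: Dict[str, dict],
    services: List[str],
) -> Dict[str, List[str]]:
    """Return, per service, the sorted SPIFFE IDs of users allowed to reach it."""
    allowed: Dict[str, Set[str]] = {svc: set() for svc in services}
    for user, spec in users.items():
        for role in spec.get("roles", []):
            # Permissions attach to roles; principals only carry role names.
            for svc in roles.get(role, {}).get("allow", []):
                if svc in allowed:  # skip permissions naming unknown services
                    allowed[svc].add(spiffe_id(rete, "user", user))
    # Sorting keeps the derived output deterministic across runs.
    return {svc: sorted(ids) for svc, ids in allowed.items()}


# Hypothetical sample data.
users = {"alice": {"roles": ["dev"]}, "bob": {"roles": ["dev", "ops"]}}
roles = {"dev": {"allow": ["wiki"]}, "ops": {"allow": ["metrics"]}}
acl = derive_allow_lists("my-rete", users, roles, ["wiki", "metrics"])
```

In this sketch, adding or removing a user changes only the derived allow lists, which is the kind of change the ACL-only hot reload is meant to cover.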

Details

Specific design topics are placed in separate pages:

Multi-rete on a single node

A computing host may join more than one Florete rete — for example, a developer's laptop enrolled in both a personal rete and a work rete, or a server node managed by two independent operators. This is a first-class use case and the design accommodates it without any special mode.

Rete scope

Each rete a node joins gets its own rete scope: a per-rete namespace for runtime state and installed material on that host. The scope is identified locally by the rete name (from rete.yaml). The cryptographic anchor is the bundle's CA cert and operator pubkey — two retes with the same name can never be confused because their trust roots differ. The local name can be overridden at install time (flor enroll <bundle> --as <name>) if two independent retes happen to share one.

Per rete scope on the node:

  • Install root: ~/.flor/retes/<scope>/ — CA cert, all principal certs and keys for this rete, compiled artifacts (agent.json, vertices/*.json), agent control socket (agent.sock).
  • Runtime root: /run/flor/<scope>/ — FlorIO sockets and any other runtime files (C1+).
  • Process tree: one flor agent process supervising its per-mesh vertex graph.

All cert and socket paths in compiled artifacts are scope-relative (e.g. ca.crt, not ~/.flor/ca.crt). The agent resolves them against its install root at startup, so artifacts stay portable across nodes regardless of the local scope name.
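
As a rough illustration of scope-relative resolution, the following Python sketch joins an artifact path onto the install root. The function names are hypothetical, and rejecting absolute paths or ".." segments is an assumption about what "scope-relative" should mean, not a stated rule of the design.

```python
from pathlib import PurePosixPath


def install_root(home: str, scope: str) -> PurePosixPath:
    """Install root for one rete scope: <home>/.flor/retes/<scope>/."""
    return PurePosixPath(home) / ".flor" / "retes" / scope


def resolve(root: PurePosixPath, rel: str) -> PurePosixPath:
    """Resolve a scope-relative artifact path against the install root."""
    p = PurePosixPath(rel)
    # Assumption: absolute paths and parent traversal are refused so that an
    # artifact can never point outside its own scope.
    if p.is_absolute() or ".." in p.parts:
        raise ValueError(f"not a scope-relative path: {rel}")
    return root.joinpath(*p.parts)


root = install_root("/home/u", "work")
ca_path = resolve(root, "ca.crt")
```

The same artifact resolves under a different root when the local scope name differs, which is what keeps artifacts portable across nodes.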

Node identity in multiple retes

A computing host that joins N retes has N separate node/<name> principals, one in each rete's trust domain. The names may differ across retes (each rete's operator chooses them independently). Each principal is used by a dedicated flor-agent process, and its cert and key are held by that rete's flor vertex. There is no shared node identity across retes; the trust domains are fully independent.

Local listen addresses

Two categories of listen addresses arise in a multi-rete setup, and they have different audiences and different answers.

External UDP listeners appear only on nodes that accept inbound connections — nodes with a fixed public or private IP:port declared in nodes.yaml. User devices (laptops, phones) sit behind NAT and use ephemeral source ports; they have no fixed UDP listen address. Therefore:

  • External UDP conflicts can only arise on server nodes, which are by definition admin-managed (datacenter, office LAN, home server). Two rete operators wanting the same external UDP port on the same physical server must coordinate with the IT admin responsible for that host. At C0/C1 scale this involves a small number of ports and technically capable people.
  • B1+ direction: node-side capability advertisement — the node declares which external addresses and ports it has available. No implementation in C0/C1.

Local SOCKS5 and per-service inbound ports exist on every node, including user laptops. Users cannot be asked to "adjust a port-override config." The agent must handle this on their behalf.

Model:

  • Compiled artifacts declare local listens as preferences, not requirements.
  • The agent at startup binds the preferred port if free; otherwise picks any available port and records the binding in agent state.
  • Apps that need the actual port query the agent control socket — already the authoritative source for per-rete runtime state.
  • An ergonomic default: each rete scope is assigned a local-port offset at install time, deterministic from the rete CA fingerprint (or operator-chosen at flor enroll time), so rete A's SOCKS5 lands at :1080, rete B's at :1180, and so on. Single-rete users get their expected default; multi-rete users get predictable separation without any manual configuration.
  • FlorIO sockets (C1+) are filesystem paths under /run/flor/<scope>/vertices/<vertex-name>.sock — no port space, no collision possible.

The user is responsible for nothing. They install the bundle; the agent resolves any local-resource conflicts; apps discover actual port values via the control socket if they care.
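
The "preference, not requirement" rule in the model above can be sketched in Python as follows. This is a minimal illustration using TCP on loopback; the function name and the returned tuple are assumptions, and the real agent's bookkeeping of bindings is not specified here.

```python
import socket
from typing import Tuple


def bind_preferred(preferred: int, host: str = "127.0.0.1") -> Tuple[socket.socket, int]:
    """Bind the preferred TCP port if it is free; otherwise take any free port.

    Returns the listening socket and the port actually bound, which the caller
    would record in agent state.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, preferred))
    except OSError:
        # Preferred port is taken: start over and let the OS choose (port 0).
        sock.close()
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.bind((host, 0))
    sock.listen()
    return sock, sock.getsockname()[1]
```

An app that needs the real port would read the recorded value back from the agent rather than assume the preferred one.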

Evolution

The compiled per-node artifact is the stable public contract. What changes across milestones is who produces it and how it's distributed — never the shape the agent consumes.

| Aspect | C0 Tended Tunnels | C1 Manual Mesh | B1 Cloud Control | Distributed CP (later) |
|---|---|---|---|---|
| Scope | Service-to-service direct | Rete mesh, manual paths | Same rete, derived paths | Same rete |
| Protocol entities | Users, services, nodes | + Rete vertices | Same as C1 | Same as C1 |
| Binaries | flor agent + 1 flor vertex | flor agent + 2 flor vertex | Same as C1 | Same as C1 |
| Source of truth | YAML in git repo | YAML in git repo | Server (ctrl); operator git+compile (mgmt) | Consensus (ctrl); operator git+compile (mgmt) |
| Compile | Local, service-level only | Local, both layers | mgmt local+signed; ctrl server-side | mgmt local+signed; ctrl consensus-derived |
| Distribution | Bundle + config-server fetch | Bundle + config-server fetch | Server push (WebSocket/gRPC) | Gossip + pull |
| Reload | Restart w/ commit timeout | Restart w/ commit timeout | Hot reload | Hot reload |
| Consensus | Operator (1 person) | Operator (1 person) | Single-writer server | Raft / CRDT |
| Paths | N/A (direct forwards) | Manual paths.yaml | Derived from topology + access | Derived, per-agent resolution |
| State updates | Atomic (poll config-server) | Atomic (poll config-server) | Atomic push | Atomic; deltas in B2+ |
| Decision authority | Operator only (signed mgmt) | Operator only (signed mgmt) | + bounded CP (unsigned, within signed policy) | + agent fast-path autonomy (FRR-class) within CP/policy bounds |
| Artifact streams | plane: "mgmt" (signed) | plane: "mgmt" (signed) | + plane: "ctrl" (unsigned) | Same |

From B1 on, the Source of truth and Compile rows split by plane: mgmt stays authored, compiled, and signed on operator hardware — the coordination server relays it but cannot author it (see Coordinator) — while only ctrl moves to the server or to consensus. This bounds a compromised cloud to disruption, not destruction.

Preserving manual mode as a power mode: manual configuration (YAML in git + operator-run config-server) is a permanent capability tier, not a stepping stone. Hackers, personal setups, airgapped environments, and disaster-recovery fallback all depend on it. Every higher tier is additive — no C0/C1 capability is removed in B1 or beyond.

Decision authority layers

A second evolution axis, orthogonal to producer/distribution, is who is allowed to decide what. The model is one delegation rule applied at three points: at every layer, decisions operate within bounds the layer above signed. C0/C1 use only the top layer; B1+ activates the lower two.

| Layer | Speed | Authority | Scope of decision | How it's verified |
|---|---|---|---|---|
| Operator (mgmt plane) | minutes–days | signed | grammar of permitted states (identities, reachability, topology, path constraints) | flors verify operator signature on the mgmt artifact |
| Control plane (B1+) | seconds–minutes | unsigned, bounded | pick a state within grammar (path failover, link selection, NAT signaling, later: placement) | flors check each ctrl decision against the signed mgmt policy locally |
| Agent fast-path (B1+) | µs–ms | unsigned, local | sub-millisecond reactions inside what CP/operator allowed (e.g. MPLS-style FRR over reserved paths, BFD-class liveness, queue mgmt) | self-bounded by the configured policy/CP state already in the agent |

This is policy-bounded autonomy (capability attenuation): the operator signs bounds, lower layers act freely within them, and verification stays local — no online operator key, no hot signing key in the cloud. Precedent: Macaroons (caveats narrow capabilities held by untrusted parties), RPKI/ROAs (signed origin authorizations bound dynamic BGP), BGP/FIB + ECMP/FRR (control plane sets paths; data plane reroutes locally on failure), OPA-style admission policies.

The CP can always narrow what policy allows (drain a node, rate-limit, circuit-break) but never broaden it. Identity issuance stays operator-only forever; CP and agents never mint identities.
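
One way to picture the narrow-but-never-broaden rule is as a subset check. The Python sketch below models policy as a set of (caller, service) pairs; that representation and the function names are assumptions for illustration only, since the policy language itself is not designed yet.

```python
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]  # (caller principal, target service); hypothetical shape


def within_bounds(policy: FrozenSet[Edge], proposed: FrozenSet[Edge]) -> bool:
    """A ctrl decision is acceptable only if it stays inside signed policy."""
    return proposed <= policy


def apply_ctrl(
    policy: FrozenSet[Edge], active: FrozenSet[Edge], proposed: FrozenSet[Edge]
) -> FrozenSet[Edge]:
    """Adopt the proposed state if bounded; otherwise keep the current one."""
    return proposed if within_bounds(policy, proposed) else active


policy = frozenset({("user/alice", "wiki"), ("user/bob", "wiki")})
drained = frozenset({("user/alice", "wiki")})       # narrowing: accepted
broadened = policy | {("user/eve", "wiki")}         # broadening: rejected
```

The check needs only the signed policy already held by the agent, so it can run locally with no online operator key.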

C0/C1 are the degenerate case: empty CP, empty agent autonomy, full mgmt determines runtime state. The shape we lock in now — signed mgmt envelope, plane discriminator, monotonic versioning — is what lets B1+ add the lower layers additively. The policy language itself is not designed now; it grows feature by feature as B1+ capabilities pull on it.
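
The three locked-in pieces (signed envelope, plane discriminator, monotonic version) suggest an acceptance check along the following lines. This Python sketch is hedged throughout: only plane is named by the design; the signer and version field names are assumptions, and signature verification is passed in as a callable rather than implemented.

```python
from typing import Callable, Iterable


def accept_envelope(
    envelope: dict,
    current_version: int,
    authorized_signers: Iterable[str],
    verify_signature: Callable[[dict], bool],
) -> bool:
    """Accept only a newer, signed mgmt envelope from an authorized signer."""
    if envelope.get("plane") != "mgmt":
        return False  # a C0/C1 agent only consumes the mgmt stream
    if envelope.get("signer") not in set(authorized_signers):
        return False  # hypothetical field naming the signing principal
    version = envelope.get("version")
    if not isinstance(version, int) or version <= current_version:
        return False  # enforce the monotonic version stamp
    # Cryptographic verification is delegated to the caller-supplied function.
    return verify_signature(envelope)
```

Because the plane check comes first, a later ctrl stream could be routed to a different acceptance path without changing the envelope shape.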

Scope Checklist

  • YAML schemas for all source files; rete.yaml carries per-category crypto (ca, signers.mgmt, tls_principals); repo cert layout certs/ca.crt + certs/management-planes/<name>.crt; enrollment.log format
  • Compiled-artifact schemas: agent.json (kind: agent) and flor.json vertex config (kind: vertex), semver'd, envelope with version stamp + authorized_mgmt_signers snapshot
  • Per-node compiled output layout: <node>/mgmt/{agent.json, vertices/flor.json}
  • Naming scheme: canonical URI + .rete hostname resolver
  • retectl ca init / retectl ca sign (file backend)
  • retectl issue-bundle (operator-side bundle issuer; Flow A keypair generation + Flow B CSR signing)
  • retectl validate with all rules above
  • retectl compile --node <name> emitting deterministic per-node artifacts
  • retectl publish pushing the compiled tree to the rete config-server
  • Config-server implementation (GET /artifact/<node>, POST /publish) as a regular Florete-published service
  • Reserved-name handling: node / operator roles and config-read / config-write groups must be present in YAML (template); validator enforces; compiler auto-assigns node role to every node principal; operators are assigned manually via role: operator in users.yaml; the rete's management-plane/<name> signer is not auto-derived — it's issued explicitly via retectl ca sign --kind management-plane --name <name> as part of operator bootstrap (convention: name = primary)
  • flor id create (CSR bundle producer, node-side)
  • flor enroll (two-step bootstrap: install certs, fetch artifacts from config-server, start agent)
  • flor agent sync with --commit-timeout default 5m, --confirm, --dry-run
  • flor agent run [--rete <scope>] reads <root>/mgmt/agent.json and supervises one flor vertex instance in C0
  • flor vertex run --rete <scope> --name <name> daemon entry point
  • flor agent status over local Unix socket
  • Per-service SOCKS5 outbound proxy with port→identity binding; local listen ports are preferences — agent binds preferred if free, otherwise picks any available port and records the binding; bind failures are logged, not fatal
  • Installer: install.sh for Linux/Mac, MSI for Windows
  • Static landing page template (per-rete, zero backend)
  • Example my-rete/ repo including a management node (mgmt01 with config-server + metrics)
  • Playbook doc: operator bootstrap (incl. management-node manual bootstrap), issuing bundles, publishing a service, maintenance window, emergency rollback
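
The --commit-timeout / --confirm behavior of flor agent sync listed above can be sketched as a small state machine. This Python illustration is an assumption about the logic, not the agent's implementation: class and method names are hypothetical, artifacts are represented by plain identifiers, and the clock is passed in so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingApply:
    previous: str    # artifact to return to if the apply is not confirmed
    deadline: float  # time (seconds) after which rollback happens


class SyncState:
    def __init__(self, active: str) -> None:
        self.active = active
        self.pending: Optional[PendingApply] = None

    def apply(self, candidate: str, now: float, commit_timeout: float = 300.0) -> None:
        """Switch to the candidate and start the confirmation window (default 5m)."""
        # If an earlier apply is still unconfirmed, keep its rollback target.
        previous = self.pending.previous if self.pending else self.active
        self.pending = PendingApply(previous, now + commit_timeout)
        self.active = candidate

    def confirm(self) -> None:
        """Operator confirmed: the candidate becomes permanent."""
        self.pending = None

    def tick(self, now: float) -> bool:
        """Roll back if the window expired unconfirmed; return True if it did."""
        if self.pending is not None and now >= self.pending.deadline:
            self.active = self.pending.previous
            self.pending = None
            return True
        return False
```

The rollback target is always the last confirmed artifact, so a broken change that also cuts off the operator's own access reverts on its own.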

Open Follow-ups

Not blockers for C0 release:

  • ACL-only hot reload — near-term post-C0; design in Hot reload. Requires the agent to hold ACL tables behind an Arc/similar indirection from day one (cheap).

  • L7 awareness as a general capability (C1+, design needed). Several planned features need the HTTP layer to know which Florete principal is calling: per-node isolation on config-server, HTTP-level authorization policies, per-principal metrics tagging, request-level access logs, per-principal rate limiting. All of them want something like an X-Florete-Peer-SpiffeID header (or equivalent out-of-band signal) derived from the mTLS peer identity. flor itself is deliberately L4/L5 (QUIC/mTLS + TCP bytes) and should stay that way at the C0 layer — promoting it to an L7 proxy would balloon scope, entangle buffering/framing concerns with identity concerns, and make the data-plane harder to reason about. Options to explore in C1+:

    • A dedicated L7 sidecar process between flor and upstream HTTP services (flor keeps forwarding TCP bytes; the sidecar handles HTTP + identity augmentation).
    • A side-channel lookup: flor exposes a Unix socket where upstream services can ask "which SPIFFE ID is on local socket X?" Upstream owns its own HTTP plumbing.
    • A Florete-specific header-injection shim that's opt-in per service in YAML, so most services stay pure-L4 and only those that want L7 identity metadata get the extra path.

    None of these is obviously right; the tradeoffs (process count, resource cost, API stability, who-owns-which-failure-mode) need honest exploration. Picking one now would lock us in prematurely.

  • Per-node isolation of config-server reads is the most immediate use-case driving the L7 question: replacing the two-service split with one service + L7 identity-aware authZ so alpha can only fetch alpha.json. Gated on the L7 design above. Pilot-scale metadata disclosure is tolerable until then.

  • Cert fingerprint allowlist — include the cert fingerprint alongside the SPIFFE ID in each allow entry so a rogue CA-signed cert for the same SPIFFE ID is rejected even if name-removal hasn't propagated yet. Adds a field to the compiled artifact schema (semver bump); deferred because name-removal revocation is already effective at pilot scale.

  • Full structural hot-reload (new services, cert rotation, port changes) with connection-preserving restart semantics.

  • CRL / OCSP for immediate mid-cert-lifetime revocation without a new compile cycle.

  • Automated cert rotation via a bounded online intermediate CA (B1+). C0/C1 have only manual retectl issue-bundle rotation — workable at pilot scale, untenable at scale. The B1+ direction is a delegated intermediate CA whose authority (principal set, validity window) is operator-signed and narrow; can refresh certs freely within bounds, cannot create new principals. Two deployment tiers expected: managed (we host) and BYO on-prem (security-conscious customers run their own). Per-principal-class policy lets high-risk principals stay on long-lived offline-issued certs while low-risk workloads rotate frequently. Adds SPIFFE Workload-API conformance as a natural by-product. See reasoning.

  • Authoring UI surface (B1+). Cloud-hosted UI is fine for read-only views (status, audit, metrics), but signing operations must stay local on operator hardware to avoid UI-substitution attacks. Authoring goes in retectl and (later) a local desktop GUI; cloud dashboards are read-only or use a "draft in cloud, sign locally" handoff. See reasoning.

  • Per-rete local-port offset scheme — deterministic port offset derived from rete CA fingerprint (or operator-chosen at flor enroll time) so multi-rete installs get predictable port separation without manual config (e.g. rete A at :1080, rete B at :1180). Ergonomic nicety; single-rete users always see the default.

  • Port-query on agent control socket — extend flor agent status (or a dedicated subcommand) so apps can ask "what's the current SOCKS5 port for principal X?" instead of hard-coding the preferred port. Required for apps that must work correctly on multi-rete hosts; optional for single-rete users where the preferred port always wins.

  • B1+ node-side capability advertisement — node declares which external IP:port pairs it has available; rete mgmt-plane consults that at compile time rather than having the rete operator guess external UDP ports. Eliminates the remaining admin-coordination requirement for server nodes in multi-rete setups.

  • Transparent outbound for non-SOCKS5 apps (iptables / SO_PEERCRED / libc shim).

  • flor doctor diagnostics (ping peers, check access matrix, confirm config-server reachability).

  • Backend-hosted enrollment forms, Slack/Discord approval bots.

  • Config-server HA (two management nodes, replicated published state).
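
For the ACL-only hot reload item above, the "ACL tables behind an indirection" idea can be shown with a Python analogue of the Arc approach. This is a sketch under assumptions: the class and method names are hypothetical, and entries are modeled as (SPIFFE ID, service) pairs.

```python
import threading
from typing import FrozenSet, Iterable, Tuple

Entry = Tuple[str, str]  # (caller SPIFFE ID, service name); hypothetical shape


class AclTable:
    """Holds an immutable ACL snapshot behind a single swappable reference."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        self._entries: FrozenSet[Entry] = frozenset(entries)
        self._write_lock = threading.Lock()

    def allows(self, spiffe_id: str, service: str) -> bool:
        # Readers take the current reference once and never see a partial table.
        return (spiffe_id, service) in self._entries

    def swap(self, entries: Iterable[Entry]) -> None:
        new_table = frozenset(entries)  # build the full table before publishing
        with self._write_lock:          # serialize concurrent writers
            self._entries = new_table   # publish with one reference assignment


table = AclTable([("spiffe://my-rete/user/alice", "wiki")])
table.swap([("spiffe://my-rete/user/bob", "wiki")])
```

Existing connections are untouched by swap; only subsequent permission checks see the new table.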
