Florete

High-Level Design

How the Florete vision is achieved

Florete technology is built of two intertwined parts: the Florenetes orchestrator and the Florete network. The orchestrator runs over the Florete network, which provides a service mesh abstraction. The network's control plane is itself a distributed app managed by the orchestrator. The Florete network also implements inter-cluster networking, extending beyond the in-cluster service mesh.

Florete Network Design

In this section we explore the core concepts of the Florete network, focusing mainly on its data plane. The control plane is covered in a separate section.

Recursive Architecture

The Florete network is inspired by the Recursive InterNetwork Architecture (RINA)1. The key insight of RINA is that "networking is communication", or more concretely: "networking is inter-process communication". And what is modern inter-process communication? Service mesh!2

It turns out that a service mesh is essentially the basic primitive of recursive networks, if we think in production software terms instead of the academic parlance of RINA theory.

Choice of terminology

As we are building a working technology, not a theory, we prefer common IT terms when applicable. At the same time, we avoid over-extending well-known concepts beyond their scope, as well as the jargon of particular implementations.

Layer

The Florete network is built of recursive layers. A layer is an abstraction of a service mesh. A service mesh is a service-to-service communication fabric. Why is it called a mesh? Because each service can connect to any other one, hence it is a fully connected mesh.

A Florete layer is a communication mesh for its users: a user of the layer's APIs can connect to any other user of that layer. Note that this doesn't mean all the connections actually exist, nor that the topology of the layer itself is a mesh.

Vertex

A Florete layer is composed of vertices. A vertex is an abstraction of the proxy components found in service mesh implementations like Istio3. A vertex provides APIs to its users and implements them, being a participant of the layer. We call this the Portal API, because it is an entry point into a layer. It contains three main methods:

  • Publish: publish an entity in a layer
  • Connect: connect to an entity in a layer
  • Discover: find entities in a layer

As a result, users of the Portal API can establish connections with each other and communicate.
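To make the shape of the Portal API concrete, here is a minimal sketch in Python. All names and signatures are illustrative assumptions, not the actual Florete API; the toy LocalPortal models a single-process "layer" where all portals share one registry:

```python
from abc import ABC, abstractmethod

class Connection:
    """An established channel between two users of a layer."""
    def __init__(self, local: str, remote: str):
        self.local, self.remote = local, remote

class Portal(ABC):
    """Entry point into a layer, provided by a vertex to its users."""

    @abstractmethod
    def publish(self, name: str) -> None:
        """Make an entity reachable in this layer under `name`."""

    @abstractmethod
    def connect(self, name: str) -> Connection:
        """Open a connection to an entity published under `name`."""

    @abstractmethod
    def discover(self, pattern: str) -> list:
        """Find names of entities published in this layer."""

class LocalPortal(Portal):
    """Toy single-process 'layer': all portals share one registry."""
    def __init__(self, registry: dict):
        self._registry = registry

    def publish(self, name: str) -> None:
        self._registry[name] = self

    def connect(self, name: str) -> Connection:
        if name not in self._registry:
            raise LookupError(f"{name} is not published in this layer")
        return Connection(local="caller", remote=name)

    def discover(self, pattern: str) -> list:
        return [n for n in self._registry if pattern in n]
```

In terms of the diagram below, Alice's and Bob's vertices would each expose such a portal over the same layer: Bob publishes, Alice discovers and connects.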

Layer Diagram
Alice and Bob establish connection using Violet layer via Portal APIs provided by vertices Va and Vb

Note how the Portal::Publish method resembles the Reverse Proxy role, and Portal::Connect the Forward Proxy role.

A vertex implements the Portal API, being a participant of a layer. Naturally, it does so by connecting to some other vertices within the layer. Here the recursive nature of the architecture shows itself: to connect, the vertex uses Portal::Connect of a vertex from some other layer!

Internally, a vertex consists of:

  • transport endpoint to establish connections on behalf of the Portal API users
  • connection manager for in-layer connections with other vertices
  • forwarding engine to process packets in these connections: with local origin or destination, and transit

TODO(#35): Create detailed design of a recursive vertex

This is how a layer can be implemented upon Portal APIs provided by a few other layers.
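The recursion can be sketched in a few lines: a vertex of one layer publishes itself and reaches its peers purely through the Portal of a layer below. ToyLowerLayer and the method names are hypothetical stand-ins for illustration, not Florete components:

```python
class ToyLowerLayer:
    """Stand-in for the Portal API of an underlying layer."""
    def __init__(self):
        self._published = {}

    def publish(self, name: str) -> None:
        self._published[name] = True

    def connect(self, name: str):
        if name not in self._published:
            raise LookupError(f"{name} is not published below")
        return ("connection-to", name)

class RecursiveVertex:
    """A vertex of some layer Y that reaches its peers via a lower layer."""
    def __init__(self, name: str, lower: ToyLowerLayer):
        self.name, self.lower = name, lower
        # Publish ourselves below, so peer vertices of layer Y can reach us.
        lower.publish(name)

    def connect_peer(self, peer: str):
        # The recursion itself: an in-layer connection to a peer vertex
        # is nothing but Portal::Connect in the layer below.
        return self.lower.connect(peer)
```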

Recursive Layers Diagram
Alice and Bob establish connection using Violet layer that uses Green and Cyan layers itself

Importantly, this isn't even a stack: in complicated cases, the vertices on a node can form a directed graph with cycles.

TODO(#36): Explore practical non-stack configurations of Florete layers

At the end of the recursion there are primitive layers implemented upon something that is not a layer: some "physics", or external communication means/media.

The most primitive layer is formed by two communicating entities in a peer-to-peer fashion. We call it a link.

Link as basic layer

It is important to use a link primitive with only two communicating vertices, instead of a multi-vertex one like a bus: "Buses ruined everything"4. A bus is a form of network that is not primitive; it is a "physical mesh". It can be built using two kinds5 of layers: primitive links between pairs of nodes, and a layer (think "service mesh") on top.

Technically, a link vertex is a degenerate case of a recursive vertex: it still has a full-featured transport endpoint, but the connection manager may be absent (if the underlying medium is connectionless, e.g. UDP or some form of radio), and the forwarding engine is trivial (because there is only one other vertex in the layer, there are no transit packets; hence, the forwarder is a mere datagram socket in this case).

This fact enables an optimization: a single link vertex can handle multiple links by maintaining a strict 1-1 mapping of Portal API users to other link vertices. When a user calls Portal::Connect to some target user, the link vertex must know exactly on which link this target is published. This can be implemented by maintaining a star-like topology of links.
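A sketch of that optimization, with illustrative names: the link vertex keeps a plain mapping from each remote Portal user to the one link where that user lives, so Connect needs no forwarding engine at all.

```python
class LinkVertex:
    """A single link vertex serving several point-to-point links (a star).

    It keeps a strict 1-1 mapping from each remote Portal user to the peer
    link vertex where that user is published, so Connect degenerates to a
    dictionary lookup: no forwarding engine, no transit packets.
    """
    def __init__(self, name: str):
        self.name = name
        self._link_for = {}  # remote user name -> peer link vertex

    def learn(self, user: str, peer: "LinkVertex") -> None:
        # Enforce the 1-1 mapping: a user is reachable over exactly one link.
        if user in self._link_for and self._link_for[user] is not peer:
            raise ValueError(f"{user} is already published on another link")
        self._link_for[user] = peer

    def connect(self, user: str):
        # Exactly one link can reach `user`, so there is nothing to route.
        peer = self._link_for[user]
        return (self.name, "link-to", peer.name)
```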

TODO(#37): Create detailed design of a link vertex

Service

Let's define a Florete network service as an entity that is published (via Portal::Publish) in some layer. This is an abstraction of (micro)services in a service mesh. Any vertex of a recursive layer is a service, because it publishes itself in some other layers. Note that the vertices of a link are not services themselves, because there is no other layer in which to publish them.

Consequences

What does this simple construct of recursive service meshes give us? First of all, no network details are leaked through the APIs. Most importantly, there are no addresses, ports, or other kinds of implementation details.

This mere fact decouples connections from network topology and routing, granting exceptional freedom: route aggregation (multi-route delivery, of which interface/link aggregation is a special case), multihoming, and all mobility scenarios can be supported out of the box.

TODO(#38): Explore key Florete network features of route aggregation, multihoming, and mobility

Implementation Strategy

In this section we cover how we actually implement Florete network.

Recursive principle allows agile implementations

The Florete abstractions allow more than one implementation approach, and even within a particular implementation there may be multiple ways of achieving similar goals while still conforming to the recursive principle of Florete. In this case we should not strive for some globally best approach, but rather admit there is a level of agility, and the possibility to tailor an implementation to a particular context.

Integration with IP

How do we bridge existing IP-based applications with the Florete network? The answer is well-known: use per-node proxies. That's what Istio's Ambient Mesh6 does in Kubernetes (and other similar technologies do too). There may even be per-workload proxies (sidecar proxies), but those are a rudiment of ad hoc design for Kubernetes.

So, app developers can just use IP-based protocols, and at the proxy boundary they are translated into Florete connections.

This IP proxy application can be used not only locally: it can serve clients and servers in an entire IP-based LAN. This may be useful to provide Florete connectivity to nodes that cannot have local proxies installed for some reason. In this case, the proxy becomes a gateway from IP into the Florete network.

Identities

The only things used in a layer's Portal API are identities.

Identity

An identity is a name plus cryptographic evidence that the name is owned by the entity.

This sentence formulates a general framework for identities, exceptionally simple yet powerful. There is the well-established concept of SPIFFE SVIDs7 for clusters. There are X.509 certificates (PKI-based) and DNS in the public Internet. And there are self-authenticating names (e.g. .onion, .i2p)8. We unite all of these schemes in our Identity API. The actual implementation should begin with DNS names, for simplicity and integration with existing desktop tools.
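As one concrete instance of this framework, here is a sketch of the self-authenticating scheme (the .onion/.i2p style), where the name is derived from the owner's public key and thus needs no external authority. This illustrates the identity definition above, not the Florete Identity API itself; all names and the digest-truncation choice are assumptions:

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Identity:
    """A name plus evidence that the presenter owns it.

    Here the evidence is the public key itself: the name is derived from
    the key, so ownership is checkable without any authority. X.509, DNS,
    or SPIFFE SVIDs would plug in as alternative evidence schemes.
    """
    name: str
    public_key: bytes  # evidence material; a real scheme would also sign

def self_authenticating_name(public_key: bytes) -> str:
    # The name is a digest of the key, as in .onion / .i2p style naming.
    return hashlib.sha256(public_key).hexdigest()[:16]

def verify(identity: Identity) -> bool:
    # The name checks out iff it was derived from the presented key.
    return identity.name == self_authenticating_name(identity.public_key)
```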

TODO(#39): Define Florete Identity API

Addressing & Routing

What about addresses? That concept becomes a private implementation detail of a particular layer. Actually, for small networks there is no need for addresses at all: names are enough, and routing can be done by other methods. The most well-known alternative to IP routing is MPLS9. It enables Traffic Engineering10, which we need too. MPLS uses labels instead of addresses for forwarding packets.
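To illustrate label-based forwarding, here is a toy MPLS-style forwarding step in Python: the top label indexes a table that says swap (transit) or pop (deliver), and no addresses appear anywhere. This is a generic MPLS illustration under simplified assumptions (one shared table), not the Florete routing design from TODO(#40):

```python
def forward(label_table: dict, packet: tuple):
    """One MPLS-style forwarding step.

    `packet` is (top_label, payload). The table maps a label to
    (action, out_label, next_hop): "swap" rewrites the label for the
    next transit hop; "pop" removes it and delivers the payload.
    """
    top, payload = packet
    action, out_label, next_hop = label_table[top]
    if action == "swap":
        return next_hop, (out_label, payload)
    if action == "pop":
        return next_hop, payload
    raise ValueError(f"unknown action: {action}")
```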

TODO(#40): Define MPLS-based routing for small-scale Florete networks

Connections

The core requirement of Zero-Trust leads to a connection-oriented design of the layer. This doesn't mean connections are limited to a TCP-like stream service though: it is perfectly fine to have a datagram service within a connection. Sending unsolicited datagrams is not possible by design though, and that is really good.

What kinds of network services does a connection between users of the Portal API provide?

Basic "unicast" data transfer services include:

  • authentication, confidentiality, integrity (mTLS)
  • stream mode (single-stream — TCP-like; multi-stream — QUIC-like)
  • datagram mode (UDP-like, QUIC-like)
  • convergent mode (both stream and datagram — QUIC-like)
  • QoS, resource reservation
  • congestion control

Regarding congestion control and contention for bandwidth between connections, we currently take this route:

  • use datagram modes with little to no congestion control in "lower" layers
  • use a protocol with congestion control at the workload-facing layers — e.g. QUIC

Later on we'll need means to control the bandwidth of connections; that can be done at the proxy level (shaping of any data transfer protocol). Whether better protocols are needed in this regard is unclear: maybe shaping will be sufficient.
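A sketch of what proxy-level shaping could look like, using a standard token bucket; the class and parameters are illustrative assumptions, not a committed design:

```python
class TokenBucketShaper:
    """Proxy-level bandwidth shaping for one connection: packets are
    admitted only while tokens (bytes of budget) are available."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps        # refill rate, bytes per second
        self.burst = burst_bytes    # maximum accumulated budget
        self.tokens = burst_bytes
        self.last = 0.0

    def admit(self, size_bytes: int, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False  # over budget: the proxy drops or queues the packet
```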

A separate note: since we're going to use QUIC or another similar protocol with adaptive bandwidth, we rely on an architecture where links provide delivery guarantees. QUIC treats packet loss as an indicator of congestion and cannot distinguish it from loss in a link. So link layer services over unreliable media (like radio channels) cannot be simple datagrams: they must implement an ARQ of some kind. That concern doesn't apply to links over UDP/IP though, as we expect that IP networks already run over reliable links; for example, WiFi implements that at the link level.
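The ARQ requirement can be illustrated with the simplest scheme, stop-and-wait: every frame is retransmitted until acknowledged, so residual loss is hidden from the layers above and QUIC's loss signal keeps meaning congestion rather than link noise. A real link layer would use a windowed ARQ; this is only a model with invented names:

```python
def stop_and_wait_send(frames, channel, max_retries=10):
    """Stop-and-wait ARQ over a lossy link.

    `channel(seq, frame)` models one transmission attempt and returns
    True iff an ACK came back. Each frame is resent until acknowledged.
    """
    delivered = []
    for seq, frame in enumerate(frames):
        for _attempt in range(max_retries):
            if channel(seq, frame):
                delivered.append(frame)
                break
        else:
            raise TimeoutError(f"frame {seq} lost {max_retries} times")
    return delivered
```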

Future Work

This section covers topics that are not in the near scope of development, but very important in a mid- and long-term.

Service Discovery

The Discover method of the Portal API allows users to find services that have been published in the layer. This is the Florete abstraction for service discovery11; it applies to any kind of service: workloads, system components, vertices from other layers, and even distributed services. This mechanism is naturally related to new vertices joining a layer, as they're services in some other layers.

TODO(#41): Explore Service Discovery scenarios in Florete network

Multicast & Pub/Sub

We'd probably like to have a "multicast" data transfer service, but we leave this for later research. We'd also like to have a Pub/Sub service, but that should probably be built on top of the primitive data transfer, maybe even as a Distributed Service / app managed by the orchestrator. Integration with existing protocols (MQTT, Zenoh12, etc.) should also be taken into account. All of this may be related to multicast.

TODO(#42): Research on multicast and pub/sub features in Florete network

Recursion Limits

There is a practical problem with this architecture: recursion ad infinitum is impossible. In fact, it eats bandwidth very fast, especially with protocols that are not designed for it, e.g. QUIC over QUIC. Our experiments have shown that with 3 layers of QUIC, headers eat so much space that the 1280-byte minimum IPv6 packet size cannot be carried within the 1500-byte Ethernet MTU.

We need to invent a way to "flatten the recursion": virtual layers without actual encapsulation. That seems entirely possible for the in-cluster service mesh. The key design goal is to keep the layer isolation. There is a known option for extremely lightweight layers: tunnels via MPLS label stacks. But they completely lack security.

TODO(#43): Invent virtual layers design for in-cluster network

Native Apps

Native Florete network apps are possible, but they require more design effort. The proxy design seems good, and is followed by other overlay networks like Tor and I2P as well. But we may want the ability to add layers on top of the basic proxies using external workloads.

TODO(#44): Consider design of native Florete network applications

Gateway

Normally, the Portal API is provided locally, to workloads and vertices running on the same node. But it is possible to provide this API remotely, publishing it as a service. Note how this differs from a vertex being a service in underlying layers: there it normally uses connections to transmit data (wrapped together with some routing hints), and this "vertex service" is intended for other vertices of its layer only.

Here we talk about publishing the Portal API of a vertex from layer Y in some other layer W. This creates a gateway into layer Y for users of layer W. In practice, it may be useful to publish the Portal API of an interrete layer within the cluster layer (the service mesh layer of the cluster). This is how an Interrete Gateway can be implemented.

TODO(#45): Explore Florete gateway design

Distributed Gateway

A distributed gateway is a logical gateway implemented as a group of workloads (or even system components). It is a special case of a distributed service.

A distributed service has a single identity (and hence a name) by which it can be connected to. A connection is handled by one of the backing workloads, selected by the load-balancing mechanism of the layer.
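One stateless way to implement that selection is rendezvous (highest-random-weight) hashing, sketched below. Whether Florete layers use this particular mechanism is an open design choice; the function and its arguments are illustrative:

```python
import hashlib

def pick_backend(service: str, conn_id: str, backends: list) -> str:
    """Rendezvous (highest-random-weight) hashing.

    Every vertex computes the same winner for a given connection with no
    shared state, and removing a backend only remaps the connections
    that were pinned to it.
    """
    def weight(backend: str) -> str:
        key = f"{service}/{conn_id}/{backend}".encode()
        return hashlib.sha256(key).hexdigest()
    return max(backends, key=weight)
```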

Open questions:

  • It is fairly clear how a distributed gateway can operate: a gateway is backed by a vertex. But what would a distributed vertex look like, and is there any use for distributed vertices that are not gateways?
  • It is clear how to add interrete connectivity without a gateway: by establishing an interrete layer over the cluster layer and providing local proxy access. Is it possible to do so in a distributed manner though?
  • Are these questions actually equal, talking about the same thing?

TODO(#46): Explore distributed gateway and vertex designs

Florenetes Design

Florenetes is a distributed operating system for clusters of heterogeneous nodes, built on a three-layer architecture.

Florenetes Architecture
Three-layer architecture of the Florenetes cluster

Execution Plane (Green)

This is the layer where distributed applications run — both system and user ones. It includes:

  • Node: A physical device like a laptop, phone, or robot, as well as a virtual machine in the cloud
  • Workload: The smallest deployable unit of an application on a node
  • Service Workload (shortened to Service when unambiguous): A workload that is published in the cluster layer (the cluster's service mesh)
  • Agent: A Florenetes agent running on each node, managing the workloads on it
  • WRI (Workload Runtime Interface): A standard way to launch workloads on nodes
  • Distributed Application: A group of workloads that is managed by the orchestrator as a single entity
  • Distributed Service: A distributed application that is published under a single identity in the cluster layer (and load-balanced)

This architecture resembles Kubernetes, but abstracts away its containerization specifics. We shouldn't require actual container runtimes on the nodes. This is especially true for smartphones: running containerized apps there seems infeasible, at least until their underlying OSes start supporting it. But this shouldn't limit our ability to join mobile phones into the cluster as nodes.

Running containers on some embedded systems may be infeasible too (though we're not targeting anything smaller than devices with ARMv7 general-purpose CPUs running Linux — i.e. controller-only devices are not considered to be cluster nodes).

Another example where containers cannot be used directly is a supernode consisting of multiple computers (e.g. a few distinct computers on board a large vehicle).

So we need an abstraction that can be implemented for phones, embedded systems, and supernodes. We call it the Workload Runtime Interface.
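A rough sketch of what the WRI contract could look like; the method names and the toy in-memory backend are assumptions for illustration, ahead of the real interface design:

```python
from abc import ABC, abstractmethod

class WorkloadRuntime(ABC):
    """Workload Runtime Interface (WRI): the node-side contract the agent
    uses to launch workloads, whatever the node actually is underneath
    (container runtime, phone OS job, supernode dispatcher, ...)."""

    @abstractmethod
    def start(self, workload_id: str, spec: dict) -> None: ...

    @abstractmethod
    def stop(self, workload_id: str) -> None: ...

    @abstractmethod
    def status(self, workload_id: str) -> str: ...

class InMemoryRuntime(WorkloadRuntime):
    """Toy WRI backend tracking 'workloads' in a plain dict."""
    def __init__(self):
        self._running = {}

    def start(self, workload_id: str, spec: dict) -> None:
        self._running[workload_id] = spec

    def stop(self, workload_id: str) -> None:
        self._running.pop(workload_id, None)

    def status(self, workload_id: str) -> str:
        return "running" if workload_id in self._running else "stopped"
```

The same interface would be backed by containerd on a Linux server, by OS-level job APIs on a phone, and by an inner dispatcher on a supernode.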

Open question: should the agent run as a workload? Certainly, it has the system role of running other workloads, so that role cannot be a workload itself. But maybe we should split it in two: a low-level runner, and a higher-level agent that controls all the other aspects of the node. The latter could run as a workload.

TODO(#47): Investigate whether Florenetes agent can be run as a workload

Data Plane (Purple)

This is the layer of Florete network data plane. It includes:

  • Proxy: A Florete network proxy running at each node and providing network connectivity to workloads
  • Cluster layer (service mesh): A layer of Florete network that is exposed by the proxy and used by workloads in the cluster

Open questions:

  • Should the proxy run as a workload? It is an essential service that provides connectivity to all other workloads, so even if it is a workload, it is certainly a special one.
  • At the beginning, the proxy will serve IP-based workloads, and it can be configured as a gateway for IP networks; will it provide a native Florete proxy (and gateway) later on?

TODO(#48): Investigate whether Florete proxy should be run as a workload
TODO(#45): Explore native Florete proxy and gateway design

Control Plane (Pink)

This is the brain of the system, coordinating the work of all other components. The main difference from Kubernetes: all control plane components are distributed across all cluster nodes, running as workloads within the system itself. There are no dedicated master nodes — the system continues to function even when any group of nodes is disconnected.

It consists of two parts: control plane of the orchestrator and control plane of the network.

Orchestrator Control Plane

Here we follow the Kubernetes design for now (we'll need to revisit this once we're close to implementing).

Core distributed services:

  • Scheduler Service: Decides where to run applications
  • API Service: Provides external and internal cluster APIs
  • Controller Manager Service: Ensures desired state of the cluster (via a few specialized controllers)
  • Storage Service: Stores state of the apps, including the core components

Network Control Plane

The Florete network's control plane is built on SDN13 and TE10 principles, evolved for the agile environment of a heterogeneous cluster with mobile nodes. SDN defines a fully centralized architecture: a "smart" Controller and "stupid" devices under its control. At the other end of the spectrum is the fully decentralized architecture of independent nodes that optimize for themselves (having only a local view of the network).

We generalize this in such a way that both the centralized and the decentralized cases are degenerate in our architecture. The middle case is the following: agents at the nodes are neither fully independent nor fully controlled; they have a degree of freedom. They coordinate instead of being controlled, acting within a decision space produced for them by an entity called the Coordinator.
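A tiny model of the difference between control and coordination: the Coordinator publishes a decision space, and each agent chooses within it using only its local view. The data shapes here are invented for illustration:

```python
def choose_route(decision_space: dict, agent_view: dict) -> str:
    """Pick a route within the space the Coordinator allows, using only
    the agent's local view of the network (here: measured latency)."""
    admissible = decision_space["routes"]
    # Centralized and decentralized SDN are the degenerate cases:
    # a one-route space leaves the agent no freedom, while an
    # unrestricted space makes it fully independent.
    return min(admissible, key=lambda route: agent_view["latency_ms"][route])
```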

Core distributed services:

  • Coordinator Service: A generalization of SDN Controller; administers the network, producing the decision space for the agents
  • Network Computation Service (NCS): Calculates network topology and routes. Extends ideas of traditional Path Computation Element (PCE)14 with topology tasks
  • Identity Service: Manages identity and authentication for all entities. Extends ideas of traditional SPIFFE Server7

TODO(#49): Explore Florete network control plane

The Coordinator requires some form of consensus, probably with leader election. Certainly this will work only in somewhat stable conditions: when the network churn rate is higher than the characteristic time of reaching consensus, it becomes infeasible. That's when the control plane should fall back to a fully decentralized mode (probably using gossip protocols). But that isn't our main operating mode: we expect the majority of use cases to fall within network conditions stable enough to maintain the Coordinator.

IP Network Services

There are special network services for apps running in the cluster:

  • Internet Egress Gateway: Provides outgoing Internet connections
  • Internet Ingress Gateway: Provides incoming Internet connections

The Internet Egress Gateway may be used by e.g. a web browser to connect to sites using the best available exit route. The Internet Ingress Gateway may be used by e.g. a web server to serve external clients from the Internet.

Both gateways use the load-balancing feature of the cluster. They are analogous to the Ingress and Egress Gateways of Kubernetes, but they're essentially distributed services.

TODO(#50): Explore Internet Gateway design for Florenetes cluster

Supernode

This is a concept that needs further exploration. In practice there is a need for cluster nodes that consist of multiple computing nodes with local networks between them (e.g. in-vehicle Ethernet). The outer cluster network may even use the inner cluster network, if the computing nodes are border routers (and this is practical: they may be nodes with radio links, for example). There is hope that the Florete recursive principle can help solve this puzzle. This topic is related to interrete, and to distributed vertices and gateways.

TODO(#51): Explore supernodes and recursive cluster intertwined networks

Interrete

Interrete is inter-cluster networking (lowercase when referencing just some inter-cluster layer; capitalized when referencing the global network of clusters). It is built using the same recursive layering principle as the Florete network of the cluster (its service mesh).

Interrete connectivity can be provided similarly to IP network services, just for native Florete network. Probably there will be native Florete Ingress and Egress Gateways — distributed services that implement interrete connectivity via remote Portal APIs. Alternatively, it is possible to add interrete connectivity via local proxies over cluster layer. The implications of these different approaches need to be investigated.

Interrete is a layer of the Florete network that isn't centrally managed: a few interconnected clusters manage their own segments of the layer, and there are borders where they're actually connected.

The design of the global Interrete isn't fully thought out yet, but it will probably resemble the Autonomous Systems of the Internet, with important differences:

  • There will be many more user clusters than there are ASes now, as operating one won't be the prerogative of ISPs and large corporations
  • Operating a cluster connected to Interrete should be fully automated and not require technical skills of the owner
  • Dynamic routing between clusters must support high mobility of nodes — it would be normal for connection points to change

The Interrete layer will require a robust route aggregation method to scale to Internet level. That's where addressing may be needed (either something new, such as the "topological" addressing John Day suggested, or something IP-like).

We'll probably need to design a small-scale interrete for a few small clusters first, and then scale it up to the global Interrete.

TODO(#52): Explore interrete (inter-cluster networking)

Footnotes

  1. RINA, the only post-IP architecture that is still active, though so far academic-only.

  2. Service mesh, a service-to-service connection fabric used in modern clusters.

  3. Istio is a prominent service mesh implementation for Kubernetes.

  4. Avery Pennarun, The world in which IPv6 was a good design, 2017

  5. We explicitly avoid the term "level" as it implies a stack-like construct; layers do not necessarily form stacks! Also note how we distinguish layers: each link is a layer in itself; there is no "common link layer" like in the Internet model.

  6. Istio's Ambient Mesh

  7. SPIFFE, identity control plane for distributed systems.

  8. See Zooko's Triangle on the matter of names.

  9. MPLS, and Segment Routing as an evolution.

  10. Traffic Engineering for the Internet.

  11. Service discovery, a well-known networking concept.

  12. Zenoh, a distributed service with pubsub interface for key-value pairs.

  13. SDN, Software-defined networking.

  14. PCE, a central element of modern TE systems.
