Data Plane
Design of the data plane for C1
Key Challenge
The key challenge of the foundational work lies in how to represent vertices and the Portal APIs between them, and how to satisfy the requirement to support different link protocols easily. In our previous attempt we implemented vertices as async actors within the Tokio runtime; the Portal API was a channel-based actor API, with the data path (connection) implemented as a trait object. This has the following drawbacks:
- A cluster vertex with a QUIC endpoint for the transport service needs a UDP socket to work on: that is natural for any implementation, because QUIC runs over UDP sockets. Moreover, almost any transport endpoint will require some sort of datagram socket, e.g. TCP over IP, DTLS over UDP, or WireGuard over UDP.
- Implementing a socket-like structure in userspace properly is hard: in async Rust terms, you need a waker to notify when sending or receiving is ready. Since we are using quinn-rs connections with the datagram service for links, we ended up implementing `AsyncUdpSocket` over the `send_datagram`/`receive_datagram` methods of a link vertex's QUIC endpoint. The QUIC API lacks the needed waker; the implementation works, but busy-waits (`poll_send` is always ready). Fixing this would probably require rewriting/extending the top-level QUIC APIs of quinn for exactly this niche use case, a task not feasible for the time being.
- Even more important, this design of vertices as in-process actors is not really extendable with other link protocols: it requires re-implementing them in async Rust! A task that is not feasible either.
New Solution
Instead of trying to implement the socket-like structure in userspace, we can use an actual kernel-provided socket! This neatly solves both problems:
- we get a real socket structure for the cluster vertex to use
- we can separate link implementations to external processes, enabling modular design and support for virtually any link protocol
The main drawback of this new approach is performance: packets are copied to and from userspace multiple times, with context switches and calls into the IP stack. In theory this can be addressed with zero-copy buffers (AF_XDP?) and eBPF, but that requires a deep dive into those technologies. Non-UDP or even non-socket IPC could probably be adopted as well. We can hopefully leave this for later; performance is not a goal for the C1 milestone.
Design Sketch
Problem Definition
We need to specify an architecture for our solution that covers the C1 scenarios and has the following mandatory properties:
- The cluster vertex must use a datagram socket to communicate with the link vertex
- The architecture must allow adding third party link implementations without re-implementing them in Rust
- Cross-platform design for Linux, Windows, macOS, Android, iOS, and HarmonyOS
- Recursiveness: the datagram socket used to communicate with the link vertex should be reusable by cluster vertex users, including the interrete vertex
Accepted Design
Terms:
- Northbound interface: an interface for clients ("on the top" of current component)
- Southbound interface: an interface for used components ("on the bottom" of current component)
- Inbound: an interface module to serve clients that want to originate connections/packets
- Outbound: an interface module to serve clients that want to receive connections/packets
Proposal:
Implement a single executable called `flor` with the following modules:
- IP Inbound: SOCKS5 for TCP
- IP Outbound: direct TCP (reverse proxy)
- Northbound native Flor IO (Inbound/Outbound): datagram socket and custom protocol (implementation-specific; we do not want to stabilize it and provide it as a public API at the moment)
- TransportEndpoint: QUIC, provides connections for clients
- ForwardingEngine:
- (cluster & others) forwarding by MPLS-inspired labels
- (link) star topology forwarding (over UDP socket)
- ConnectionManager: extendable module that can provide different connection types:
- native: connections over Flor IO (southbound Flor IO socket)
- udp: peer-to-peer links over UDP socket
- 3rdparty: links of any kind (work over third-party specific IPC)
Deploy scenarios:
- `flor(native) ==> flor(udp)` - our C1 scenarios will be covered by this one
- `flor(3rdparty) ==> 3rdparty-link` - any third-party link support is possible
- `flor(native) ==> flor(native) ==> flor(udp)` - a 3-layer setup becomes possible for experiments
Reasoning Notes
This section stores the raw reasoning flow (human-only, AI wasn't involved) that led to the design decision. It may be of interest to people who want to understand why it was done this way. To read, click on the spoiler below.