@andrewjstone commented Oct 30, 2025

This builds on #9310

This code essentially re-implements the bootstore's early network config replication over sprockets channels.

The persistence code for this state still lives in the bootstore, but it will eventually move to the `trust-quorum` crate once all customer systems are running only trust quorum.

This code does not deal with how the switchover from bootstore to trust-quorum is made. That will come later with trust-quorum / sled-agent integration.

Builds on #9232

This is the first step in wrapping the `trust_quorum::Node` so that it
can be used in an async context and integrated with sled-agent. Only
the sprockets networking has been fully integrated so far: each
`NodeTask` has a `ConnMgr` that sets up a full mesh of sprockets
connections. A test for this connectivity behavior has been written,
but the code is not yet wired into the production code.

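As a rough illustration of what maintaining that full mesh involves, here is a minimal sketch of the reconciliation step a connection manager might run whenever the peer set changes; `ConnHandle`, `reconcile_peers`, and the dial-the-difference policy are illustrative placeholders, not the actual `ConnMgr` API.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::net::SocketAddrV6;

// Placeholder for whatever per-connection handle the connection manager
// keeps around (e.g. a channel to a per-connection task).
struct ConnHandle;

// Sketch: drop connections to peers that disappeared from the desired set
// and report the peers we still need to dial. The real code's tie-breaking
// rule for who dials whom is not shown here.
fn reconcile_peers(
    desired: &BTreeSet<SocketAddrV6>,
    established: &mut BTreeMap<SocketAddrV6, ConnHandle>,
) -> Vec<SocketAddrV6> {
    // Tear down connections to peers that are no longer desired.
    established.retain(|addr, _conn| desired.contains(addr));

    // The caller spawns a task per returned address to run the sprockets
    // handshake and, on success, a per-connection task.
    desired
        .iter()
        .filter(|addr| !established.contains_key(addr))
        .copied()
        .collect()
}
```
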
Messages can be sent between `NodeTask`s over sprockets connections.
Each connection exists in its own task managed by an `EstablishedConn`.
The main `NodeTask` sends messages to and receives messages from
this task to interact with the outside world via sprockets. Currently
only `Ping` messages are sent over the wire, as a means to keep the
connections alive and detect disconnects.

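To make that loop concrete, here is a minimal sketch of a per-connection task, assuming a tokio runtime and using channels as a stand-in for the sprockets stream; `WireMsg`, `established_conn_task`, and the one-second ping interval are illustrative, not the real `EstablishedConn` implementation.

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::{interval, MissedTickBehavior};

// Hypothetical wire-level message; the real types live in the trust-quorum crate.
#[derive(Debug)]
enum WireMsg {
    Ping,
    // Trust quorum protocol messages would be added here later.
}

// Sketch of a per-connection task. The sprockets stream is modeled as a pair
// of channels (`wire_tx`, `wire_rx`); the real task would read and write
// framed messages on the encrypted connection instead.
async fn established_conn_task(
    mut from_main: mpsc::Receiver<WireMsg>, // messages from the main NodeTask
    to_main: mpsc::Sender<WireMsg>,         // messages delivered to the main NodeTask
    wire_tx: mpsc::Sender<WireMsg>,         // stand-in for writes to the peer
    mut wire_rx: mpsc::Receiver<WireMsg>,   // stand-in for reads from the peer
) {
    let mut ping_timer = interval(Duration::from_secs(1));
    ping_timer.set_missed_tick_behavior(MissedTickBehavior::Delay);

    loop {
        tokio::select! {
            // Keep-alive: periodically send a Ping to the peer. A failed
            // write (or prolonged silence on `wire_rx`) signals a disconnect.
            _ = ping_timer.tick() => {
                if wire_tx.send(WireMsg::Ping).await.is_err() {
                    return;
                }
            }
            // Forward messages from the main task out to the peer.
            Some(msg) = from_main.recv() => {
                if wire_tx.send(msg).await.is_err() {
                    return;
                }
            }
            // Deliver inbound messages (other than Pings) to the main task.
            Some(msg) = wire_rx.recv() => {
                if !matches!(msg, WireMsg::Ping) {
                    if to_main.send(msg).await.is_err() {
                        return;
                    }
                }
            }
        }
    }
}
```
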
A `NodeHandle` allows one to interact with the `NodeTask`. Currently
only three operations are implemented, with messages defined in
`NodeApiRequest`. The user can tell the node which peers it has on the
bootstrap network so it can establish connectivity, can poll for
connectivity status, and can shut down the node. All of this
functionality is used in the accompanying test.

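For illustration, a handle like this is commonly an mpsc sender wrapping a request enum, with oneshot channels for replies. The sketch below assumes hypothetical variant and method names (`PeerAddresses`, `ConnectivityStatus`, `load_peer_addresses`, ...) rather than the actual `NodeApiRequest` definition.

```rust
use std::collections::BTreeSet;
use std::net::SocketAddrV6;
use tokio::sync::{mpsc, oneshot};

// Hypothetical shape of the handle API; names and fields are illustrative.
enum NodeApiRequest {
    // Tell the node which peers exist on the bootstrap network so the
    // connection manager can build its full mesh.
    PeerAddresses(BTreeSet<SocketAddrV6>),
    // Ask for the current set of established connections.
    ConnectivityStatus { reply: oneshot::Sender<BTreeSet<SocketAddrV6>> },
    // Cleanly stop the NodeTask and all of its connection tasks.
    Shutdown,
}

#[derive(Clone)]
struct NodeHandle {
    tx: mpsc::Sender<NodeApiRequest>,
}

impl NodeHandle {
    async fn load_peer_addresses(&self, peers: BTreeSet<SocketAddrV6>) {
        let _ = self.tx.send(NodeApiRequest::PeerAddresses(peers)).await;
    }

    async fn connectivity_status(&self) -> Option<BTreeSet<SocketAddrV6>> {
        let (reply, rx) = oneshot::channel();
        self.tx
            .send(NodeApiRequest::ConnectivityStatus { reply })
            .await
            .ok()?;
        rx.await.ok()
    }

    async fn shutdown(&self) {
        let _ = self.tx.send(NodeApiRequest::Shutdown).await;
    }
}
```
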
It's important to reiterate that this code only implements connectivity
between trust quorum nodes; no actual trust quorum messages are sent.
They can't be, since a handle cannot yet initiate a reconfiguration or
LRTQ upgrade. That behavior will come in a follow-up. This PR is large
enough.

A lot of this code is similar to the LRTQ connection management code,
except that it operates over sprockets rather than TCP channels. This
introduces some complexity, but it is mostly abstracted away into the
`SprocketsConfig`.

`NodeTask` now uses the `trust_quorum_protocol::Node` and
`trust_quorum_protocol::NodeCtx` to send and receive trust quorum
messages. An API to drive this was added to the `NodeTaskHandle`.

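The drive pattern is roughly the sketch below: the async task feeds an inbound message into the protocol state machine along with a context, then drains whatever the state machine queued and routes it to the right per-connection task. Every name here (`ProtocolNode`, `ProtocolCtx`, `Envelope`, `handle_msg`, `on_inbound_msg`) is a placeholder for the real `trust_quorum_protocol` API.

```rust
use std::collections::BTreeMap;
use tokio::sync::mpsc;

type PeerId = String;

#[derive(Debug)]
struct WireMsg; // placeholder for a trust quorum wire message

struct Envelope {
    to: PeerId,
    msg: WireMsg,
}

#[derive(Default)]
struct ProtocolCtx {
    outgoing: Vec<Envelope>,
}

struct ProtocolNode;

impl ProtocolNode {
    // The sans-io layer only mutates the ctx; it never touches the network.
    fn handle_msg(&mut self, ctx: &mut ProtocolCtx, from: PeerId, msg: WireMsg) {
        // Real protocol logic elided; pretend the node wants to reply.
        ctx.outgoing.push(Envelope { to: from, msg });
    }
}

// Called by the async task when a connection delivers an inbound message.
async fn on_inbound_msg(
    node: &mut ProtocolNode,
    ctx: &mut ProtocolCtx,
    conns: &BTreeMap<PeerId, mpsc::Sender<WireMsg>>,
    from: PeerId,
    msg: WireMsg,
) {
    node.handle_msg(ctx, from, msg);
    // Drain everything the protocol queued and ship each envelope over the
    // matching established connection, if one exists.
    for Envelope { to, msg } in ctx.outgoing.drain(..) {
        if let Some(tx) = conns.get(&to) {
            let _ = tx.send(msg).await;
        }
    }
}
```
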
The majority of code in this PR is tests using the API.

A follow-up will deal with saving persistent state to a Ledger.

Builds on #9296

This commit persists state to a ledger, following the pattern used in
the bootstore. It's done this way because the `PersistentState` itself
is contained in the sans-io layer, but we must save it in the async task
layer. The sans-io layer shouldn't know how the state is persisted, just
that it is, and so we recreate the ledger every time we write it.

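The "recreate the ledger on every write" shape can be sketched as below, using serde_json and atomic file renames in place of the real Ledger type; the `PersistentState` fields and the `persist` helper here are placeholders, not the actual trust quorum state.

```rust
use std::path::PathBuf;

use serde::{Deserialize, Serialize};

// Placeholder for the real persistent state held by the sans-io layer.
#[derive(Debug, Default, Serialize, Deserialize)]
struct PersistentState {
    generation: u64,
    // ... the real state carries key shares, configurations, etc.
}

// Called from the async task layer whenever the sans-io layer reports that
// its persistent state changed. Storage is set up fresh for each write, so
// the sans-io layer never holds a handle to it.
async fn persist(state: &PersistentState, paths: &[PathBuf]) -> std::io::Result<()> {
    let json = serde_json::to_vec_pretty(state)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
    for path in paths {
        // Write to a temp file and rename so a crash can't leave a torn copy.
        let tmp = path.with_extension("tmp");
        tokio::fs::write(&tmp, &json).await?;
        tokio::fs::rename(&tmp, path).await?;
    }
    Ok(())
}
```
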
A follow-up PR will deal with the early networking information saved
by the bootstore, and it will be very similar.