Conversation

tbagrel1 (Contributor)

Co-authored-by: nbacquey [email protected]

Description

Please include a meaningful description of the PR and link the relevant issues
this PR might resolve.

Also note that:

  • New code should be properly tested (even if it does not add new features).
  • The fix for a regression should include a test that reproduces said regression.

@tbagrel1 tbagrel1 changed the title Peras/experimental-object-diffusion-inbound-v2 [WIP] Experimental ObjectDiffusion v2 (just inbound side changes) for efficient vote retrieval Sep 29, 2025
@tbagrel1 tbagrel1 force-pushed the peras/experimental-object-diffusion-inbound-v2 branch 2 times, most recently from 9de5d64 to 443cfc8 Compare October 13, 2025 14:03
@tbagrel1 tbagrel1 force-pushed the peras/experimental-object-diffusion-inbound-v2 branch from 96b7016 to 8bd6007 Compare October 15, 2025 12:00
@tbagrel1 tbagrel1 changed the base branch from main-pr/object-diffusion to peras-staging October 15, 2025 12:03
tbagrel1 (Contributor Author)

@nbacquey nbacquey force-pushed the peras/experimental-object-diffusion-inbound-v2 branch from d28fe49 to f794d6f Compare October 20, 2025 18:08
- the inbound peer that submitted the object to the pool might have already acked it by the time the object is rejected by the pool. However, the rejection indicates that the outbound peer which sent us the object is adversarial, and we should disconnect from it anyway. So no harm is done by having acked the object to that adversarial outbound peer, as we won't want to re-download this object (or any other object whatsoever) from it again.
- any other inbound peer that has this ID available from its outbound peer won't be able to ack it, because this ID isn't in **their** `dpsObjectsOwtPool` and the object isn't in the pool either; so we will still be able to download it from these other peers until we find a valid copy.

As in TxSubmission, acknowledgement is done by indicating to the outbound peer the length of the (longest) prefix of the outstanding FIFO that we no longer care about (i.e. for which all IDs are eligible for acknowledgment by the definition above). The field `dpsOutstandingFifo` on the inbound peer is supposed to mirror exactly the state of the FIFO of the outbound peer, bar possible discrepancies due to in-flight information.
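As a rough illustration (this is a sketch, not the actual implementation: the `eligible` predicate and the plain-list FIFO representation are assumptions), the acknowledgment count sent to the outbound peer is the length of the longest all-eligible prefix, and applying an ack just drops that prefix from the mirrored FIFO:

```haskell
-- Hypothetical sketch: the acknowledgment count is the length of the
-- longest prefix of the outstanding FIFO whose IDs are all eligible
-- for acknowledgment (e.g. already in the pool, or known-uninteresting).
ackCount :: (objectId -> Bool) -> [objectId] -> Int
ackCount eligible = length . takeWhile eligible

-- After sending the ack, the mirrored FIFO is shortened accordingly,
-- keeping it in sync with the outbound peer's own FIFO.
applyAck :: Int -> [objectId] -> [objectId]
applyAck n = drop n
```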
Contributor:

I probably missed something along the way, but what happens when a peer receives all but the first object it requests from an outbound peer? Can that object be re-requested from the same peer, or do we rely on some other peer sending it successfully and thus allowing us to move ahead in the outstanding FIFO?

Contributor Author:

The outbound peer is not allowed to reply partially to one of our requests. It could (maybe) reply out of order to different requests we made, but if request A asks for 30 specific objects, the reply to request A must contain exactly those 30 objects; otherwise we disconnect immediately from that peer for a protocol fault.
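A minimal sketch of this "all or nothing" rule (the names `checkReply` and `ReplyVerdict` are hypothetical, not from the codebase): a reply is valid only if it contains exactly the requested objects, and anything else is a protocol fault leading to disconnection.

```haskell
import qualified Data.Set as Set

-- Hypothetical sketch: the reply to a request for N specific objects
-- must contain exactly those N objects, otherwise the outbound peer has
-- committed a protocol fault and we disconnect from it.
data ReplyVerdict = ReplyOk | ProtocolFault
  deriving (Eq, Show)

checkReply :: Ord objectId => [objectId] -> [objectId] -> ReplyVerdict
checkReply requested received
  | Set.fromList requested == Set.fromList received
      && length requested == length received = ReplyOk  -- exact match
  | otherwise = ProtocolFault                           -- partial/extra: fault
```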

Contributor:

That makes sense. I had missed that the requested objects all have to be sent in the same reply and not in individual ones.

Contributor Author:

I think it is a requirement of the typed-protocols machinery that a single request leads to a single reply, because we count (as a type-level natural) how many requests have been pipelined in order to know how many replies we can expect.
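The type-level counting can be sketched like this (a toy model, not the actual typed-protocols API): pipelining a request increments a type-level natural, and collecting a reply is only well-typed when at least one request is outstanding, decrementing it by one.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

-- Toy sketch of counting pipelined requests at the type level.
data N = Z | S N

-- A witness of how many requests are currently outstanding.
data Outstanding (n :: N) where
  None    :: Outstanding 'Z
  OneMore :: Outstanding n -> Outstanding ('S n)

-- Pipelining a request increments the counter ...
pipeline :: Outstanding n -> Outstanding ('S n)
pipeline = OneMore

-- ... and collecting a reply is only possible when at least one request
-- is outstanding; it decrements the counter. A request yielding zero or
-- two replies simply cannot be expressed.
collect :: Outstanding ('S n) -> Outstanding n
collect (OneMore o) = o

-- Term-level view of the counter, for inspection.
count :: Outstanding n -> Int
count None        = 0
count (OneMore o) = 1 + count o
```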

When making decisions, we first divide the peers into two groups:

- Those who are currently executing a decision, i.e., those for which the (previous) decision in the decision channel satisfies `pdStatus == DecisionBeingActedUpon`. These are further called _frozen peers_.
- Those who are not currently executing a decision, i.e., those for which the (previous) decision in the decision channel satisfies `pdStatus == DecisionUnread || pdStatus == DecisionCompleted`. The former are the ones who haven't had time to read the previous decision yet, so it makes sense to recompute a more up-to-date decision for them. The latter are the ones who have finished executing the previous decision, so it also makes sense to compute a new decision for them. These two categories of peers are further called _active peers_.
Contributor:

> The former are the ones who didn't have time to read the previous decision yet, so it makes sense to recompute a more up-to-date decision for them.

Could this somehow lead to starvation? I.e., could the peer never leave the `DecisionUnread` state because the decision thread keeps recomputing a new one?

Contributor Author:

Usually, as soon as a peer has completed a decision, it will start again at the top of `go` and ask for a new one. As soon as `psaReadDecision` returns, the decision is marked as `DecisionBeingActedUpon`. And because the decision thread is rate-limited, there is always a window during which the "lock" on the decision (I'm not sure exactly how the STM operates to ensure atomic mutations of shared mutable variables) is not held by the decision thread, allowing the peer to read it and mark it as being acted upon.

The only case where a peer would not ask for a decision immediately after completing one is when it has made a blocking request for IDs, in which case it will block and collect the reply before starting at the top of `go` again. This is the only case I see where the decision logic could update the decision (with status `DecisionUnread`) a significant number of times before the peer asks for it, but in that case the decision should be trivial (i.e. do nothing, because there are no IDs in the FIFO, as this is the precondition for making a blocking request for IDs).

I'm not sure I answered your question actually. Feel free to ask for more details.
Also if you see how the documentation could be improved to make it clearer for future readers, please suggest an edit ;)
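For intuition, the read-and-mark step can be sketched as a single STM transaction (hypothetical names; this is not the actual `psaReadDecision` implementation): the pure transition says a decision can only be handed over while unread, and the STM wrapper retries until that is the case, atomically flipping the status so the decision thread sees the peer as frozen.

```haskell
import Control.Concurrent.STM

data DecisionStatus = DecisionUnread | DecisionBeingActedUpon | DecisionCompleted
  deriving (Eq, Show)

-- Pure transition: a decision can be handed to the peer only if it is
-- unread; handing it over marks it as being acted upon.
markRead :: (DecisionStatus, d) -> Maybe (DecisionStatus, d)
markRead (DecisionUnread, d) = Just (DecisionBeingActedUpon, d)
markRead _                   = Nothing

-- STM wrapper: block (retry) until a fresh decision is available, then
-- atomically return it and flip its status in one transaction, so there
-- is no window where the peer reads a decision without freezing it.
readDecision :: TVar (DecisionStatus, d) -> IO d
readDecision var = atomically $ do
  st <- readTVar var
  case markRead st of
    Nothing         -> retry
    Just st'@(_, d) -> writeTVar var st' >> pure d
```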

Contributor:

Thanks. Yes, I think this answers my question.

Maybe a follow-up question: let's say that all our inbound peers are blocked waiting for IDs. Would this make the decision thread enter a temporary busy-waiting loop where it does nothing but update its decisions over and over, or would it block as well until some condition flag switches state?

Sorry in advance if this is nonsensical :)

tbagrel1 (Contributor Author), Oct 23, 2025:

Indeed, at the beginning of the protocol, all inbound peers would block waiting for IDs, so the decision loop would keep replacing each trivial decision with a new trivial decision.

It could make sense to pre-filter out the peers that are waiting for IDs, so that they are not considered by the decision thread. Even simpler, actually, would be to only call `psaOnDecisionExecuted` once the IDs have been received in the blocking case, so those peers would be filtered out automatically (considered as "frozen" peers). If we make this change, there will theoretically be no case where a decision is updated a significant number of times before being read and frozen in.

NOTE: a peer can only make a blocking request for IDs when there are no longer any IDs in its FIFO, so the decision for this peer must be really trivial, as there is nothing we can ask it to do.

- They are not already in the pool
- They are available from the peer, and won't be acked by the peer on its next request for IDs according to the partial decision computed by `computeAck` at the previous step (NOTE: theoretically, this second condition is redundant with other constraints and invariants of the current implementation).

Then we "reverse" this mapping to obtain a map of object IDs to the set of active peers that have the corresponding interesting objects available (according to the criteria above), further called _potential providers_.
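The "reversal" is a standard map inversion; a minimal sketch (the name `potentialProviders` and the map-of-sets representation are assumptions, not the actual types):

```haskell
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)
import qualified Data.Set as Set

-- Hypothetical sketch of "reversing" the per-peer availability map into
-- a map from object ID to its set of potential providers.
potentialProviders ::
  (Ord peer, Ord objectId) =>
  Map peer (Set objectId) ->  -- interesting objects available per active peer
  Map objectId (Set peer)     -- potential providers per object ID
potentialProviders available =
  Map.fromListWith Set.union
    [ (oid, Set.singleton peer)
    | (peer, oids) <- Map.toList available
    , oid <- Set.toList oids
    ]
```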
Contributor:

Is there any notion of quality of service that one could use to prioritize certain potential providers over others at this point?

Contributor Author:

We could try to balance the load, e.g. prioritize peers with the fewest objects in flight at a given time.

Given that any invalid object triggers protocol termination for a peer, we can't have a "quality" metric for the objects provided by an outbound peer: they must be perfect, or we disconnect.

It could maybe make sense to use information from the network layer, to see how fast the peers are at distributing objects and prioritize the fastest providers. But that would probably require some architectural changes; or require that we measure the time taken between a request and its reply ourselves, at the application layer, which might be inconvenient/inaccurate. I really don't know at this point.
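The load-balancing idea alone needs no extra infrastructure; a minimal sketch (hypothetical names, assuming we already track an in-flight count per peer):

```haskell
import           Data.List (sortOn)
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical load-balancing sketch: order the potential providers of
-- an object by how many objects we currently have in flight from each,
-- so the least-loaded peer is asked first. (Latency-based ranking would
-- need timing data we don't currently collect.)
rankProviders :: Ord peer => Map peer Int -> [peer] -> [peer]
rankProviders inFlight =
  sortOn (\p -> Map.findWithDefault 0 p inFlight)
```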

Contributor:

Probably this should be part of the optimization effort for voting logic, which is not due for several months.

@nbacquey nbacquey force-pushed the peras/experimental-object-diffusion-inbound-v2 branch from a459164 to 7fa719c Compare October 23, 2025 21:35

Development

Successfully merging this pull request may close these issues.

Figure out non-trivial inbound logic for votes retrieval
