Skip to content
This repository was archived by the owner on Aug 2, 2021. It is now read-only.
This repository was archived by the owner on Aug 2, 2021. It is now read-only.

Expectations in pss "prox send" #1528

@nolash

Description

@nolash

Problem

The objective of this discussion is to determine the guarantees of a mechanism in pss that lets the sender send a fully addressed message to an address that does not exist on the network.

Specifically, the questions are:

  • Should there be a guarantee that it will reach the closest node on the network?
  • What guarantees exist for which other nodes that will process the message?
  • How can results be verified by testing?

Current implementation

Forwarding

In pss Proximity Order is used by default to determine where to forward a message.

Consider a message which is fully addressed, meaning all 32 bytes of its address is disclosed.

If the address of the message falls within the Most Proximate Bin of the node, the message should be forwarded to all of the Nearest Neighbors.

Otherwise, it is forwarded to one peer in the corresponding bin of the message address. The kademlia bins are used to select the peer, and within these bins ordering of peers is undefined. That means that within the bin, one peer is chosen at random.

(see pss spec for more details)

Privacy concerns

In PSS it is possible to send partially addressed messages. In this case, only the specified portion of the address will be compared. A partial address always includes its Most Significant Bytes. This is guaranteed to broadcast to all nodes in the network matching the partial address.

When sending with a partial address, anyone watching the network cannot uniquely identify the intended recipient. (Even if there is only one node on the network that matches the partial address, it might still be the case that the intended recipient isn't connected at the time).

A typical use case for this strategy would be that the actual message payload can only be decrypted by one of the nodes matching the partial address. Any network watcher doesn't know of this failure to decrypt.

The PSS network should leak as little metadata as possible, based on the chosen mode of sending. For this reason, the PSS convention states that a message must always be forwarded, unless the recipient can be uniquely identified by the outside.

Proximity-based recipients in PSS

A PSS node decides itself whether it's the recipient of a message. One of the factors it evaluates is how the address of the message matches its own address. In default operation, the addresses must fully match.

Now, let's say we would like to send an arbitrary message to whichever node is CLOSEST to it on the network. This message will have an address that doesn't fully match the address of any node on the network. Therefore we need a different metric to decide. In particular, we need the receiving node to judge whether it is the closest one to it.

Informally (and with a certain lack of eloquence) we currently refer to this method as "prox send."

Partial addressing and "prox send" differ

Note that there is a significant difference between "prox send" and partial addressing.

We may say partial addressing refers to a radius from a global point of view.

"prox send" refers to a radius from a point of view local to each node.

Sending to neighborhood

There is a wish to use "prox send" to send to a group of closest nodes. In particular, this applies to using "prox send" to send content-addressed chunks as pss messages.

The intuitive understanding of closest in this context seems to be whichever nodes have the next-to-closest distances to the chunk. As with the closest node, each of these respecitve distances will be unique; there is only one second-to-closest node, only one third-to-closest, and so on.

Roles

What does the sender know?

Nothing, except for the guarantee that the closest node will be found.

What does the forwarder know?

In the current implementation, it is the recipient who decides whether the "prox send" method should be used to determine if it is a recipient. There is nothing in the message itself that indicates it. The reason for this is partially out of privacy concerns, but also comes from the desire to limit a sender's influence which computations the recipient should perform.

This means that any nodes that merely relay the message cannot know whether the message is a "prox send" message or not. For purposes of forwarding they need to treat it as any other message.

What does the next-to-closest nodes know?

For a node to decide whether is it the nth-to-closest to the message, it would have to know whether or not there exists closer nodes on the network than the peers it has. Only the negative can be confirmed here, in case it has equal or more than n peers that are closer to the message. Any other estimation is a false positive.

The best a next-to-closest node can do, is to independently decide on some metric of close-enough-ness.

The current implemented proposal is that the node compares the message to its own Saturation Depth instead, and if the Proximity Order of the message is in the Saturation Depth or below, it will consider itself "close enough" to process the message.

It is important to note that this local decision is based on not only the collection of nodes in the network, but how they happen to be connected at the moment of routing.

What does the closest node know?

How can the recipient determine whether it's the closest one? One way is to examine all peers currently connected to it, compute the distance metric on the message for them, and compare to the distance metric of the node itself. If none are closer, the node must be the closest one.

This requires some work. Albeit not too much, work, since it is only necessary to do this work on the peers with the highest number of matching Most Significant Bytes.

It logically follows that if a closer peer is found, the message should be forwarded to that (and only that) peer.

In the current implementation the distance is not explicitly examined. Instead the same strategy with Most Proximate Bin described for "next-to-closest" nodes above is used.

What does nobody know?

In the current implementation of pss, the code handling an incoming message cannot identify the devp2p peer the message is relayed from. This is the same code that is in charge of forwarding the message if this is warranted.

Case: Local neighborhood vs Distance

We now have established the following:

  • There exists one single node on the network that is the closest node to the message.
  • Forwarding in pss will always reach the closest node.
  • When forwarding to a bin, a peer is chosen at random within that bin.
  • When forwarding a message, a node does not know from which peer the message was delivered by.
  • Forwarding peers do not know if a message will be evaluating using the "prox send" method.
  • Any next-to-closest peers cannot know which order of closeness to the message it has.

Now let's consider this example:

Message address: 1011 1111
Closest node: 1011 1000

Sender address: 1100 xxxx
0 0xxx xxxx
1 1011 0001 1011 1000
- ---
2 111x xxxx
3 1101 xxxx

Here, message falls in bin 1. Peer 1011 0001 is chosen at random as intermediate peer to forward to.

Intermediate node: 1011 0001
0 0xxx xxxx
1 11xx xxxx
- ---
2 1010 0000
3 1011 1000

Here, message falls within Most Proximate Bin. In the current implementation, if the node considers this type of message a "prox send" message, the node considers itself a recipient of the message.

It also forwards the message to 1010 0000 and 1011 1000.

Closest node: 1011 1000
0 0xxx xxxx
1 11xx xxxx
2 100x xxxx
- ---
3 1010 1000
4 1011 0001

Here, by comparing the message with all peers the node can clearly see that its distance is closer than anyone else.

However, this is not done in the current implementation. As before, since the message is in the Most Proximate Bin, it will be considered a recipient.

Redundant node: 1010 1100
0 0xxx xxxx
1 11xx xxxx
2 100x xxxx
3 1011 0001
- ---
4 1010 0000
5 1010 1010

This node gets the message from the closest node. But even if this node was in the Most Proximate Bin of the closest node does not find the message in its own Most Proximate Bin . It is, however, the third closest node to the message on the network, in terms of Distance.

Sorting all the addresses we've used above:

recipient description address po distance
- message 1011 1111 256 0
x closest 1011 1000 5 7
x intermediate 1011 0001 4 14
redundant 1010 1100 3 20
--- --- --- --- ---
x misc 1010 1010 3 21
1010 0000 3 31
100x xxxx 2 (far)
11xx xxxx 1 (far)
0xxx xxxx 0 (far)

Testing

In order to test whether prox routing works, we need to know where a chunk should end up.

Theoretically, Swarm is based on the premise that chunk belongs with the node that is closest to it. If the distance metric is used, then there can be only one node on the network that is the closest one to the chunk.

It seems we can rely on that a message will end up at its closest node, even using the Most Proximate Bin as local delivery decision.

But it seems we cannot rely on that a message will end up at the n next-to-closest nodes using the same method.

The only way to test whether the expectations for the n next-to-closest nodes are correct, seems to be to acutally calculate local routing decisions at the time of send. This seems infeasible on a live testing cluster, and involves the same level of complexity as the functionality we are testing.

Recommendation

  1. The only guarantee pss "prox send" should give is that it will reach the closest peer in the network in terms of distance.

  2. Any effects of using Most Proximate Bin as local decision, apart from reaching the closest peer, should be considered undefined.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions