Expectations in pss "prox send"

# Problem

The objective of this discussion is to determine the guarantees of a mechanism in `pss` that lets the sender send a fully addressed message to an address that does not exist on the network.

Specifically, the questions are:

* Should there be a guarantee that it will reach the _closest_ node on the network?
* What guarantees exist regarding which other nodes will process the message?
* How can results be verified by testing?

# Current implementation

## Forwarding 

In `pss`, `Proximity Order` is used by default to determine where to forward a message.

Consider a message which is _fully addressed_, meaning all 32 bytes of its address are disclosed.

If the address of the message falls within the _Most Proximate Bin_ of the node, the message should be forwarded to all of the _Nearest Neighbors_.

Otherwise, it is forwarded to _one_ peer in the corresponding bin of the message address. The kademlia bins are used to select the peer, and within these bins ordering of peers is undefined. That means that within the bin, _one_ peer is chosen at random.

(see [pss spec](https://swarm-gateways.net/bzz:/dcad67ce1b19aa80d8bb2c1e5c2244c2e67f3f0f894cf863ed5522d0943e4c14/) for more details)
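
The forwarding rule above can be sketched as follows. This is a minimal illustration, not the `pss` implementation: the names `proximity_order` and `forward_targets`, the `bins` dictionary, and the `depth` parameter (the lowest bin treated as part of the _Most Proximate Bin_) are assumptions made for the example.

```python
import random


def proximity_order(a: bytes, b: bytes) -> int:
    """Count the leading bits that two equal-length addresses share."""
    if len(a) != len(b):
        raise ValueError("addresses must have equal length")
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        if diff:
            # 8 - bit_length() is the number of leading zero bits in this byte.
            return i * 8 + (8 - diff.bit_length())
    return len(a) * 8  # identical addresses


def forward_targets(self_addr, msg_addr, bins, depth, rng=random):
    """Pick the peers a fully addressed message is forwarded to.

    bins maps a proximity order to the list of peer addresses in that bin.
    depth is a hypothetical parameter: the lowest bin that counts as part
    of the Most Proximate Bin.
    """
    po = proximity_order(self_addr, msg_addr)
    if po >= depth:
        # Message falls within the Most Proximate Bin: all Nearest Neighbors.
        return [p for b in sorted(bins) if b >= depth for p in bins[b]]
    peers = bins.get(po, [])
    # Otherwise exactly one peer, chosen at random from the matching bin.
    # An empty bin is not covered by the text; this sketch returns nothing.
    return [rng.choice(peers)] if peers else []
```

With one-byte addresses, a node at `1100 0000` holding `1011 0001` and `1011 1000` in bin `1` would forward a message addressed to `1011 1111` to exactly one of those two peers.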

## Privacy concerns

In PSS it is possible to send partially addressed messages. In this case, only the specified portion of the address will be compared. A partial address always includes its Most Significant Bytes. This is guaranteed to broadcast to _all_ nodes in the network matching the partial address.
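
As a rough illustration of partial address matching, assuming byte-granular prefixes as described above (the helper names `matches_partial` and `broadcast_set` are hypothetical):

```python
def matches_partial(node_addr: bytes, partial: bytes) -> bool:
    """True if the node address starts with the disclosed address bytes."""
    return node_addr.startswith(partial)


def broadcast_set(nodes, partial):
    """All nodes a partially addressed message is expected to reach."""
    return [n for n in nodes if matches_partial(n, partial)]
```

The shorter the disclosed prefix, the larger the set of matching nodes; an empty prefix matches every node.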

When sending with a partial address, anyone watching the network cannot uniquely identify the intended recipient. (Even if there is only one node on the network that matches the partial address, it might still be the case that the intended recipient isn't connected at the time).

A typical use case for this strategy is that the actual message payload can only be decrypted by one of the nodes matching the partial address. A network watcher cannot tell which of the nodes failed to decrypt it.

The PSS network should leak as little metadata as possible, based on the chosen mode of sending. For this reason, the PSS convention states that a message must _always_ be forwarded, _unless_ the recipient can be uniquely identified from the outside.


## Proximity-based recipients in PSS

A PSS node decides itself whether it's the recipient of a message. One of the factors it evaluates is how the address of the message matches its own address. In default operation, the addresses must fully match. 

Now, let's say we would like to send an arbitrary message to whichever node is CLOSEST to it on the network. This message will have an address that doesn't fully match the address of _any_ node on the network. Therefore we need a different metric to decide. In particular, we need the receiving node to judge whether it is _the closest one_ to it.

Informally (and with a certain lack of eloquence) we currently refer to this method as "prox send." 

## Partial addressing and "prox send" differ

Note that there is a significant difference between "prox send" and partial addressing.

We may say partial addressing refers to a radius from a global point of view.

"prox send" refers to a radius from a point of view _local to each node_.

## Sending to neighborhood

There is a wish to use "prox send" to send to a _group_ of _closest_ nodes. In particular, this applies to using "prox send" to send content-addressed chunks as pss messages.

The intuitive understanding of _closest_ in this context seems to be whichever nodes have the next-to-closest distances to the chunk. As with the _closest_ node, each of these respective distances will be unique; there is only _one_ second-to-closest node, only _one_ third-to-closest, and so on.


# Roles

## What does the sender know?

Nothing, except for the guarantee that the closest node will be found.

## What does the forwarder know?

In the current implementation, it is the recipient who decides whether the "prox send" method should be used to determine if it is a recipient. There is nothing in the message itself that indicates it. This is partly due to privacy concerns, but _also_ comes from the desire to limit a sender's influence over which computations the recipient should perform.

This means that any nodes that merely relay the message _cannot_ know whether the message is a "prox send" message or not. For purposes of forwarding they need to treat it as any other message.

## What do the next-to-closest nodes know?

For a node to decide whether it is the `nth`-to-closest to the message, it would have to know whether or not there exist _closer_ nodes on the network than the peers it has. Only the negative can be confirmed here, namely when it has `n` _or more_ peers that are closer to the message. Any other estimate risks being a false positive.

The best a next-to-closest node can do is to independently decide on some metric of close-enough-ness.

The currently implemented proposal is that the node compares the message to its own _Saturation Depth_ instead: if the _Proximity Order_ of the message is in the _Saturation Depth_ or below, it will consider itself "close enough" to process the message.
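
A minimal sketch of this local decision, using the 8-bit addresses of the example further down. It assumes that "in the _Saturation Depth_ or below" means a _Proximity Order_ greater than or equal to the depth (bins listed below the divider in the tables); `is_close_enough` and its `depth` argument are hypothetical names.

```python
def proximity_order(a: int, b: int, bits: int = 8) -> int:
    """Leading bits shared by two addresses of the given width."""
    return bits - (a ^ b).bit_length()


def is_close_enough(self_addr: int, msg_addr: int, depth: int) -> bool:
    """Local decision only: nothing is known about closer nodes elsewhere."""
    return proximity_order(self_addr, msg_addr) >= depth
```

Note that the result depends on `depth`, which in turn depends on the peers the node happens to be connected to.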

It is important to note that this local decision is based on not only the collection of nodes in the network, but how they happen to be connected at the moment of routing.

## What does the closest node know?

How can the recipient determine whether it's the closest one? One way is to examine all peers currently connected to it, compute the distance metric on the message for them, and compare to the distance metric of the node itself. If none are closer, the node must be the closest one.

This requires some work, albeit not much, since it is only necessary to check the peers with the highest number of matching Most Significant Bytes.

It logically follows that if a closer peer is found, the message should be forwarded to that (and only that) peer.
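
The explicit distance check could look roughly like this. It is a sketch only, assuming XOR distance over integer addresses; `closer_peer` is a hypothetical helper, and for simplicity it scans all connected peers rather than only the deepest bins.

```python
def closer_peer(self_addr: int, msg_addr: int, peers):
    """Return the connected peer closest to the message if it beats this node.

    Returns None when no connected peer is closer, in which case the node
    treats itself as the closest one among the nodes it knows of.
    """
    best = min(peers, key=lambda p: p ^ msg_addr, default=None)
    if best is not None and (best ^ msg_addr) < (self_addr ^ msg_addr):
        return best  # forward to this peer, and only this peer
    return None
```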

In the current implementation the distance is _not_ explicitly examined. Instead the same strategy with _Most Proximate Bin_ described for "next-to-closest" nodes above is used.

## What does nobody know?

In the current implementation of `pss`, the code handling an incoming message cannot identify the `devp2p` peer the message is relayed from. This is the same code that is in charge of _forwarding_ the message if this is warranted.


## Case: Local neighborhood vs Distance

We now have established the following:

* There exists _one single node_ on the network that is _the closest node_ to the message.
* Forwarding in `pss` will _always_ reach _the closest node_.
* When forwarding to a bin, a peer is chosen at random within that bin.
* When forwarding a message, a node does not know which peer the message was delivered by.
* Forwarding peers do not know if a message will be evaluated using the "prox send" method.
* A next-to-closest peer cannot know which order of _closeness_ to the message it has.

Now let's consider this example:


Message address: `1011 1111`

Closest node: `1011 1000`

| | Sender address: `1100 xxxx` |
|-|---|
|0| `0xxx xxxx` |
|1| `1011 0001` `1011 1000` |
|-|---|
|2| `111x xxxx` | 
|3| `1101 xxxx` |

Here, the message falls in bin `1`. Peer `1011 0001` is chosen at random as the intermediate peer to forward to.

| | Intermediate node: `1011 0001` |
|-|---|
|0| `0xxx xxxx` |
|1| `11xx xxxx` |
|-|---|
|2| `1010 0000` |
|3| `1011 1000` |

 
Here, the message falls within the _Most Proximate Bin_. In the current implementation, if the node considers this type of message a "prox send" message, the node considers itself a recipient of the message.

It also forwards the message to `1010 0000` and `1011 1000`.

| | Closest node: `1011 1000` |
|-|---|
|0| `0xxx xxxx` |
|1| `11xx xxxx` |
|2| `100x xxxx` |
|-|---|
|3| `1010 1000` |
|4| `1011 0001` |


Here, by comparing the message with all peers, the node can clearly see that it is closer to the message than any of its peers.

However, this is not done in the current implementation. As before, since the message is in the `Most Proximate Bin`, the node will consider itself a recipient.

| | Redundant node: `1010 1100` |
|-|---|
|0| `0xxx xxxx` |
|1| `11xx xxxx` |
|2| `100x xxxx` |
|3| `1011 0001` |
|-|---|
|4| `1010 0000` |
|5| `1010 1010` |

This node gets the message from the _closest node_. But even though this node is in the _Most Proximate Bin_ of the _closest_ node, it does _not_ find the message in _its own Most Proximate Bin_. It is, however, the _third closest_ node to the message on the network, in terms of _Distance_.

Sorting all the addresses we've used above:

| recipient | description | address   | po  | distance |
|---|---|---|---|---|
|-|message     |`1011 1111`|`256`|  0 |  
|x|closest     |`1011 1000`|  `5`|  7 |
|x|intermediate|`1011 0001`|  `4`| 14 |
| |redundant   |`1010 1100`|  `3`| 19 |
|---|---|---|---|---|
|x| misc        |`1010 1010`|  `3`| 21 |
| |           |`1010 0000`|  `3`| 31 |
| |           |`100x xxxx`|  `2`| (far) |
| |           |`11xx xxxx`|  `1`| (far) |
| |          |`0xxx xxxx`|  `0`| (far) |
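
The ordering and the `po` column of the table can be reproduced with a few lines, assuming the distance is the XOR of the two addresses read as an integer. The label `other` for the unnamed `1010 0000` row is introduced here only for illustration.

```python
MESSAGE = 0b1011_1111

NODES = {
    "closest": 0b1011_1000,
    "intermediate": 0b1011_0001,
    "redundant": 0b1010_1100,
    "misc": 0b1010_1010,
    "other": 0b1010_0000,  # hypothetical label for the unnamed row
}


def po(a: int, b: int, bits: int = 8) -> int:
    """Proximity order: number of leading bits the two addresses share."""
    return bits - (a ^ b).bit_length()


def distance(a: int, b: int) -> int:
    """XOR distance, read as an unsigned integer."""
    return a ^ b


# Node names ordered from closest to farthest from the message.
ranking = sorted(NODES, key=lambda name: distance(NODES[name], MESSAGE))

for name in ranking:
    addr = NODES[name]
    print(f"{name:<12} {addr:08b} po={po(addr, MESSAGE)} "
          f"distance={distance(addr, MESSAGE)}")
```

Three of the nodes share the same proximity order, so `po` alone cannot rank them; only the distance separates them.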

# Testing

In order to test whether prox routing works, we need to know where a chunk should end up.

Theoretically, Swarm is based on the premise that a chunk belongs with the node that is closest to it. If the distance metric is used, then there can be only one node on the network that is the closest one to the chunk.

It seems we can rely on a message ending up at its closest node, even when using the `Most Proximate Bin` as the local delivery decision.

But it seems we _cannot_ rely on a message ending up at the `n` next-to-closest nodes using the same method.

The only way to test whether the expectations for the `n` next-to-closest nodes are correct seems to be to actually calculate local routing decisions at the time of send. This seems infeasible on a live testing cluster, and involves the same level of complexity as the functionality we are testing.
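
For the closest node, by contrast, a test harness with global knowledge of all node addresses (as in a simulation) could check the expectation directly. The following is a sketch under that assumption, using XOR distance over integer addresses; `expected_closest` and `check_prox_delivery` are hypothetical helpers.

```python
def expected_closest(all_nodes, msg_addr: int) -> int:
    """The single node with the smallest XOR distance to the message."""
    return min(all_nodes, key=lambda n: n ^ msg_addr)


def check_prox_delivery(all_nodes, msg_addr: int, recipients) -> bool:
    """True if the closest node is among the nodes that processed the message.

    Any other recipients are deliberately ignored: their presence or
    absence is not part of the expectation being tested.
    """
    return expected_closest(all_nodes, msg_addr) in recipients
```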

# Recommendation

1. The only guarantee `pss` "prox send" should give is that it will reach the _closest_ peer in the network in terms of distance.

1. Any effects of using `Most Proximate Bin` as local decision, _apart_ from reaching the _closest_ peer, should be considered undefined.

