This repository was archived by the owner on Aug 2, 2021. It is now read-only.

Conversation

@acud
Contributor

@acud acud commented Jul 22, 2019

This PR adds a separate bzz-retrieve protocol to Swarm.

The need for the protocol was initially identified during the stream protocol rewrite, and it finally crystallised into its own package when retrievals were needed in order to test the new stream.

Since the stream protocol refers to chunks as ranges of sequentially incremented indexes, there is no place in it to request chunks by their content-addressed hash. Furthermore, a node's participation in bzz-retrieve no longer forces it to participate in stream, which has its own set of implications. This is imperative in order to allow finer-grained feature-set support for adaptive nodes in the future.

This PR does not hardwire the protocol into swarm.go and the other necessary locations. That will be done as part of the new stream protocol PR (with the current codebase, as long as the old stream is used there is no need to wire the protocol in, since retrieve requests are handled as part of the current stream). This keeps the diff clean and allows easier review of the new stream protocol.
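As a rough illustration of the addressing difference described above (these message types are hypothetical, not the PR's actual wire format):

```go
package retrieve

// Illustrative only: a sketch of the addressing difference, not the PR's messages.

// Stream-style requests refer to chunks by position within a named stream,
// as ranges of sequentially incremented indexes.
type StreamRangeRequest struct {
	Stream string // hypothetical stream name, e.g. a pull-sync bin
	From   uint64 // first index of the requested range
	To     uint64 // last index of the requested range
}

// A bzz-retrieve request refers to a single chunk by its content-addressed hash.
type RetrieveRequest struct {
	Addr []byte // chunk address (content hash)
}
```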

)

// NodeConfigAtPo brute forces a node config to create a node that has an overlay address at the provided po in relation to the given baseaddr
func NodeConfigAtPo(t *testing.T, baseaddr []byte, po int) *adapters.NodeConfig {
Contributor

I'd prefer that we don't add test code to the main binary... maybe rename this to util_test.go or something else ending with _test.go?

Contributor Author

We can't. Either it sits in this package or it gets added to the binary. This functionality is super important in our test suite and should be accessible from any package. I don't have another solution for this.

Member

I think that if this package is imported only in test files, it will not be included in the binary.

Contributor

I still don't understand why we need a testutil package, and why it can't live next to the test code. It doesn't appear to be used by anything other than the retrieval tests.

Contributor Author

This is going to be used in the new stream package for certain tests. That's why I put it here, so I don't need to rebase or clip it out of the directory in just a few days. That's all.

Contributor

@holisticode holisticode left a comment

I left a couple of comments; consider my review non-authoritative, as I mainly wanted to update myself on what you were working on.

// findPeer finds a peer we need to ask for a specific chunk from according to our kademlia
func (r *Retrieval) findPeer(ctx context.Context, req *storage.Request) (retPeer *network.Peer, err error) {
log.Trace("retrieval.findPeer", "req.Addr", req.Addr)
osp, _ := ctx.Value("remote.fetch").(opentracing.Span)
Contributor

Personal opinion: even if the conversion is 100% foolproof (for now), I don't like to ignore errors.

Contributor Author

@acud acud Jul 23, 2019

I wouldn't like this function to fail on a debugging feature, nor would I like to see anything about it in the logs. Leaving it as is.

Contributor

I guess what Fabio meant is to handle the error, not necessarily to fail the function on a debugging feature. We could log the error if it happens. Then again, we know that this is an opentracing.Span, so I think it is not necessary.

Contributor Author

I understand, and I don't want to log anything because this is a very non-critical error, and I have a feeling that if we log it we'll see a lot of log spam in production.
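For context, a small self-contained sketch of the comma-ok form under discussion; the second return value is a bool rather than an error, and the helper name here is only illustrative:

```go
package main

import (
	"context"
	"fmt"

	opentracing "github.com/opentracing/opentracing-go"
)

// spanFromContext shows the comma-ok assertion: it never panics, and ok is
// false when the key is absent or holds a value of a different type.
func spanFromContext(ctx context.Context) opentracing.Span {
	osp, ok := ctx.Value("remote.fetch").(opentracing.Span)
	if !ok {
		// Tracing is a debugging aid, so proceed without a span rather
		// than failing the request or emitting a noisy log line.
		return nil
	}
	return osp
}

func main() {
	fmt.Println(spanFromContext(context.Background()) == nil) // true: no span stored
}
```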

}

// skip peers that we have already tried
if req.SkipPeer(id.String()) {
Contributor

bytes signature?

Contributor Author

I'd rather not touch anything in network/stream right now (this change vets it).

return true
}

if myPo < depth { // chunk is NOT within the neighbourhood
Contributor

Do we actually have unit tests for all this logic?

Contributor Author

@acud acud Jul 23, 2019

No, and this is not something I'm going to attend to. This PR's purpose is not to unit test findPeer (there's a specific ticket for that) but to pull out the retrieve protocol.
That being said, I think the delivery forwarding test provides better coverage than we ever had for the forwarding functionality, and it also exercises the logic in this function to some extent. In any case, using that test as a starting point we could quite easily test all of the logic in this function.

r.mtx.Lock()
defer r.mtx.Unlock()

return r.peers[id]
Contributor

What would happen if we were concurrently handling two messages from the same peer, one of them resulting in an error, which would return from the Run method below and call removePeer, while the other is still processing a message and then calls getPeer and Send on a nil pointer?

I think it is a bit dangerous to assume that a peer is always in this map; this is why I added checks in the PSS PR.

I guess you can leave it as is, but keep it in the back of your mind in case we ever see nil pointer dereferences on Send.


return nil, err
}

protoPeer := r.getPeer(sp.ID())
Member

As @nonsense said in the comment above, protoPeer can be nil. Maybe just explicitly check for that before using it?

Contributor Author

OK, I'll add a nil check. The next question is how the nil should be handled. Ideally we should not fail the request, but maybe goto the line where findPeer is called.

Contributor

I think you should fail the request if getPeer returns nil, because the connection is already dropped due to another request/message problem.

Contributor Author

If a node requests a chunk from us, why should we fail because another node disconnected?

Contributor

Oh, that's right. Then I guess it is fine as you have written it.

Member

This is super ugly.

  1. Perform the findPeer call with the peers mutex locked and just use r.peers[..].
  2. Define findPeer as an iterator, so retrieving the protocol peer is part of the iteration:

findPeer(ctx, req, func(*Peer)) (*Peer, error) {
})

In fact the Send can also error, in which case we also need to move on, so 2 is much better.
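A rough sketch of what option 2 (combined with option 1's locking) might look like; the types and the candidates helper are stand-ins, not the PR's code:

```go
package retrieve

import (
	"context"
	"errors"
	"sync"
)

// Minimal stand-ins for the PR's types, only so the sketch compiles.
type ID string
type Peer struct{}
type Request struct{ Addr []byte }

type Retrieval struct {
	mtx   sync.Mutex
	peers map[ID]*Peer
}

// candidates stands in for the kademlia's suggested-peer iteration for req.Addr.
func (r *Retrieval) candidates(req *Request) []ID { return nil }

// findPeer as an iterator: candidate selection and use of the protocol peer
// happen under the same lock, and a failing try (for example a failed Send)
// simply moves on to the next candidate.
func (r *Retrieval) findPeer(ctx context.Context, req *Request, try func(*Peer) error) error {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	for _, id := range r.candidates(req) {
		p, ok := r.peers[id]
		if !ok {
			continue // disconnected since it was suggested
		}
		if err := try(p); err != nil {
			continue // e.g. Send failed; try the next candidate
		}
		return nil
	}
	return errors.New("retrieve: no peer found")
}
```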


protoPeer := r.getPeer(sp.ID())
if protoPeer == nil {
goto FINDPEER
Member

Potential endless loop. And looks like a very likely one, to me.

Contributor

If getPeer returns nil, it means that the connection is dropped and the peer is removed from the collection, so just return an error and that's it.

Contributor Author

Guys, RequestFromPeers is triggered when we want a chunk from our peers: either we want a chunk ourselves, or another node asks us for a chunk we can't deliver, so we ask other peers.
The race condition in which a peer returned from the kademlia in r.findPeer disconnects before we reach r.getPeer is so remote that I would barely consider it feasible. Also, the "very likely" infinite loop you're talking about would require every subsequent call to findPeer to hit the same race condition, which is very unlikely.
Returning an error when this race condition occurs makes no sense to me. Why should we fail a delivery forwarding request from one peer just because another peer disconnected from us? We should just find another peer to request from.
My 2 cents.

Contributor

@acud you're right, we should just find another suggested peer, as you've written it. I was confused when I wrote my comment; it makes sense now.

Member

My reasoning is with this code:

  • findPeer returns no error
  • getPeer returns nil
    -> infinite loop

Why findPeer and getPeer would behave like that does not matter to me in the context of this function; with any change to these two functions it is possible, however unlikely it is to happen now. Whoever changes these two functions should be aware of the implications for RequestFromPeers. If it is possible, my opinion is that it should be handled.

I just wanted to express my opinion. I am deeply sorry if I am frustrating anybody with this. Whatever you think is correct is OK with me.

Contributor Author

@janos I have addressed this with a limit on how many retries we do.

log.Debug("retrieval.requestFromPeers", "req.Addr", req.Addr)
metrics.GetOrRegisterCounter("network.retrieve.requestfrompeers", nil).Inc(1)

FINDPEER:
Contributor

No need for FINDPEER.

Contributor

Actually @acud is right, we need this. Basically this is a case where the suggested peer from findPeer has been disconnected in the meantime, and we just need a new suggested peer.
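A minimal sketch of the agreed behaviour, with stand-in types, hypothetical findPeer/getPeer/send signatures, and an illustrative retry bound:

```go
package retrieve

import "errors"

// Stand-ins so the sketch compiles; the real code uses the PR's network and
// storage types, and these method signatures are only hypothetical.
type ID string
type Peer struct{}
type Request struct{ Addr []byte }

type Retrieval struct {
	peers map[ID]*Peer
}

func (r *Retrieval) findPeer(req *Request) (ID, error) { return "", nil }
func (r *Retrieval) getPeer(id ID) *Peer               { return r.peers[id] }
func (r *Retrieval) send(p *Peer, req *Request) error  { return nil }

const maxFindPeerRetries = 5 // illustrative bound, not necessarily the PR's value

// RequestFromPeers sketches the flow discussed above: if the peer suggested
// by findPeer has disconnected by the time its protocol peer is looked up,
// ask the kademlia for another suggestion, but bound the number of attempts
// so the FINDPEER loop cannot spin forever.
func (r *Retrieval) RequestFromPeers(req *Request) error {
	for i := 0; i < maxFindPeerRetries; i++ {
		id, err := r.findPeer(req)
		if err != nil {
			return err
		}
		p := r.getPeer(id)
		if p == nil {
			continue // suggested peer disconnected in the meantime; try another
		}
		return r.send(p, req)
	}
	return errors.New("retrieve: no available peer after retries")
}
```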

func NodeConfigAtPo(t *testing.T, baseaddr []byte, po int) *adapters.NodeConfig {
foundPo := -1
var conf *adapters.NodeConfig
for foundPo != po {
Contributor

This seems to be mixing enodes and bzz addresses. If you want a bzz address with a certain po relative to another address, you can also use:

pot.RandomAddressAt

Contributor Author

Why? It creates the kademlia base address using all of this ENR voodoo and checks its po against the supplied baseaddr, which also comes out of the kademlia of the reference node. I don't need just a random address, I need the whole config that is used to generate a new node, so that when the node is generated the correct address is bootstrapped into its kademlia.

Contributor

RandomAddressAt doesn't generate a completely random address, but one at a given po, which is the same as what you're doing here.

ENR and enodes are independent of BzzAddr.
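For readers unfamiliar with proximity orders, a small self-contained sketch of the idea behind the brute force; this is not the pot package's implementation, and the helper names are illustrative:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// proximity returns the proximity order (po) of two equal-length addresses:
// the number of leading bits they have in common. This is the notion of
// kademlia distance the discussion refers to (the real helpers live in the
// pot and network packages).
func proximity(a, b []byte) int {
	for i := range a {
		x := a[i] ^ b[i]
		if x == 0 {
			continue
		}
		po := i * 8
		for x&0x80 == 0 {
			po++
			x <<= 1
		}
		return po
	}
	return len(a) * 8
}

func randomBytes(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err) // crypto/rand should not fail
	}
	return b
}

// addrAtPo brute-forces a random address at the given po relative to base,
// the same idea NodeConfigAtPo applies to whole node configs so that the
// derived overlay address lands at the desired po.
func addrAtPo(base []byte, po int) []byte {
	for {
		addr := randomBytes(len(base))
		if proximity(base, addr) == po {
			return addr
		}
	}
}

func main() {
	base := randomBytes(32)
	fmt.Println(proximity(base, addrAtPo(base, 1))) // prints 1
}
```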

janos
janos previously approved these changes Jul 23, 2019
t.Fatalf("unable to create enode: %v", err)
}

n := network.NewAddr(nod)
Contributor Author

@nonsense here you can see that network.NewAddr is called and that should give us the correct overlay address

Contributor

Yeah, no. network.NewAddr(nod) should not be used, as discussed in chat; it is very misleading and broken.

@acud acud requested review from janos and nonsense July 24, 2019 08:56
nonsense
nonsense previously approved these changes Jul 24, 2019
return fileStore.GetAllReferences(context.Background(), reader, false)
}

func getChunks(store chunk.Store) (chunks map[string]struct{}, err error) {
Member

What is this used for? It feels strange that you need the subscribePull function here.

Contributor Author

read the tests and thou shalt understand

@acud acud requested a review from zelig July 24, 2019 15:27
nonsense
nonsense previously approved these changes Jul 24, 2019
return &spID, nil
}

func (r *Retrieval) Start(server *p2p.Server) error {
Member

No longer convinced that retrieval should be a service; we could just wrap it for simulation testing if needed.
We could still keep the Protocols() func, or better, func (*Retrieval) Protocol() p2p.Protocol {},
but it's also OK to keep it as is, I just wonder what motivates the premature abstraction ;)

Contributor Author

..............?
I'm not sure I'm following you. We agreed to implement retrieval as a separate p2p protocol; that's what I did. p2p protocols need to implement node.Service, and that's what I did. I don't understand what you want at this stage. Please be kind enough to clarify, because it seems you want me to scratch this PR? With which alternative? Thanks.

Member

I am just saying that looking at Start, Stop and APIs, I don't see a compelling case that Retrieval should implement node.Service. Just include Retrieval.Protocol() among the protocols in swarm.go, that's all. But I don't mind.
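For reference, a sketch of the two options being weighed; the Service interface mirrors go-ethereum's node.Service of that era (treat it as approximate), and the Protocol method's field values are illustrative rather than the PR's:

```go
package retrieve

import (
	"github.com/ethereum/go-ethereum/p2p"
	"github.com/ethereum/go-ethereum/rpc"
)

// Service is roughly what implementing node.Service entails: protocol list,
// RPC APIs, and lifecycle hooks, even when some of them are empty stubs.
type Service interface {
	Protocols() []p2p.Protocol
	APIs() []rpc.API
	Start(server *p2p.Server) error
	Stop() error
}

type Retrieval struct{}

// run is a stub standing in for the protocol's message loop.
func (r *Retrieval) run(p *p2p.Peer, rw p2p.MsgReadWriter) error { return nil }

// Protocol is the lighter alternative being suggested: expose only the
// protocol and let swarm.go append it to the node's protocol list.
func (r *Retrieval) Protocol() p2p.Protocol {
	return p2p.Protocol{
		Name:    "bzz-retrieve",
		Version: 1, // illustrative
		Length:  2, // illustrative number of message codes
		Run:     r.run,
	}
}
```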

// where po(fetching,forwarding) = 1 and po(forwarding,uploading) = 1, then uploads chunks to the uploading node, afterwards
// tries to retrieve the relevant chunks (ones with po = 0 to fetching i.e. no bits in common with fetching and with
// po >= 1 with uploading i.e. with 1 bit or more in common with the uploading)
func TestDeliveryForwarding(t *testing.T) {
Member

Having a hard time understanding the generality and relevance of this test.

Contributor Author

What is there not to understand? findPeer needs a certain topology for the retrieve requests to propagate through the forwarding node. I'm not sure which other explanation you expect or what exactly is unclear: the comment, the code? Please be a bit more specific.

Contributor Author

Also, to further clarify: you cannot test retrieval on a snapshot without pull syncing, since due to the logic in findPeer you'll never get to the content unless you build the topology in such a way that every request for every chunk ends up arriving at the storing node.
So, to test whether retrieve requests are being forwarded, a basic topology that adheres to certain POs between nodes is needed, with discovery disabled. This test tests exactly that: just retrieve request forwarding, without any other moving parts.

@acud
Contributor Author

acud commented Jul 25, 2019

@zelig you're asking me to refactor findPeer by adding more locking and cleaning up a function which is not unit tested and barely integration tested. I'm just not going to do it, sorry. The purpose of this PR is not to refactor the code from the old stream package but to introduce the retrieve protocol (which, by the sound of it, you're no longer convinced should be done). I'm not going into a refactoring round here before this thing is hardwired into the codebase, and preferably not before there is proper unit and integration test coverage.

@acud acud requested review from nonsense and zelig July 25, 2019 06:22
@zelig zelig merged commit 74b12e3 into master Jul 25, 2019
@acud acud deleted the bzz-retrieve branch July 25, 2019 10:09
chadsr added a commit to chadsr/swarm that referenced this pull request Sep 23, 2019
* 'master' of github.com:ethersphere/swarm: (54 commits)
  api, chunk, cmd, shed, storage: add support for pinning content (ethersphere#1509)
  docs/swarm-guide: cleanup (ethersphere#1620)
  travis: split jobs into different stages (ethersphere#1615)
  simulation: retry if we hit a collision on tcp/udp ports (ethersphere#1616)
  api, chunk: rename Tag.New to Tag.Create (ethersphere#1614)
  pss: instrumentation and refactor (ethersphere#1580)
  api, cmd, network: add --disable-auto-connect flag (ethersphere#1576)
  changelog: fix typo (ethersphere#1605)
  version: update to v0.4.4 unstable (ethersphere#1603)
  swarm: release v0.4.3 (ethersphere#1602)
  network/retrieve: add bzz-retrieve protocol (ethersphere#1589)
  PoC: Network simulation framework (ethersphere#1555)
  network: structured output for kademlia table (ethersphere#1586)
  client: add bzz client, update smoke tests (ethersphere#1582)
  swarm-smoke: fix check max prox hosts for pull/push sync modes (ethersphere#1578)
  cmd/swarm: allow using a network interface by name for nat purposes (ethersphere#1557)
  pss: disable TestForwardBasic (ethersphere#1544)
  api, network: count chunk deliveries per peer (ethersphere#1534)
  network/newstream: new stream! protocol base implementation (ethersphere#1500)
  swarm: fix bzz_info.port when using dynamic port allocation (ethersphere#1537)
  ...
