Skip to content
This repository was archived by the owner on Aug 2, 2021. It is now read-only.

Conversation

@nonsense
Copy link
Contributor

@nonsense nonsense commented Jul 17, 2019

This PR is the extracted changes and refactors I did during the effort to improve PSS efficiency while working on Push Syncing, as well as the first step towards fixing the issues described at #1574

Changes so far:

  1. send (used is SendAsym and SendSym) and SendRaw no longer call process. The idea is that process should be called when an incoming message is received within a node, and not called on outgoing messages. If I use PSS as a client and call one of these methods, I wouldn't expect for the message to be processed right away. To be discussed with @nolash and others interested.
  2. Instrumentation - added timers to some of the methods so that we get feedback on where we have blocking behaviour and how long it takes to execute certain methods.
  3. Renamed many identifiers to not include the prefix pss, since they are not exported and within the pss package. PSS should be inferred from the context (or package in this case).
  4. Added removal of peers within the PSS protocol, when the Run function exits. Previously we were only adding peers to the collection maintained by PSS, but never removed them.
  5. Added PSS dashboard in the grafana-dashboards repo, which already visualises the timers and other metrics added here.

TODO:

  • review TestProxShortCircuit - to be addressed after PR review and discussion with @nolash on 1.

janos
janos previously approved these changes Jul 17, 2019
Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, the only thing that is drastic is removal of process method call, which I have nothing agains, but I still lack deeper knowledge of pss.

Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not much to change really
most important comment regards eliminating shortcircuit/selfsend

}
}
return nil
return p.enqueue(pssMsg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self delivery is indeed a contraversial thing.
For direct send to address it is indeed not needed but if we get rid of it. then prox short circuit test should be removed and the pushsync pusher should be aware of kademlia depth and replicate prox message pickup logic, which we thought was ugly...

Copy link
Contributor Author

@nonsense nonsense Jul 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling messages when a client calls Send is absolutely not intuitive, let alone doing it synchronously.

If you want that kind of functionality, I think there should be another client API that makes this more explicit - if some kind of processing is needed, maybe we should call it pss.Process(msg) or something else.

The Pusher on the pushsync-anton PR is indeed amended to check if it is the closest peer to a chunk. I don't think this responsibility should be delegated to a Send method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pusher on the pushsync-anton PR is indeed amended to check if it is the closest peer to a chunk

As far as I recall, the deliver to self should happen if the message is in the sender's neighborhood, not only if it's the "closest".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Push sync is supposed to sync a chunk to the closest peer (in terms of po), whereas Pull sync is supposed to replicate a chunk to the NN, AFAIK.

So in the meaning of the word sync in Push Sync, we needToSync a chunk if there is a closer peer to the chunk. We can change that if necessary.

At the moment, if our node is responsible for a given chunk (if it is the closest node in terms of po), then a chunk is immediately marked as synced. This could be a problem if we shut down the laptop, as only with Push Sync we wouldn't have replicated all chunks. We would have to rely on Pull Sync to actually sync the streams within the NN.

}
}
return nil
return p.enqueue(pssMsg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pusher on the pushsync-anton PR is indeed amended to check if it is the closest peer to a chunk

As far as I recall, the deliver to self should happen if the message is in the sender's neighborhood, not only if it's the "closest".

@nonsense
Copy link
Contributor Author

@zelig @nolash I think I addressed comments. Let me know if you want to address something else.

nolash
nolash previously approved these changes Jul 24, 2019
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shortcircuit test should be either removed or fixed
i still recommend renaming PssAddress and PssMsg to Address and Msg, at least record it in an issue.

if err := validateAddress(address); err != nil {
return err
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variables need to be renamed accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which variables? Half of PSS doesn't have appropriate Godoc and I already renamed most of the variables to be consistent with the naming conventions in Go.

This PR is not about fixing and making PSS adhere to all best practices and naming conventions, but about making PSS performant with at least basic workloads of messages, such as those required by Push Sync when uploading a file bigger than 50MB, and adding basic instrumentation so that we can measure various parts of PSS when running under such loads, as the PR description shows.

@nonsense
Copy link
Contributor Author

@zelig @nolash comments have been addressed once again.

Please review and decide if this is good enough, and most importantly better or blocker for master.

Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nonsense nonsense merged commit 3be5cf3 into master Jul 29, 2019
@skylenet skylenet added this to the 0.4.4 milestone Aug 9, 2019
chadsr added a commit to chadsr/swarm that referenced this pull request Sep 23, 2019
* 'master' of github.com:ethersphere/swarm: (54 commits)
  api, chunk, cmd, shed, storage: add support for pinning content (ethersphere#1509)
  docs/swarm-guide: cleanup (ethersphere#1620)
  travis: split jobs into different stages (ethersphere#1615)
  simulation: retry if we hit a collision on tcp/udp ports (ethersphere#1616)
  api, chunk: rename Tag.New to Tag.Create (ethersphere#1614)
  pss: instrumentation and refactor (ethersphere#1580)
  api, cmd, network: add --disable-auto-connect flag (ethersphere#1576)
  changelog: fix typo (ethersphere#1605)
  version: update to v0.4.4 unstable (ethersphere#1603)
  swarm: release v0.4.3 (ethersphere#1602)
  network/retrieve: add bzz-retrieve protocol (ethersphere#1589)
  PoC: Network simulation framework (ethersphere#1555)
  network: structured output for kademlia table (ethersphere#1586)
  client: add bzz client, update smoke tests (ethersphere#1582)
  swarm-smoke: fix check max prox hosts for pull/push sync modes (ethersphere#1578)
  cmd/swarm: allow using a network interface by name for nat purposes (ethersphere#1557)
  pss: disable TestForwardBasic (ethersphere#1544)
  api, network: count chunk deliveries per peer (ethersphere#1534)
  network/newstream: new stream! protocol base implementation (ethersphere#1500)
  swarm: fix bzz_info.port when using dynamic port allocation (ethersphere#1537)
  ...
@nonsense nonsense deleted the refactor-pss branch September 30, 2019 10:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants