-
Notifications
You must be signed in to change notification settings - Fork 110
pss: instrumentation and refactor #1580
Conversation
janos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good to me, the only thing that is drastic is removal of process method call, which I have nothing agains, but I still lack deeper knowledge of pss.
zelig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not much to change really
most important comment regards eliminating shortcircuit/selfsend
| } | ||
| } | ||
| return nil | ||
| return p.enqueue(pssMsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self delivery is indeed a contraversial thing.
For direct send to address it is indeed not needed but if we get rid of it. then prox short circuit test should be removed and the pushsync pusher should be aware of kademlia depth and replicate prox message pickup logic, which we thought was ugly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling messages when a client calls Send is absolutely not intuitive, let alone doing it synchronously.
If you want that kind of functionality, I think there should be another client API that makes this more explicit - if some kind of processing is needed, maybe we should call it pss.Process(msg) or something else.
The Pusher on the pushsync-anton PR is indeed amended to check if it is the closest peer to a chunk. I don't think this responsibility should be delegated to a Send method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Pusher on the pushsync-anton PR is indeed amended to check if it is the closest peer to a chunk
As far as I recall, the deliver to self should happen if the message is in the sender's neighborhood, not only if it's the "closest".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Push sync is supposed to sync a chunk to the closest peer (in terms of po), whereas Pull sync is supposed to replicate a chunk to the NN, AFAIK.
So in the meaning of the word sync in Push Sync, we needToSync a chunk if there is a closer peer to the chunk. We can change that if necessary.
At the moment, if our node is responsible for a given chunk (if it is the closest node in terms of po), then a chunk is immediately marked as synced. This could be a problem if we shut down the laptop, as only with Push Sync we wouldn't have replicated all chunks. We would have to rely on Pull Sync to actually sync the streams within the NN.
| } | ||
| } | ||
| return nil | ||
| return p.enqueue(pssMsg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Pusher on the pushsync-anton PR is indeed amended to check if it is the closest peer to a chunk
As far as I recall, the deliver to self should happen if the message is in the sender's neighborhood, not only if it's the "closest".
zelig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The shortcircuit test should be either removed or fixed
i still recommend renaming PssAddress and PssMsg to Address and Msg, at least record it in an issue.
| if err := validateAddress(address); err != nil { | ||
| return err | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variables need to be renamed accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which variables? Half of PSS doesn't have appropriate Godoc and I already renamed most of the variables to be consistent with the naming conventions in Go.
This PR is not about fixing and making PSS adhere to all best practices and naming conventions, but about making PSS performant with at least basic workloads of messages, such as those required by Push Sync when uploading a file bigger than 50MB, and adding basic instrumentation so that we can measure various parts of PSS when running under such loads, as the PR description shows.
janos
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* 'master' of github.com:ethersphere/swarm: (54 commits) api, chunk, cmd, shed, storage: add support for pinning content (ethersphere#1509) docs/swarm-guide: cleanup (ethersphere#1620) travis: split jobs into different stages (ethersphere#1615) simulation: retry if we hit a collision on tcp/udp ports (ethersphere#1616) api, chunk: rename Tag.New to Tag.Create (ethersphere#1614) pss: instrumentation and refactor (ethersphere#1580) api, cmd, network: add --disable-auto-connect flag (ethersphere#1576) changelog: fix typo (ethersphere#1605) version: update to v0.4.4 unstable (ethersphere#1603) swarm: release v0.4.3 (ethersphere#1602) network/retrieve: add bzz-retrieve protocol (ethersphere#1589) PoC: Network simulation framework (ethersphere#1555) network: structured output for kademlia table (ethersphere#1586) client: add bzz client, update smoke tests (ethersphere#1582) swarm-smoke: fix check max prox hosts for pull/push sync modes (ethersphere#1578) cmd/swarm: allow using a network interface by name for nat purposes (ethersphere#1557) pss: disable TestForwardBasic (ethersphere#1544) api, network: count chunk deliveries per peer (ethersphere#1534) network/newstream: new stream! protocol base implementation (ethersphere#1500) swarm: fix bzz_info.port when using dynamic port allocation (ethersphere#1537) ...
This PR is the extracted changes and refactors I did during the effort to improve PSS efficiency while working on Push Syncing, as well as the first step towards fixing the issues described at #1574
Changes so far:
send(used isSendAsymandSendSym) andSendRawno longer callprocess. The idea is thatprocessshould be called when an incoming message is received within a node, and not called on outgoing messages. If I usePSSas a client and call one of these methods, I wouldn't expect for the message to be processed right away. To be discussed with @nolash and others interested.pss, since they are not exported and within thepsspackage. PSS should be inferred from the context (or package in this case).Runfunction exits. Previously we were only adding peers to the collection maintained by PSS, but never removed them.grafana-dashboardsrepo, which already visualises the timers and other metrics added here.TODO:
TestProxShortCircuit- to be addressed after PR review and discussion with @nolash on 1.