-
Notifications
You must be signed in to change notification settings - Fork 21.4k
eth/catalyst, miner: deduplicate identical work + show payload id #26115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cfed224
to
b01e80d
Compare
b01e80d
to
a2e53d9
Compare
This PR is now deployed on |
So |
@holiman It feels like I think we can either add a random salt for computing payload ID ( to make is unique ), or reject the creation with same payload in the first place. Honestly I think we should figure out the reason CL designs this kind of behavior. |
I chatted with @michaelsproul from sigp, here is the information I got Block building procedureLet's assume we are the validator who is responsible for producing block X in slot M
In slot M:
Multiple block build requests with same parametersIt's an internal logic in LH that it will send And also it will always send the payload attributes whenever it's sending So I think the conclusion we can get here is:
|
miner/payload_building.go
Outdated
case <-payload.stop: | ||
default: | ||
close(payload.stop) | ||
// close(payload.stop) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you not stop payload building on resolve?
I think the correct way according to spec is to return the payload once we get a GetPayload
call and return the same payload on all subsequent calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, it's a very good point actually. It might lead to some troubles that validator produces different blocks in a single slot?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, really? So GetPayload
needs to terminate the building. I can change it so it works like that, but could you elaborate on the downsides of not doing that? Just so I have the full picture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes the call non-deterministic and a bit racy. Suppose you call GetPayload
twice by accident and get two different results. Now you need to decide which of those two completely valid payloads you want to use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think it is kind of an arbitrary decision, but its part of the spec afair
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes the call non-deterministic and a bit racy. Suppose you call GetPayload twice by accident and get two different results
That's already in the design, isn't it? You first get an empty block, then you get a non-empty.
arbitrary decision, but its part of the spec afair
Ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While, it's different. Recommit just updates the payload internally, all generated payloads are not available from the CL side. But if we don't stop building after retrieval, then different versions payload may all be accessible for CL.
cf65f26
to
52f3101
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
52f3101 running in prod now |
It had nearly |
Hi @holiman, sorry, I missed that last comment somehow! For the block you posted it looks like the parent block arrived quite promptly, so Lighthouse would have sent an fcU with payload attributes as soon as the parent finished processing. My node shows the parent arriving after 1.8s and then finishing processing around 2s, which gives Geth 10s + a bit more at the start of the next slot. In general is it advantageous to have much time beyond 2-4 seconds? It seems that each of Geth's payload improvement iterations is pretty quick, so I would think there's not much benefit to having 11s vs 4s? We recently removed the immediate fcU with payload attributes call from Lighthouse, and will now wait for 4s before the next slot. This change was made in a PR (sigp/lighthouse#2860) that took us a while to merge and release, but will be going out in Lighthouse v3.4.0 shortly. The thinking was to make Lighthouse more profitable by default with Geth 1.10.x, and also to make the lookahead configurable via a new flag
|
This PR now also includes a fix to the problem of mult-routines building blocks on the same input. This PR works as before with regards to stopping the work, but it just will not spin up a second routine if one is already building. So if the CL does N calls to FCU+buildblock, and N calls to GetPayload, only the first of each will do something, the other calls will be mostly no-ops. This PR also adds printout of the payload id into the logs.
During block construction, sometimes it seems as if multiple block creation processes are started.
The timestamps looks as if a second process is kicked off at
16:05:43
, where it does two outputs on the same seconds, and keeps doing that until it is done. Also, the block withtxs=132 gas=12,505,917
was the one that won out in the end.This PR makes the log show more clearly what is going on, by adding the payload id to the
Updated payload
output. However, it may be the case that the two processes are using the same id, since as far as I can tell there's no uniqueness-check inForkchoiceUpdatedV1
orMiner.BuildPayload
. Therefore, I also added a log when the payload work is begun.With this PR, we should be able to know exactly what happens / what we're told to do by the CL.
This PR now also includes a fix to the problem of mult-routines building blocks on the same input. This PR works as before with regards to stopping the work, but it just will not spin up a second routine if one is already building. So if the CL does
N
calls toFCU+buildblock
, andN
calls toGetPayload
, only the first of each will do something, the other calls will be mostly no-ops.