Skip to content

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Nov 5, 2022

During block construction, sometimes it seems as if multiple block creation processes are started.

INFO [11-04|16:05:37.306] Updated payload number=15,897,733 hash=a863f3..45d6da txs=31 gas=2,707,865 fees=0.008281571686 elapsed=23.508ms
INFO [11-04|16:05:39.339] Updated payload number=15,897,733 hash=262b03..a29f0a txs=52 gas=4,506,728 fees=0.01222167591 elapsed=32.476ms
INFO [11-04|16:05:41.370] Updated payload number=15,897,733 hash=313b93..ec9774 txs=73 gas=6,033,849 fees=0.01827696169 elapsed=30.969ms
INFO [11-04|16:05:43.038] Updated payload number=15,897,733 hash=7dbeb5..020b92 txs=90 gas=8,696,021 fees=0.02980248015 elapsed=37.349ms
INFO [11-04|16:05:43.414] Updated payload number=15,897,733 hash=01f6d0..99551c txs=95 gas=9,736,205 fees=0.0308707919 elapsed=42.617ms
INFO [11-04|16:05:45.084] Updated payload number=15,897,733 hash=256607..906895 txs=115 gas=10,996,281 fees=0.03333384812 elapsed=46.000ms
INFO [11-04|16:05:45.460] Updated payload number=15,897,733 hash=6d14a6..617bf7 txs=121 gas=11,571,749 fees=0.03424049812 elapsed=45.653ms
INFO [11-04|16:05:47.135] Updated payload number=15,897,733 hash=f3af51..55f187 txs=132 gas=12,505,917 fees=0.03664891464 elapsed=49.341ms
INFO [11-04|16:05:47.516] Updated payload number=15,897,733 hash=8f9637..07f4cf txs=134 gas=12,695,609 fees=0.03729100157 elapsed=55.220ms

The timestamps looks as if a second process is kicked off at 16:05:43, where it does two outputs on the same seconds, and keeps doing that until it is done. Also, the block with txs=132 gas=12,505,917 was the one that won out in the end.

This PR makes the log show more clearly what is going on, by adding the payload id to the Updated payload output. However, it may be the case that the two processes are using the same id, since as far as I can tell there's no uniqueness-check in ForkchoiceUpdatedV1 or Miner.BuildPayload. Therefore, I also added a log when the payload work is begun.

With this PR, we should be able to know exactly what happens / what we're told to do by the CL.


This PR now also includes a fix to the problem of mult-routines building blocks on the same input. This PR works as before with regards to stopping the work, but it just will not spin up a second routine if one is already building. So if the CL does N calls to FCU+buildblock, and N calls to GetPayload, only the first of each will do something, the other calls will be mostly no-ops.

@holiman
Copy link
Contributor Author

holiman commented Nov 5, 2022

This PR is now deployed on zingo: Geth/v1.11.0-unstable-a2e53d90-20221105/linux-amd64/go1.18.7

@holiman
Copy link
Contributor Author

holiman commented Nov 6, 2022

[11-06|12:01:36.856] Starting work on payload id=0xf575e344684ff96a
[11-06|12:01:36.948] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=36 gas=1,974,513 fees=0.003540521141 hash=c8a878..ceb663 root=0678ef..d81534 elapsed=88.509ms
[11-06|12:01:38.965] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=53 gas=2,961,393 fees=0.005184357405 hash=24d258..dcdbee root=802878..09986a elapsed=16.458ms
[11-06|12:01:40.987] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=79 gas=4,798,321 fees=0.009972492989 hash=ac0282..31e318 root=87b19b..0e5948 elapsed=21.857ms
[11-06|12:01:43.013] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=105 gas=6,176,559 fees=0.01310068065 hash=a23bd2..8ab953 root=2ea80d..5a7ec2 elapsed=25.765ms
[11-06|12:01:43.015] Starting work on payload id=0xf575e344684ff96a
[11-06|12:01:43.041] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=107 gas=6,335,481 fees=0.01378433024 hash=83a89c..ff6441 root=fa11d5..a22a1e elapsed=25.212ms
[11-06|12:01:45.045] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=126 gas=7,658,424 fees=0.01837337766 hash=e56f14..683941 root=64ad82..1dd62b elapsed=31.052ms
[11-06|12:01:45.078] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=126 gas=7,658,424 fees=0.01837337766 hash=e56f14..683941 root=64ad82..1dd62b elapsed=37.298ms
[11-06|12:01:47.081] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=139 gas=8,629,852 fees=0.0213817106 hash=ef6aae..247ea1 root=8a384f..1a7dcd elapsed=34.893ms
[11-06|12:01:47.117] Updated payload id=0xf575e344684ff96a number=15,910,845 txs=139 gas=8,629,852 fees=0.0213817106 hash=ef6aae..247ea1 root=8a384f..1a7dcd elapsed=38.770ms

So lighthouse does an "FCU + Start building" twice, once at 12:01:36 and once at 12:01:43. With the identical parameters, so the payload id becomes the same. We don't actually check "are we already at work on this payload id", just spin up a second routine. Is that intended behaviour?

cc @MariusVanDerWijden @rjl493456442

@rjl493456442
Copy link
Member

@holiman It feels like lightHouse has the similar "recommit" mechanism which re-create two blocks and pick the higher profit one.

I think we can either add a random salt for computing payload ID ( to make is unique ), or reject the creation with same payload in the first place.

Honestly I think we should figure out the reason CL designs this kind of behavior.

@rjl493456442
Copy link
Member

rjl493456442 commented Nov 7, 2022

@holiman @MariusVanDerWijden

I chatted with @michaelsproul from sigp, here is the information I got

Block building procedure

Let's assume we are the validator who is responsible for producing block X in slot M

  • Usually block X-1 will be propagated to us in the first 4s of slot M-1
  • LH fires an event to us, we start to build block X
  • When slot M starts, LH retrieves payload back as soon as possible
  • In another word, usually we have 8s to build a block in total

In slot M:

  • 0 seconds: proposer builds a block, signs it, broadcasts it
  • 4 seconds: 1/32 of validators attest to the block (in committees) and broadcast these amongst their committees
  • 8 seconds: a smaller subset of attesters create aggregate attestations and broadcast them

Multiple block build requests with same parameters

It's an internal logic in LH that it will send forkchoiceUpdated whenever it's re-evaluating fork choice (which is a few times per slot).

And also it will always send the payload attributes whenever it's sending forkchoiceUpdated if we are inturn to build block.

So I think the conclusion we can get here is:

  • we should somehow ignore the duplicated block build requests, return the valid response with payload id attached, but don't start multiple thread for building block in background
  • we should also handle the duplicated forkchoiceUpdated events properly(I guess it's already handled)

case <-payload.stop:
default:
close(payload.stop)
// close(payload.stop)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you not stop payload building on resolve?
I think the correct way according to spec is to return the payload once we get a GetPayload call and return the same payload on all subsequent calls

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's a very good point actually. It might lead to some troubles that validator produces different blocks in a single slot?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, really? So GetPayload needs to terminate the building. I can change it so it works like that, but could you elaborate on the downsides of not doing that? Just so I have the full picture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the call non-deterministic and a bit racy. Suppose you call GetPayload twice by accident and get two different results. Now you need to decide which of those two completely valid payloads you want to use

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think it is kind of an arbitrary decision, but its part of the spec afair

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes the call non-deterministic and a bit racy. Suppose you call GetPayload twice by accident and get two different results

That's already in the design, isn't it? You first get an empty block, then you get a non-empty.

arbitrary decision, but its part of the spec afair

Ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While, it's different. Recommit just updates the payload internally, all generated payloads are not available from the CL side. But if we don't stop building after retrieval, then different versions payload may all be accessible for CL.

Copy link
Member

@MariusVanDerWijden MariusVanDerWijden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@holiman
Copy link
Contributor Author

holiman commented Nov 7, 2022

52f3101 running in prod now

@holiman
Copy link
Contributor Author

holiman commented Nov 7, 2022

[11-07|13:07:00.574] Starting work on payload id=0xbf9bb34ac80de05f
[11-07|13:07:00.586] Updated payload id=0xbf9bb34ac80de05f number=15,918,340 hash=f95caa..b3fd7b txs=14 gas=1,306,666 fees=0.002417653581 root=709355..678827 elapsed=11.385ms
[11-07|13:07:02.602] Updated payload id=0xbf9bb34ac80de05f number=15,918,340 hash=15c693..ad6a84 txs=41 gas=3,723,998 fees=0.02112115165 root=d1758e..d5d0a8 elapsed=15.333ms
[11-07|13:07:04.625] Updated payload id=0xbf9bb34ac80de05f number=15,918,340 hash=b646f1..1a29c6 txs=55 gas=4,718,736 fees=0.02290979913 root=805638..9c73d7 elapsed=22.695ms
[11-07|13:07:06.653] Updated payload id=0xbf9bb34ac80de05f number=15,918,340 hash=a5d6e1..e86434 txs=80 gas=6,763,970 fees=0.03296050812 root=66ad5a..bafd0a elapsed=27.616ms
[11-07|13:07:08.688] Updated payload id=0xbf9bb34ac80de05f number=15,918,340 hash=b707cf..44fd92 txs=102 gas=8,657,461 fees=0.03818184754 root=ddad48..3270c9 elapsed=33.980ms
[11-07|13:07:10.737] Updated payload id=0xbf9bb34ac80de05f number=15,918,340 hash=7c4a78..99f30b txs=114 gas=9,228,008 fees=0.04027533777 root=fdaa00..a5b30b elapsed=48.847ms
[11-07|13:07:11.157] Stopping work on payload id=0xbf9bb34ac80de05f reason=delivery

It had nearly 11s to produce the block! That's pretty strange/surprising, IMO. @michaelsproul any thoughts on why we had so much time?

@holiman holiman changed the title eth/catalyst, miner: show payload id in logs eth/catalyst, miner: deduplicate identical work + show payload id Nov 7, 2022
@holiman holiman added this to the 1.11.0 milestone Nov 7, 2022
@holiman holiman merged commit ca948b8 into ethereum:master Nov 7, 2022
@michaelsproul
Copy link

michaelsproul commented Jan 10, 2023

Hi @holiman, sorry, I missed that last comment somehow!

For the block you posted it looks like the parent block arrived quite promptly, so Lighthouse would have sent an fcU with payload attributes as soon as the parent finished processing. My node shows the parent arriving after 1.8s and then finishing processing around 2s, which gives Geth 10s + a bit more at the start of the next slot.

In general is it advantageous to have much time beyond 2-4 seconds? It seems that each of Geth's payload improvement iterations is pretty quick, so I would think there's not much benefit to having 11s vs 4s?

We recently removed the immediate fcU with payload attributes call from Lighthouse, and will now wait for 4s before the next slot. This change was made in a PR (sigp/lighthouse#2860) that took us a while to merge and release, but will be going out in Lighthouse v3.4.0 shortly. The thinking was to make Lighthouse more profitable by default with Geth 1.10.x, and also to make the lookahead configurable via a new flag --prepare-payload-lookahead:

--prepare-payload-lookahead <MILLISECONDS>
    The time before the start of a proposal slot at which payload attributes should be sent. Low values are
    useful for execution nodes which don't improve their payload after the first call, and high values are
    useful for ensuring the EL is given ample notice. Default: 1/3 of a slot.

shekhirin pushed a commit to shekhirin/go-ethereum that referenced this pull request Jun 6, 2023
This PR now also includes a fix to the problem of mult-routines building blocks on the same input. This PR works as before with regards to stopping the work, but it just will not spin up a second routine if one is already building. So if the CL does N calls to FCU+buildblock, and N calls to GetPayload, only the first of each will do something, the other calls will be mostly no-ops.

This PR also adds printout of the payload id into the logs.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants