SSV with Lighthouse v5.0.0: missed block proposals #5365

@hwhe

Description

My usage scenario is as follows:
Four SSV operators share the same beacon node, which runs the Lighthouse client. After I upgraded Lighthouse from v4.4.1 to v5.0.0, the rate of missed blocks increased.

I checked the code and found that a lock was added in #4925. As a result, the four GET validator/blinded_blocks/{slot} requests used to create a block appear to be serialized. This increases the time the four SSV operators need to reach consensus and slows block production.

```rust
        // ... (excerpt; preceding lines elided)
            );
            (re_org_state.pre_state, re_org_state.state_root)
        }
        // Normal case: proposing a block atop the current head using the cache.
        else if let Some((_, cached_state)) = self
            .block_production_state
            .lock()
            .take()
            .filter(|(cached_block_root, _)| *cached_block_root == head_block_root)
        {
            (cached_state.pre_state, cached_state.state_root)
        }
        // Fall back to a direct read of the snapshot cache.
        else if let Some(pre_state) = self
            .snapshot_cache
            .try_read_for(BLOCK_PROCESSING_CACHE_LOCK_TIMEOUT)
            .and_then(|snapshot_cache| {
                snapshot_cache.get_state_for_block_production(head_block_root)
            })
        {
            warn!(
                self.log,
                "Block production cache miss";
                "message" => "falling back to snapshot cache clone",
                "slot" => slot
            );
            (pre_state.pre_state, pre_state.state_root)
        } else {
        // ... (excerpt; remaining lines elided)
```

```rust
    /// State with complete tree hash cache, ready for block production.
    ///
    /// NB: We can delete this once we have tree-states.
    #[allow(clippy::type_complexity)]
    pub block_production_state: Arc<Mutex<Option<(Hash256, BlockProductionPreState<T::EthSpec>)>>>,
```
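
To illustrate why only one of several concurrent requests can hit this cache, here is a minimal, self-contained sketch of the `.lock().take().filter(..)` pattern. It is not Lighthouse code: `CachedState`, `take_cached`, and `count_hits` are hypothetical names, and it uses `std::sync::Mutex` rather than the mutex type Lighthouse uses. The point is only that `take()` empties the slot, so every caller after the first sees `None` and would have to fall back to the slower path.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the cached pre-state; the real type is much larger.
#[derive(Debug)]
struct CachedState {
    head_root: u64,
}

/// Mimics the `.lock().take().filter(..)` pattern: the first caller empties
/// the slot, so the value is gone for every later caller.
fn take_cached(slot: &Mutex<Option<CachedState>>, head_root: u64) -> Option<CachedState> {
    slot.lock()
        .unwrap()
        .take()
        .filter(|s| s.head_root == head_root)
}

/// Spawns `requests` concurrent callers and returns how many got the cached state.
fn count_hits(requests: usize) -> usize {
    let slot = Arc::new(Mutex::new(Some(CachedState { head_root: 7 })));
    let handles: Vec<_> = (0..requests)
        .map(|_| {
            let slot = Arc::clone(&slot);
            thread::spawn(move || take_cached(&slot, 7).is_some())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|hit| *hit)
        .count()
}

fn main() {
    let hits = count_hits(4);
    // Exactly one caller wins; the other three miss.
    println!("hits: {hits}, misses: {}", 4 - hits);
}
```

With four callers, one gets the state and three miss, which would be consistent with the three "Block production cache miss" warnings that accompany the four requests in the v5.0.0 log.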

The v4.4.1 log is shown below. The "Requesting blinded header from connected builder" entries show that the requests from the four SSV operators arrive at almost the same time.

27:48.356 INFO Requesting blinded header from connected builder,
27:48.399 INFO Requesting blinded header from connected builder,
27:48.415 WARN Duplicate payload cached, this might indicate red
27:48.440 INFO Requesting blinded header from connected builder,
27:48.456 WARN Duplicate payload cached, this might indicate red
27:48.464 INFO Requesting blinded header from connected builder,
27:48.479 WARN Duplicate payload cached, this might indicate red
27:49.326 INFO Requested blinded execution payload parent_ha
27:49.326 INFO Received local and builder payloads parent_ha
27:49.326 INFO Relay block is more profitable than local block,
27:49.352 INFO Requested blinded execution payload parent_ha
27:49.352 INFO Received local and builder payloads parent_ha
27:49.352 INFO Relay block is more profitable than local block,
27:49.402 INFO Requested blinded execution payload parent_ha
27:49.402 INFO Received local and builder payloads parent_ha
27:49.402 INFO Relay block is more profitable than local block,
27:49.416 INFO Requested blinded execution payload parent_ha
27:49.416 INFO Received local and builder payloads parent_ha
27:49.416 INFO Relay block is more profitable than local block,
27:49.719 ERRO Block broadcast was delayed root: ***
27:49.719 ERRO Block broadcast was delayed root: ***
27:49.720 ERRO Block broadcast was delayed root: ***
27:49.822 ERRO Block broadcast was delayed root: ***
27:50.629 INFO New block received root: ***
27:52.583 INFO Builder successfully revealed payload parent_ha
27:52.583 INFO Successfully published a block to the builder net
27:52.603 WARN Error processing HTTP API request method: POST, path: /eth/v1/beacon/blinded_blocks, status: 202 Accepted, elapsed: 2.894625054s

The v5.0.0 log is shown below. Here the four requests are spread far apart in time.

45:35.232 INFO Requesting blinded header from connected
45:35.940 WARN Block production cache miss
45:35.960 INFO Requesting blinded header from connected
45:35.964 WARN Duplicate payload cached, this might ind
45:36.191 INFO Requested blinded execution payload
45:36.192 INFO Received local and builder payloads
45:36.193 INFO Relay block is more profitable than loca
45:36.672 WARN Block production cache miss
45:36.692 INFO Requesting blinded header from connected
45:36.696 WARN Duplicate payload cached, this might ind
45:36.912 INFO Requested blinded execution payload
45:36.912 INFO Received local and builder payloads
45:36.915 INFO Relay block is more profitable than loca
45:37.514 WARN Block production cache miss
45:37.534 INFO Requesting blinded header from connected
45:37.538 WARN Duplicate payload cached, this might ind
45:37.663 INFO Requested blinded execution payload
45:37.663 INFO Received local and builder payloads
45:37.670 INFO Relay block is more profitable than loca
45:38.485 INFO Requested blinded execution payload
45:38.485 INFO Received local and builder payloads
45:38.495 INFO Relay block is more profitable than loca
45:39.877 ERRO Block was broadcast too late
45:39.878 ERRO Block was broadcast too late
45:39.878 ERRO Block was broadcast too late
45:40.860 WARN Block production cache miss
45:41.003 INFO Synced
45:41.474 ERRO Block was broadcast too late
45:45.109 WARN Builder failed to reveal payload
45:45.109 WARN Error processing HTTP API request
45:45.131 WARN Builder failed to reveal payload
45:45.131 WARN Error processing HTTP API request
45:45.168 WARN Builder failed to reveal payload
45:45.168 WARN Error processing HTTP API request
45:46.671 WARN Builder failed to reveal payload
45:46.671 WARN Error processing HTTP API request
45:48.388 INFO New block received
45:53.001 INFO Synced
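
To put a number on the difference, the following sketch computes the time between the first and last "Requesting blinded header" entries in each log excerpt. The timestamps are copied from the logs above; `parse_ms` and `spread_ms` are hypothetical helper names, and the parser assumes the `MM:SS.mmm` format with exactly three fractional digits.

```rust
/// Parses an `MM:SS.mmm` timestamp into milliseconds.
/// Returns `None` on malformed input.
fn parse_ms(ts: &str) -> Option<u64> {
    let (min, rest) = ts.split_once(':')?;
    let (sec, ms) = rest.split_once('.')?;
    let min: u64 = min.parse().ok()?;
    let sec: u64 = sec.parse().ok()?;
    let ms: u64 = ms.parse().ok()?;
    Some((min * 60 + sec) * 1000 + ms)
}

/// Time between the first and last timestamp, in milliseconds.
fn spread_ms(stamps: &[&str]) -> Option<u64> {
    let first = parse_ms(stamps.first()?)?;
    let last = parse_ms(stamps.last()?)?;
    last.checked_sub(first)
}

fn main() {
    // "Requesting blinded header" timestamps from the two excerpts.
    let v441 = ["27:48.356", "27:48.399", "27:48.440", "27:48.464"];
    let v500 = ["45:35.232", "45:35.960", "45:36.692", "45:37.534"];
    println!("v4.4.1 spread: {:?} ms", spread_ms(&v441));
    println!("v5.0.0 spread: {:?} ms", spread_ms(&v500));
}
```

The four requests span about 0.1 s on v4.4.1 and about 2.3 s on v5.0.0.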

The SSV block proposal flow is as follows:
// executeDuty steps:
// 1) sign a partial randao sig and wait for 2f+1 partial sigs from peers
// 2) reconstruct randao and send GetBeaconBlock to BN
// 3) start consensus on duty + block data
// 4) Once consensus decides, sign partial block and broadcast
// 5) collect 2f+1 partial sigs, reconstruct and broadcast valid block sig to the BN

Because step 2 is slow, consensus is slow, and the block is ultimately missed.
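
For reference, the `2f+1` threshold in the steps above can be worked out for a four-operator cluster. This sketch assumes the usual BFT sizing of `n = 3f + 1` operators, which the issue does not state explicitly; `max_faulty` and `quorum` are hypothetical helper names.

```rust
/// Largest `f` such that `3f + 1 <= n` (assumed BFT cluster sizing).
fn max_faulty(n: u32) -> u32 {
    n.saturating_sub(1) / 3
}

/// Number of partial signatures required: `2f + 1`.
fn quorum(n: u32) -> u32 {
    2 * max_faulty(n) + 1
}

fn main() {
    // Four operators tolerate one fault and need three partial signatures.
    println!("n = 4: f = {}, quorum = {}", max_faulty(4), quorum(4));
}
```

Under that assumption, three of the four operators must complete step 2 before consensus can proceed, so delays in the second and third block requests directly delay the proposal.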

Version

version: Lighthouse/v5.0.0

Present Behaviour

The previous version, v4.4.1, was fine. On v5.0.0, concurrent block requests are slow and proposals are missed.

Expected Behaviour

Concurrent requests to produce a block should not be slower than before. Please take the SSV scenario into account.

Steps to resolve

I don't know. Maybe consider a read-write lock or another approach.
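
As a rough illustration of the read-write lock idea, the sketch below keeps the cached value behind an `RwLock` and hands out `Arc` clones instead of taking the value out. It is not a proposed patch: `CachedState`, `SharedSlot`, `peek_cached`, and `count_hits` are hypothetical names, and it assumes callers can work from a shared reference. If block production needs an owned, mutable state, each caller would still have to pay for a deep clone.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for the cached pre-state.
struct CachedState {
    head_root: u64,
}

type SharedSlot = RwLock<Option<Arc<CachedState>>>;

/// Reads the cached state without removing it. Only the `Arc` pointer is
/// cloned, under a shared read lock, so callers do not evict each other.
fn peek_cached(slot: &SharedSlot, head_root: u64) -> Option<Arc<CachedState>> {
    let guard = slot.read().unwrap();
    match &*guard {
        Some(state) if state.head_root == head_root => Some(Arc::clone(state)),
        _ => None,
    }
}

/// Spawns `requests` concurrent readers and returns how many got the cached state.
fn count_hits(requests: usize) -> usize {
    let slot: Arc<SharedSlot> =
        Arc::new(RwLock::new(Some(Arc::new(CachedState { head_root: 7 }))));
    let handles: Vec<_> = (0..requests)
        .map(|_| {
            let slot = Arc::clone(&slot);
            thread::spawn(move || peek_cached(&slot, 7).is_some())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|hit| *hit)
        .count()
}

fn main() {
    // All four readers see the cached value.
    println!("hits: {}", count_hits(4));
}
```

Here all four readers hit the cache, in contrast to a take-once slot where only the first caller does.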

Labels

HTTP-API, dvt (Distributed validator technology, e.g. SSV, Obol), optimization (Something to make Lighthouse run more efficiently)
