Description
In #2833 we reduced the frequency with which full states are stored in the hot database. However, this just works around the underlying issue: database writes take substantial time for full states.
In lieu of more drastic database restructuring I think we might be able to take the database write time off the critical path by serializing all our I/O and completing it on a dedicated background thread. We want to avoid a situation where out-of-order writes violate an invariant of the database like `block in db --> block's state in db` or `block in fork choice --> block in db`. I think we're in a good position to guarantee this by hooking `HotColdDB::do_atomically` to run in the background. For example, during block processing we would push the storage ops for the state and block in a single batch to the background thread. Later we may push fork choice to the background thread in a separate batch. Because `do_atomically` serializes requests (completes them in order), there's no way for the fork choice write to commit before the block/state write. In case of a crash (or shutdown), any incomplete I/O ops will just get dropped and the on-disk database will revert to whatever was most recently written successfully.
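To make the ordering argument concrete, here's a minimal, self-contained sketch of the idea, assuming a bounded channel feeding a dedicated writer thread. `StoreOp`, `BackgroundStore` and `commit_batch` here are simplified stand-ins, not Lighthouse's real `StoreOp<E>` / `HotColdDB` types:

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

// Simplified stand-in for the real storage ops.
#[derive(Debug)]
enum StoreOp {
    PutBlock(u64, Vec<u8>),
    PutState(u64, Vec<u8>),
}

struct BackgroundStore {
    // Bounded queue: if I/O saturates, `send` blocks and performance degrades
    // to today's synchronous behaviour instead of memory growing unbounded.
    tx: SyncSender<Vec<StoreOp>>,
    writer: JoinHandle<()>,
}

impl BackgroundStore {
    fn new(queue_len: usize) -> Self {
        let (tx, rx) = sync_channel::<Vec<StoreOp>>(queue_len);
        let writer = thread::spawn(move || {
            // Batches are committed strictly in enqueue order, so a fork choice
            // batch can never reach disk before the block/state batch sent
            // ahead of it. On shutdown the channel closes and any batches not
            // yet received are dropped, leaving the DB at the last commit.
            for batch in rx {
                commit_batch(&batch);
            }
        });
        Self { tx, writer }
    }

    /// Sketch of a backgrounded `do_atomically`: enqueue and return.
    fn do_atomically(&self, batch: Vec<StoreOp>) {
        self.tx.send(batch).expect("background writer alive");
    }
}

// Placeholder for the real atomic key-value write.
fn commit_batch(batch: &[StoreOp]) {
    println!("committing batch of {} ops: {:?}", batch.len(), batch);
}

fn main() {
    let store = BackgroundStore::new(16);
    // Block processing: block + state go to disk in a single batch.
    store.do_atomically(vec![
        StoreOp::PutState(1, vec![0; 4]),
        StoreOp::PutBlock(1, vec![1; 4]),
    ]);
    // Fork choice (or any later write) goes in a separate, later batch.
    store.do_atomically(vec![StoreOp::PutBlock(2, vec![2; 4])]);

    // Shutdown: close the queue and let the writer drain what it already has.
    let BackgroundStore { tx, writer } = store;
    drop(tx);
    writer.join().unwrap();
}
```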
The key part of this scheme is a background thread within the store which keeps a queue of `Vec<StoreOp<E>>` batches for completion. We should bound the size of this queue to keep memory usage under control in case of I/O saturation (at which point we block and performance returns to what it is currently). The other important thing to keep in mind is that writes need to be observable by other threads as soon as `do_atomically` returns. To achieve this I think we can cache the to-be-written blocks and states in memory and return them from `get_block`/`get_state`. Other writes are trickier to make observable, because at that level we only see generic `key -> value` mappings. We could limit the background writing to batches of blocks and states, and continue blocking for every other write (clearing the pending block/state queue before doing so). Or we could push the I/O queuing down a level into the key-value store, so that it caches the raw `key -> value` mappings in memory (à la `MemoryStore`) and the higher-level DB code doesn't need to change... This may actually be the cleanest + most generic option 🤔 (a rough sketch follows the list below). Potential downsides of the KV-queuing approach are:
- We pay a serialization/deserialization cost for writes/reads because the KV-store caches the bytes in memory rather than the objects.
- We can't take advantage of in-memory de-duplication of `BeaconState`s (if Persistent copy-on-write beacon states #2806 is implemented).
- We duplicate what fast KV-stores try to do anyway: write to memory (a mem-mapped file) first and flush to disk later (on eviction from the OS page cache). This isn't so much of a downside, as actually switching the BN's KV-store to MDBX would be a lot of work and require a breaking schema change (unlike a queue on top of LevelDB).
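For the KV-queuing variant, here's a rough read-through sketch (hypothetical names, not the actual key-value store trait) showing how queued writes become observable to readers as soon as the batch is accepted, before anything reaches disk:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

type Key = Vec<u8>;
type Value = Vec<u8>;

struct QueuedKvStore {
    // Raw bytes awaiting commit, à la MemoryStore; reads check this first.
    pending: Mutex<HashMap<Key, Value>>,
    // Stand-in for the on-disk LevelDB instance.
    disk: Mutex<HashMap<Key, Value>>,
}

impl QueuedKvStore {
    fn new() -> Self {
        Self {
            pending: Mutex::new(HashMap::new()),
            disk: Mutex::new(HashMap::new()),
        }
    }

    /// Queue a batch of writes. Returns once the batch is visible to readers,
    /// before anything reaches disk (a real version would also hand the batch
    /// to the background writer thread).
    fn put_batch(&self, batch: Vec<(Key, Value)>) {
        let mut pending = self.pending.lock().unwrap();
        for (k, v) in batch {
            pending.insert(k, v);
        }
    }

    /// Read-through: pending bytes shadow whatever is on disk.
    fn get(&self, key: &[u8]) -> Option<Value> {
        if let Some(v) = self.pending.lock().unwrap().get(key) {
            return Some(v.clone());
        }
        self.disk.lock().unwrap().get(key).cloned()
    }

    /// Called by the background writer once a batch has committed to disk.
    fn mark_committed(&self, keys: &[Key]) {
        let mut pending = self.pending.lock().unwrap();
        let mut disk = self.disk.lock().unwrap();
        for k in keys {
            if let Some(v) = pending.remove(k) {
                disk.insert(k.clone(), v);
            }
        }
    }
}

fn main() {
    let store = QueuedKvStore::new();
    store.put_batch(vec![(b"block:1".to_vec(), vec![0xaa])]);
    // Visible immediately, even though nothing has been flushed yet.
    assert_eq!(store.get(b"block:1"), Some(vec![0xaa]));
    store.mark_committed(&[b"block:1".to_vec()]);
    assert_eq!(store.get(b"block:1"), Some(vec![0xaa]));
}
```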
There are also potentially other issues with synchronising the cold DB and hot DB during migrations. We may need to block in such cases.