core/state/snapshot: faster snapshot generation #22504
Conversation
I agree that it doesn't make a whole lot of difference to use stackTrie for small tries (although there's still some difference), but the even larger upside is that it saves us from even resolving the trie in the first place.

Generation completed on

This one now needs a rebase

Rebased!
```go
data := SlimAccountRLP(acc.Nonce, acc.Balance, acc.Root, acc.CodeHash)

// If the account is not yet in-progress, write it out
if accMarker == nil || !bytes.Equal(accountHash[:], accMarker) {
```
I need to think about this a bit wrt continuations if the account changes while the generator is suspended.
```go
accTrie, err := trie.NewSecure(dl.root, dl.triedb)
if err != nil {
	// The account trie is missing (GC), surf the chain until one becomes available
	stats.Log("Trie missing, state snapshotting paused", dl.root, dl.genMarker)
```
We need to ensure this scenario is handled correctly (Geth is restarted and the pruned tries are missing). Your code might still handle it correctly, but I think it's useful to have a more user friendly error message for it (also separates the expected missing root error from unexpected missing trie node errors).
Yes, this scenario is still handled. In the new generator, the error returned by `generateRange` will abort the generation and wait for the external signal to resume it.
There are two `generateRange` calls, one for the accounts and another for the storages. The error returned by the storage `generateRange` will just be propagated to the outer call stack, so eventually all the internal errors are handled by the account `generateRange`.
```go
// The procedure is aborted, either by external signal or internal error
if err != nil {
	if abort == nil { // aborted by internal error, wait for the signal
		abort = <-dl.genAbort
	}
	abort <- stats
	return
}
```
core/state/snapshot/generate.go
Outdated
not sure - can .Value be called twice?
Yes.
WCGW
core/state/snapshot/snapshot.go
Outdated
```diff
 // If the base layer is generating, abort it and save
 if layer.genAbort != nil {
-	abort := make(chan *generatorStats)
+	abort := make(chan *generatorStats, 1) // Discard the stats
```
Is this a good idea? Previously we waited until the running generator actually stopped. Here we only wait until it gets its signal and immediately go ahead executing before the generator had a chance to stop. Why was the previous one bad?
I.e. "discard the stats" is fine, but "not even wait for the stats" is a different thing.
Namely this path https://github.com/ethereum/go-ethereum/pull/22504/files#diff-83ae0dd12996e07452e863cf8b072366443d6c6fbb0fff341399c9c8ce5fe050R551 will do a whole lot of shuffling and locking before it returns.
This version of geth has frozen for me. I only have basic logs.
This PR improves snapshot generation (post-snap-sync) by orders of magnitude.
How snapshot generation works now
When doing a snap sync, we download 'slices' of the trie, individually verified, and use these slices to fill our trie.
The slices of accounts/storage themselves are forgotten.
After this is done, we iterate the entire account trie, and every storage trie, to reconstruct the snapshot database.
This process takes hundreds of hours, and the disk IO is extreme during this process.
With this PR
This PR implements the basic idea described here: https://gist.github.com/rjl493456442/85832a0c760f2bafe2a69e33efe68c60 .
For starters, it stores the snapshots into the database. The account (and storage) data do not actually match up to whatever
root we'll eventually wind up on, but that's for later.
After the snap sync is done, we start generating the snapshot, as before. But this time, we have some (potentially stale) data already laying there.
We proceed in batches (ranges). In this PR, the range-size for accounts is 200, and the range-size for storage is 1024. For any given range (starting at 0x00...), we check the already-present snapshot data against the trie:
- if the whole range proves correct, it is left untouched (but also check the storage),
- elements that differ are updated (and check the storage),
- elements that are missing are created (and check the storage).

The same algo is used for storage tries. The idea being that roughly 99% of all ranges are fine, and that out of the ranges which are not fine, most individual elements are fine.
This improves the IO performance of generation by orders of magnitude.
Some other optimizations:
- For small storage tries (max 1024 items), where we can load the entire range into memory, we can do the verification against the expected root using the stackTrie: just feed everything in there and have it spit out the root, which we can then compare. This means that we don't have to resolve the trie for that storage root at all.
Experimental results (mainnet)
During the process, it read 1.5TB from disk/leveldb (same same). It wrote 22GB in the first 15 minutes (compaction?), and ended up totalling 24GB of leveldb writes and 35GB of disk writes.