
Conversation

@rjl493456442 (Member) commented May 8, 2020

This PR offers two commands: geth snapshot prune-state and geth snapshot verify-state.
Both commands require a live snapshot. To use them, first generate the snapshot
by running with --snapshot enabled.

State pruner

It's a very simple state pruner, and the idea is quite straightforward: whenever we have a live snapshot, we can regenerate the whole state trie from it. Users can pick a specific snapshot version for state regeneration and then wipe all other trie nodes for pruning.

The pruning procedure consists of the following steps (steps 3 and 4 are sketched in code after the list):

  • Generate the state trie from a specific snapshot
  • Commit all trie nodes as well as the account codes to a file-based temporary database
  • Iterate the main database and delete all state data (including the account codes), keeping the genesis state
  • Compact the whole main database to release disk space
  • Migrate all data from the temporary database to the main one
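
A minimal sketch of the delete-and-compact steps against a raw goleveldb handle. The `isStateKey` heuristic, batch threshold, and `chaindata` path are assumptions for illustration; the real pruner works through geth's ethdb wrappers and explicitly spares the genesis state:

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// isStateKey is an assumed heuristic: trie nodes are keyed by their 32-byte
// hash, so anything with a 32-byte key is treated as state here. The real
// pruner is more careful and also keeps the genesis state.
func isStateKey(key []byte) bool {
	return len(key) == 32
}

// pruneAndCompact implements steps 3-4: delete state entries in batches,
// then compact the whole key space to actually reclaim disk space.
func pruneAndCompact(db *leveldb.DB) error {
	it := db.NewIterator(nil, nil)
	defer it.Release()

	batch := new(leveldb.Batch)
	for it.Next() {
		if !isStateKey(it.Key()) {
			continue
		}
		batch.Delete(it.Key())
		if batch.Len() >= 10000 { // flush periodically to bound memory
			if err := db.Write(batch, nil); err != nil {
				return err
			}
			batch.Reset()
		}
	}
	if err := it.Error(); err != nil {
		return err
	}
	if err := db.Write(batch, nil); err != nil {
		return err
	}
	return db.CompactRange(util.Range{}) // zero Range = the full key space
}

func main() {
	db, err := leveldb.OpenFile("chaindata", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := pruneAndCompact(db); err != nil {
		log.Fatal(err)
	}
}
```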

The following scenarios can happen (see the recovery sketch after this list):

  • The system exits before we regenerate the state trie: the temporary database is incomplete and will be deleted when the system launches next time.
  • The system exits after we regenerate the state trie: the temporary database is complete, and, more importantly, state data may already have been pruned from the main database. So when the system launches next time, the temporary database will be migrated into the main one and then wiped.
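
A sketch of the implied startup recovery. Every name here is hypothetical; a `COMPLETE` marker file stands in for however the real pruner records that regeneration finished:

```go
package prune

import (
	"os"
	"path/filepath"
)

// isComplete reports whether the temporary database finished regeneration.
// Hypothetical scheme: a marker file written once the trie is fully rebuilt.
func isComplete(tmpDBPath string) bool {
	_, err := os.Stat(filepath.Join(tmpDBPath, "COMPLETE"))
	return err == nil
}

// migrate copies all data from the temporary database into the main one.
// Stubbed here; the real migration streams every key/value pair across.
func migrate(tmpDBPath, mainDBPath string) error { return nil }

// recoverPruning sketches the startup logic implied by the two scenarios.
func recoverPruning(tmpDBPath, mainDBPath string) error {
	if _, err := os.Stat(tmpDBPath); os.IsNotExist(err) {
		return nil // no interrupted pruning run
	}
	if !isComplete(tmpDBPath) {
		// Crashed before the trie was regenerated: the main DB is intact,
		// so just drop the partial temporary database.
		return os.RemoveAll(tmpDBPath)
	}
	// Crashed after regeneration: the main DB may already be partially
	// pruned, so finish the migration, then delete the temporary DB.
	if err := migrate(tmpDBPath, mainDBPath); err != nil {
		return err
	}
	return os.RemoveAll(tmpDBPath)
}
```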

One important note:

  • The whole pruning procedure can take a very long time (e.g. 7 minutes on Goerli)

State verifier

The verifier is a simple tool that checks whether the regenerated state trie hash equals the original one. For now it's mainly used for testing; the core check is sketched below.
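
A rough sketch of that check, written against stand-in interfaces rather than geth's actual types. In geth the accumulator role is played by trie.StackTrie, and a full verifier must also rebuild each account's storage trie, which is glossed over here:

```go
package verify

import (
	"bytes"
	"fmt"
)

// hasher abstracts a stack-trie style accumulator: feed sorted (key, value)
// pairs, read back the root. This interface is a stand-in, not geth's API.
type hasher interface {
	Update(key, value []byte) error
	Hash() []byte
}

// kvIterator abstracts iteration over the flat snapshot (sorted by key).
type kvIterator interface {
	Next() bool
	Key() []byte
	Value() []byte
}

// verifyState rebuilds the state root from the snapshot and compares it
// against the root recorded in the block header.
func verifyState(snap kvIterator, h hasher, headerRoot []byte) error {
	for snap.Next() {
		if err := h.Update(snap.Key(), snap.Value()); err != nil {
			return err
		}
	}
	if got := h.Hash(); !bytes.Equal(got, headerRoot) {
		return fmt.Errorf("state root mismatch: have %x, want %x", got, headerRoot)
	}
	return nil
}
```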

What's more?

Beyond these two commands, we can offer more functionality based on the snapshot. One idea is that the snapshot can be used to generate trie nodes for arbitrary ranges. If so, we can use it for state repair: whenever we hit a missing-trie-node error, we can just regenerate the node instead of throwing away the whole database.

Comment on lines +364 to +366

Contributor

Wouldn't it be better to return an error instead of just exiting on errors? (re this location and all others)

@rjl493456442 (Member, Author)

Running the pruning on benchmark machine 01:

  • Database size: 284G
    • Ancient: 148G
    • LevelDB: 136G
  • Running time: 3h56m10.068s
    • Generate state from snapshot: 59m13.572s
    • Prune LevelDB: 29m9.624s
    • Compact after pruning: 46m55.942s
    • Migrate the generated state: 1h40m47.876s

@rjl493456442 force-pushed the simple-pruner branch 2 times, most recently from accad99 to 3fe484c on October 14, 2020
@holiman (Contributor) commented Oct 15, 2020

As I understand it, this is the current scheme:

  1. Iterate snapshots,
    • pipe them into stacktrie,
      • write states (key+value) into filedb,
  2. iterate leveldb
    • delete states from leveldb
    • range-compact of leveldb
  3. Write back states into leveldb
  4. Delete filedb

This scheme basically empties most of LevelDB out and writes everything back again, which is very heavy I/O.
An alternative would be to operate on hashes only, and not delete-then-write-back:

  1. Iterate snapshots,
    • pipe them into stacktrie,
      • write keys (hashes) into filedb,
  2. iterate leveldb
    • delete state if key is not present in filedb

Size of the state keys, assuming the unpruned state is around 2x, with 1G keys: 1G * 32 bytes = 32GB.
So in the second step, we essentially have to find out whether
a given key (32 bytes) is present among 1,000,000,000 other keys.

EDIT: the key count is really the size of the to-be-kept state, not the unpruned size. It won't be 1G keys, but somewhere on the order of 600M on mainnet right now.

However, we don't actually have to be fully accurate. If we use a bloom filter with
an error rate of N, it just means our deletion will fail to delete N% of the entries.

As far as I can tell, a bloom filter of ~1.84GB, with 11 hash functions, would give us an
error rate of 0.05%: https://hur.st/bloomfilter/?n=1000000000&p=0.0005&m=&k=
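
Those figures match the standard Bloom filter sizing formulas m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions, quickly checked:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	n := 1e9    // expected number of keys
	p := 0.0005 // target false-positive rate (0.05%)

	// Standard Bloom filter sizing: m bits, k hash functions.
	m := -n * math.Log(p) / (math.Ln2 * math.Ln2)
	k := m / n * math.Ln2

	fmt.Printf("m = %.2f GiB, k = %.1f\n", m/8/(1<<30), k)
	// Output: m = 1.84 GiB, k = 11.0
}
```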

So we'd wind up with (sketched in code after the list):

  1. Iterate snapshots,
    • pipe them into stacktrie,
      • write keys into bloom filter,
  2. iterate leveldb
    • delete state if key is not present in filter
  3. Compact
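
A hedged sketch of the filter variant, reusing the goleveldb calls from the earlier sketch. The tiny bloom implementation is illustrative only; it exploits that state keys are already uniform 32-byte hashes, so two slices of the key itself drive the k probes via double hashing:

```go
package prune

import (
	"encoding/binary"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// bloom is a minimal Bloom filter for 32-byte state keys. For the sizing in
// the comment above it would be ~1.84 GiB of bits with k=11.
type bloom struct {
	bits []byte
	k    int
}

func newBloom(mBits uint64, k int) *bloom {
	return &bloom{bits: make([]byte, mBits/8), k: k}
}

// indexes derives k bit positions via double hashing over two 8-byte slices
// of the (already uniformly distributed) 32-byte key.
func (b *bloom) indexes(key []byte) []uint64 {
	h1 := binary.BigEndian.Uint64(key[:8])
	h2 := binary.BigEndian.Uint64(key[8:16])
	m := uint64(len(b.bits)) * 8
	idx := make([]uint64, b.k)
	for i := range idx {
		idx[i] = (h1 + uint64(i)*h2) % m
	}
	return idx
}

func (b *bloom) Add(key []byte) {
	for _, i := range b.indexes(key) {
		b.bits[i/8] |= 1 << (i % 8)
	}
}

func (b *bloom) Contains(key []byte) bool {
	for _, i := range b.indexes(key) {
		if b.bits[i/8]&(1<<(i%8)) == 0 {
			return false
		}
	}
	return true
}

// pruneWithBloom deletes every 32-byte state key not present in the filter.
// A false positive just means that entry survives until the next pruning.
func pruneWithBloom(db *leveldb.DB, keep *bloom) error {
	it := db.NewIterator(nil, nil)
	defer it.Release()

	batch := new(leveldb.Batch)
	for it.Next() {
		key := it.Key()
		if len(key) != 32 || keep.Contains(key) {
			continue
		}
		batch.Delete(key)
		if batch.Len() >= 10000 {
			if err := db.Write(batch, nil); err != nil {
				return err
			}
			batch.Reset()
		}
	}
	if err := db.Write(batch, nil); err != nil {
		return err
	}
	return db.CompactRange(util.Range{})
}
```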

It's a very different approach; I'd be curious to see the performance difference between these two ways of doing it.
Incidentally, we have a bloom filter for more or less the same purpose (trie.SyncBloom) that we use in the downloader when downloading state.
