
Conversation

AnnaShaleva (Member) commented Feb 7, 2024

Merge Geth master updates starting from 766272f and ending at 99dc3fe. Closes #116.

@roman-khimov, the last several commits contain the corresponding dBFT changes that need to be made after this merge. I've carefully reviewed the list of changes that happened since the last sync, so here's the list of PRs that might be important for us (see the merge commit message below):

TODO:

  • Fix the build workflow
  • Create a separate PR for the master branch update (?)
  • Decide whether we should sync to the current master or to the latest stable release (v1.13.11).

tactical-retreat and others added 30 commits October 4, 2023 12:38
This change refactors stacktrie to separate the stacktrie itself from the
internal representation of nodes: a stacktrie is not a recursive structure
of stacktries, but rather a framework for representing and operating upon a set of nodes.

---------

Co-authored-by: Gary Rong <[email protected]>
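To make the distinction concrete, here is a rough sketch of the separation (illustrative types only, not the exact post-refactor Geth definitions): the nodes are a plain recursive data structure, while the stacktrie is the machinery that operates on them.

```
// Illustrative only: a node carries no machinery, just data.
type stNode struct {
	typ      uint8       // empty, branch, extension, leaf or hashed
	key      []byte      // path fragment covered by this node
	val      []byte      // leaf value, or the hash for an already-hashed node
	children [16]*stNode // populated for branch nodes only
}

// The stacktrie is the framework operating on a tree of such nodes: it owns
// the shared state (root pointer, commit callback), not the node layout.
type StackTrie struct {
	root    *stNode
	writeFn func(path []byte, hash []byte, blob []byte) // supplied by the caller
}
```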
fixes various typos in core
* eth/ethconfig: fix typo on comment

* params/config: fix typo on comment

* eth/ethconfig: fix typo on comment
This is a minor refactor in preparation for changes to the range verifier. This PR contains no intentional functional changes but moves (and renames) the light.NodeSet.
* fix a typo

* trie: additional fixes to docstrings

---------

Co-authored-by: Martin Holst Swende <[email protected]>
…ocol (#28261)

* eth: enforce announcement metadatas and drop peers violating the protocol

* eth/fetcher: relax eth/68 validation a bit for flakey clients

* tests/fuzzers/txfetcher: pull in suggestion from Marius

* eth/fetcher: add tests for peer dropping

* eth/fetcher: linter linter linter linter linter
This change:
  - Removes the owner notion from a stacktrie; the owner is only ever needed for committing to the database, but the commit function, the `writeFn`, is provided by the caller, so the caller can simply capture the owner in the `writeFn` instead of having it passed through the stacktrie (see the sketch below).
  - Removes the `encoding.BinaryMarshaler`/`encoding.BinaryUnmarshaler` interface from stacktrie. We're not using it, and it is doubtful whether anyone downstream is either.
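A minimal sketch of the first point (hedged: it assumes the `(path, hash, blob)` writer shape and `rawdb.WriteTrieNode` from go-ethereum's `core/rawdb`; the helper name is made up): the owner lives in the caller's closure, not in the stacktrie.

```
// storageTrieWriter is a hypothetical helper: the caller bakes the owner into
// the commit callback it hands to the stacktrie, so the trie stays owner-agnostic.
func storageTrieWriter(batch ethdb.Batch, owner common.Hash, scheme string) func(path []byte, hash common.Hash, blob []byte) {
	return func(path []byte, hash common.Hash, blob []byte) {
		// `owner` is captured from the enclosing scope instead of being
		// threaded through the stacktrie itself.
		rawdb.WriteTrieNode(batch, owner, path, hash, blob, scheme)
	}
}
```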
* eth/fetcher: throttle tx fetches to 128KB responses

* eth/fetcher: unindent a clause per review request
* cmd, core: resolve scheme from a read-write database

* cmd, core, eth: move the scheme check in the ethereum constructor

* cmd/geth: dump should in ro mode

* cmd: reverts
…306)

This change addresses an issue in snap sync, specifically when the entire sync process can be halted due to an encountered empty storage range.

Currently, on the snap sync client side, the response to an empty (partial) storage range is discarded as a non-delivery. However, such a response can be valid when the particular range requested does not contain any slots.

For instance, consider a large contract where the entire key space is divided into 16 chunks, and there are no available slots in the last chunk [0xf] -> [end]. When the node receives a request for this particular range, the response includes:

    The proof with origin [0xf]
    A nil storage slot set

If we simply discard this response, the finalization of the last range will be skipped, halting the entire sync process indefinitely. The test case TestSyncWithUnevenStorage can reproduce the scenario described above.

In addition, this change also defines the common variables MaxAddress and MaxHash.
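A rough decision sketch of the client-side handling described above (all helper names are hypothetical; this is not the actual snap sync code): an empty slot set that arrives together with a valid absence proof for the requested range is a completed range, not a non-delivery.

```
// handleStorageRange, finalizeRange, processSlots and errNonDelivery are
// illustrative names only.
func handleStorageRange(origin []byte, slots [][]byte, proof [][]byte, proofValid bool) error {
	switch {
	case len(slots) == 0 && !proofValid:
		// Nothing delivered and nothing proven: a real non-delivery.
		return errNonDelivery
	case len(slots) == 0:
		// Valid case: the proof shows no slots exist in [origin, end]
		// (e.g. the last chunk [0xf] -> [end] of a large contract), so
		// the range can be finalized instead of being discarded.
		return finalizeRange(origin)
	default:
		return processSlots(origin, slots, proof)
	}
}
```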
* build: upgrade to golang 1.21.2

* build: verify checksums via tool

* deps: upgrade go to 1.21.3

* build: move more build metadata into checksum file

* build: move gobootsrc to checksums
During snap-sync, we request ranges of values: either a range of accounts or a range of storage values. For any large trie, e.g. the main account trie or a large storage trie, we cannot fetch everything at once.

Short version; we split it up and request in multiple stages. To do so, we use an origin field, to say "Give me all storage key/values where key > 0x20000000000000000". When the server fulfils this, the server provides the first key after origin, let's say 0x2e030000000000000 -- never providing the exact origin. However, the client-side needs to be able to verify that the 0x2e03.. indeed is the first one after 0x2000.., and therefore the attached proof concerns the origin, not the first key.

So, short-short version: the left-hand side of the proof relates to the origin, and is free-standing from the first leaf.

On the other hand, (pun intended), the right-hand side, there's no such 'gap' between "along what path does the proof walk" and the last provided leaf. The proof must prove the last element (unless there are no elements).

Therefore, we can simplify the semantics for trie.VerifyRangeProof by removing an argument. This doesn't make much difference in practice, but makes it so that we can remove some tests. The reason I am raising this is that the upcoming stacktrie-based verifier does not support such fancy features as standalone right-hand borders.
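A hedged caller-side sketch of the simplified semantics (the exact `trie.VerifyRangeProof` signature used here is an assumption): the left border of the proof is the requested origin, which may be smaller than the first returned key, while the right border is implied by the last returned key itself.

```
// verifyStorageRange wraps the assumed post-simplification verifier: there is
// no separate "last key" argument, the proof must prove the last element itself.
func verifyStorageRange(root common.Hash, origin []byte, keys, values [][]byte, proofDB ethdb.KeyValueReader) (bool, error) {
	// Left border: the proof is walked along `origin`, even though keys[0]
	// may be strictly greater. Right border: the last key; the proof must
	// prove that element (unless keys is empty).
	return trie.VerifyRangeProof(root, origin, keys, values, proofDB)
}
```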
* cmd, core, ethdb: enable Pebble on 32 bits and OpenBSD too

* ethdb/pebble: use Pebble's internal constant calculation
Updates execution-spec-tests to 1.0.5: https://github.com/ethereum/execution-spec-tests/releases/tag/v1.0.5, switching to develop which contains Cancun tests (which are also enabled in this change).
MariusVanDerWijden and others added 15 commits January 20, 2024 16:03
This change enables Cancun on:

- Sepolia at 1706655072 (Jan 31st, 2024)
- Holesky at 1707305664 (Feb 7th, 2024)

Specification: ethereum/execution-specs#860
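For reference, Cancun is scheduled by timestamp; a minimal configuration sketch (assuming `params.ChainConfig`'s `CancunTime *uint64` field; the `sepoliaCfg`/`holeskyCfg` variables are placeholders for the respective chain configs):

```
// Sketch only: enabling Cancun by timestamp on a chain config.
sepoliaCancun := uint64(1706655072) // Jan 31st, 2024
holeskyCancun := uint64(1707305664) // Feb 7th, 2024

sepoliaCfg.CancunTime = &sepoliaCancun
holeskyCfg.CancunTime = &holeskyCancun
```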
This change simplifies the logic for indexing transactions and enhances the UX when a transaction is not found by returning more information to users.

Transaction indexing is now considered as a part of the initial sync, and `eth.syncing` will thus be `true` if transaction indexing is not yet finished. API consumers can use the syncing status to determine if the node is ready to serve users.
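A small consumer-side sketch of the recommendation above, using the standard `ethclient` API (the gating logic itself is just an example):

```
// readyToServe reports whether the node considers itself fully synced, which
// after this change also covers transaction indexing.
func readyToServe(ctx context.Context, client *ethclient.Client) (bool, error) {
	progress, err := client.SyncProgress(ctx)
	if err != nil {
		return false, err
	}
	// nil progress means eth_syncing reported false, i.e. sync (including
	// the transaction index) is finished; non-nil means still syncing.
	return progress == nil, nil
}
```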
This change makes use of uint256 to represent balances in state. It touches primarily upon statedb, stateobject and state processing, trying to avoid changes in transaction pools, core types, rpc and tracers.
…ace (#28849)

This change switches from using the `Hasher` interface to add/query the bloomfilter to implementing it as methods.
This significantly reduces the allocations for Search and Rebloom.
… param checks (#28230)

 This PR introduces a few changes with respect to payload verification in fcu and new payload requests:

* First of all, it undoes the `verifyPayloadAttributes(..)` simplification I attempted in #27872. 
* Adds timestamp validation to fcu payload attributes [as required](https://github.com/ethereum/execution-apis/blob/main/src/engine/cancun.md#specification-1) (section 2) by the Engine API spec. 
* For the new payload methods, I also update the verification of the executable data. `newPayloadV2` does not currently ensure that Cancun values are `nil`, which could make it possible to submit Cancun payloads through it. 
* On `newPayloadV3` the same types of checks are added. All Shanghai- and Cancun-related fields in the executable data must be non-nil, with the additional requirement that the timestamp falls within Cancun (i.e. `newPayloadV3` only accepts Cancun payloads).
* Finally it updates a newly failing catalyst test to call the correct fcu and new payload methods depending on the fork.
…d reset (#28837)

This PR fixes an issue in the new simulated backend. The root cause is the fact that the transaction pool has an internal reset operation that runs on a background thread.

When a new transaction is added to the pool via the RPC, the transaction is added to a non-executable queue and will be moved to its final location on a background thread. If the machine is overloaded (or simply due to timing issues), it can happen that the simulated backend will try to produce the next block, whilst the pool has not yet marked the newly added transaction executable. This will cause the block to not contain the transaction. This is an issue because we want determinism from the simulator: add a tx, mine a block. It should be in there.

The PR fixes it by adding a Sync function to the txpool, which waits for the current reset operation (if any) to finish, and then runs an entire round of reset on top. The new round is needed because resets are only triggered by new head events, so newly added transactions will not trigger the outer resets that we can wait on. The transaction pool would eventually do an internal reset even on transaction addition, but there's no easy way to wait on that and there's no meaningful reason to bubble that across everything. A clean outer reset will at worst be a small no-op goroutine.
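An illustrative sketch of the wait-for-reset pattern described above (not the exact Geth code; the `pool` type and its `reqSync`/`quit` channels are hypothetical): the caller hands the pool's event loop a completion channel and blocks until a full reset round has run on top of any in-flight one.

```
// Sync blocks until the pool's loop has finished any in-flight reset and then
// run one more full reset, so a transaction added before Sync is guaranteed
// to be promoted before the next block is produced.
func (p *pool) Sync() error {
	sync := make(chan error)
	select {
	case p.reqSync <- sync:
		return <-sync // the loop answers on this channel after the reset
	case <-p.quit:
		return errors.New("transaction pool is closing")
	}
}
```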
This change moves all the transaction indexing functions to a separate txindexer.go file and defines a txIndexer structure as a refactoring.
GetPayloadVX should only return payloads which match its version. GetPayloadV2 is a special snowflake that supports v1 and v2 payloads. This change uses a version-specific prefix in the payload id, basically a namespace for the version number.
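A self-contained sketch of the namespacing idea (a hypothetical helper, not the exact Geth implementation): the first byte of the 8-byte payload ID carries the payload version, so GetPayloadVX can cheaply reject IDs built for another version.

```
// payloadID derives an 8-byte identifier from a hash of the build parameters
// and overwrites the first byte with the payload version (the namespace).
func payloadID(version byte, paramsHash [32]byte) [8]byte {
	var id [8]byte
	copy(id[:], paramsHash[:8])
	id[0] = version
	return id
}

// versionOf recovers the namespace, letting e.g. getPayloadV3 reject v2 IDs.
func versionOf(id [8]byte) byte { return id[0] }
```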
Fix flaky test due to incomplete transaction indexing
roman-khimov (Contributor) commented:

I'd go for 1.13.x for now.

…1dc6504ed32a9161e71351

Merge Geth master updates starting from 766272f
and ending at 99dc3fe (v1.13.11 stable release).

The list of PRs that are important for us:
    - ethereum/go-ethereum#28147 removal of rollback mechanism in downloader. We may still need this mechanism since we're pre-merge and allow forks;
    - ethereum/go-ethereum@3dc45a3 related to release pipeline, example of pre-release version update;
    - ethereum/go-ethereum@dc34fe8 related to release pipeline, example of post-release version update;
    - ethereum/go-ethereum#28098 BLOBBASEFEE opcode implemented as a part of Cancun (we have to support it eventually);
    - ethereum/go-ethereum#28195 introduce new Cancun-related block fields to eth tools;
    - ethereum/go-ethereum#28243 introduce blob transactions support for internal eth services;
    - ethereum/go-ethereum#28084 allows invoking a contract at a specific block hash; may be useful for Governance contract integration;
    - ethereum/go-ethereum#28538 Dockerfile.alltools update example, just to remember that it should be in sync with Dockerfile;
    - ethereum/go-ethereum#28605 improved Cancun- and Shanghai-related consensus verification. Although we must enable Shanghai and Cancun in dBFT, this PR is a hint of what should be changed w.r.t. the Clique/Ethash implementation for this;
    - ethereum/go-ethereum#28549 GitHub actions are added for tests;
    - ethereum/go-ethereum#27766 Beacon Committee chain implementation (light chain that is able to verify signed blocks once synced);

Signed-off-by: Anna Shaleva <[email protected]>
Provide a pointer to block hash or number. Follow the notion of
ethereum/go-ethereum#28165.

Signed-off-by: Anna Shaleva <[email protected]>
AnnaShaleva force-pushed the merge-16ce7bf50fa71c907d1dc6504ed32a9161e71351 branch from ea4a9db to aca86f2 on February 8, 2024 08:00
AnnaShaleva changed the title from "Sync with original Geth master" to "Sync bane-main with original Geth master (v1.13.11 stable)" on Feb 8, 2024
AnnaShaleva (Member, Author) commented:

@roman-khimov, the privnet is checked and the code is synced up to v1.13.11 stable. Ready for review.

roman-khimov merged commit e6274c4 into bane-main on Feb 8, 2024
roman-khimov deleted the merge-16ce7bf50fa71c907d1dc6504ed32a9161e71351 branch on February 8, 2024 12:02
AnnaShaleva added a commit that referenced this pull request Feb 20, 2024
**Problem**

Chain skeleton sync (batched headers sync) is impossible with
the current block size. Once a new stale peer tries to sync its
headers, the following errors occur during skeleton filling (some irrelevant
logs are omitted):
```
DEBUG[02-20|14:00:46.499] Fetching batch of headers                id=8205ce1a7d69b885 conn=staticdial      count=1  fromnum=4       skip=0  reverse=false
DEBUG[02-20|14:00:46.908] Fetching batch of headers                id=8205ce1a7d69b885 conn=staticdial      count=1  fromnum=2       skip=0  reverse=false
DEBUG[02-20|14:00:47.153] Fetching batch of headers                id=8205ce1a7d69b885 conn=staticdial      count=1  fromnum=1       skip=0  reverse=false
DEBUG[02-20|14:00:47.426] Found common ancestor                    peer=8205ce1a number=0       hash=000000..000000
DEBUG[02-20|14:00:47.426] Directing header downloads               peer=8205ce1a origin=1
TRACE[02-20|14:00:47.427] Fetching skeleton headers                peer=8205ce1a count=192 from=1
DEBUG[02-20|14:00:47.427] Fetching batch of headers                id=8205ce1a7d69b885 conn=staticdial      count=128 fromnum=192     skip=191 reverse=false
DEBUG[02-20|14:00:47.427] Downloading receipts                     origin=1
DEBUG[02-20|14:00:47.427] Downloading block bodies                 origin=1
DEBUG[02-20|14:00:48.255] Filling up skeleton                      from=1
TRACE[02-20|14:00:48.256] Requesting new batch of headers          peer=8205ce1a from=1
DEBUG[02-20|14:00:48.256] Fetching batch of headers                id=8205ce1a7d69b885 conn=staticdial      count=192 fromnum=1       skip=0   reverse=false
...
TRACE[02-20|14:00:48.657] Skeleton filling not accepted            peer=8205ce1a7d69b885 from=1
DEBUG[02-20|14:00:48.657] Failed to deliver retrieved headers      peer=8205ce1a         err="delivery not accepted"
TRACE[02-20|14:00:48.657] Requesting new batch of headers          peer=8205ce1a         from=193
DEBUG[02-20|14:00:48.657] Fetching batch of headers                id=8205ce1a7d69b885 conn=staticdial      count=192 fromnum=193     skip=0   reverse=false
...
```

So there's a "delivery not accepted" error on the attempt to build the skeleton
from the received batch of headers. Extended logs show that every batch
contains only one header (the last one from the requested batch):
```
TRACE[02-20|14:00:48.656] Delivering headers                       accepted=false len(headers)=1 headers[0].Hash=5bb77b..06997d headers[0].Number=192 MaxHeaderFetch=192
```
Based on this information alone, it's impossible to build the skeleton.

**Explanation**

The problem is traced down to the node's peer. During header retrieval
the peer fetches the last block of the batch from its cache and tries to
retrieve the rest of them from ancients. Given the old maximum RLP header
size constraint of 700 bytes per header, the peer can't retrieve the whole
set of requested headers (MaxHeaderFetch, i.e. 192 headers, each at least
1006 bytes long). Since the number of retrieved headers doesn't match the
requested one, the chain accessor returns only the single header that was
retrieved from the cache:
```
	// read remaining from ancients
	max := count * 700
	data, err := db.AncientRange(ChainFreezerHeaderTable, i+1-count, count, max)
	if err == nil && uint64(len(data)) == count {
		// the data is on the order [h, h+1, .., n] -- reordering needed
		for i := range data {
			rlpHeaders = append(rlpHeaders, data[len(data)-1-i])
		}
	}
	return rlpHeaders
```

**Solution**

Since the `ReadHeaderRange` documentation specifies that the caller should limit the
number of requested headers to prevent a DoS attack, it's safe to remove the
maximum-bytes constraint.
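One possible shape of the fix, sketched against the excerpt above (an assumption: the freezer's `AncientRange` is taken here to treat a zero `maxBytes` as "no byte limit"; if it does not, passing a generously large cap achieves the same effect):

```
	// read remaining from ancients: rely on the caller-enforced limit on the
	// number of requested headers instead of a per-header byte estimate
	// (maxBytes = 0 is assumed to mean "no byte limit" here).
	data, err := db.AncientRange(ChainFreezerHeaderTable, i+1-count, count, 0)
	if err == nil && uint64(len(data)) == count {
		// the data is on the order [h, h+1, .., n] -- reordering needed
		for i := range data {
			rlpHeaders = append(rlpHeaders, data[len(data)-1-i])
		}
	}
	return rlpHeaders
```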

**Additional notes**

After the fix, I found that almost the same code change is present in the original
Geth commit ported to the NeoX node implementation in #130:
447945e.
Thus, the problem should not be reproducible on the post-sync NeoX node.

Signed-off-by: Anna Shaleva <[email protected]>