DO NOT MERGE: synthetic parallel execution test framework #4817

graydon · 2025-07-05T07:24:13Z

This adds a special mode you can expose to live traffic from mainnet or testnet (online using run or, more commonly, offline using catchup) to test out the new p23 parallel-execution code path for soroban phases.

The way it works is that just before running a sequential soroban phase, it:

1. Synthesizes a fake parallel phase using the same phase-building path used in p23 txset nomination
2. Runs that phase on a captured throwaway copy of the pre-state of the phase
3. Captures the results of that (txresults and txmetas) into some buffers

It then proceeds to run the normal sequential phase as usual, and compares the captured parallel results with the sequential results, logging any differences as errors.
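A minimal sketch of the capture-and-compare flow described above. It assumes a hypothetical `TxOutcome` struct (serialized result plus meta) and a hypothetical `Executor` callable that stands in for applying one transaction against the captured pre-state; none of these names come from stellar-core. Unlike the real p23 phase builder, which keeps conflicting transactions in the same cluster, this sketch just deals transaction indices round-robin across threads. That is only valid when the executor has no cross-transaction dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one transaction's captured output.
struct TxOutcome
{
    std::string result; // serialized tx result
    std::string meta;   // serialized tx meta
};

// Hypothetical executor: applies transaction `i` and returns its outcome.
// Must be safe to call concurrently for distinct indices.
using Executor = std::function<TxOutcome(std::size_t)>;

// Runs all transactions one after another, in index order.
std::vector<TxOutcome>
runSequential(std::size_t numTxs, Executor const& exec)
{
    std::vector<TxOutcome> out;
    out.reserve(numTxs);
    for (std::size_t i = 0; i < numTxs; ++i)
    {
        out.push_back(exec(i));
    }
    return out;
}

// Deals tx indices round-robin into `ways` clusters and runs each cluster on
// its own thread. Each thread writes only its own slots of the pre-sized
// output buffer, so no locking is needed.
std::vector<TxOutcome>
runParallel(std::size_t numTxs, std::size_t ways, Executor const& exec)
{
    assert(ways > 0);
    std::vector<TxOutcome> out(numTxs);
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < ways; ++w)
    {
        threads.emplace_back([&out, &exec, numTxs, ways, w] {
            for (std::size_t i = w; i < numTxs; i += ways)
            {
                out[i] = exec(i);
            }
        });
    }
    for (auto& t : threads)
    {
        t.join();
    }
    return out;
}

// Returns the indices of transactions whose result or meta differ.
std::vector<std::size_t>
diffOutcomes(std::vector<TxOutcome> const& seq,
             std::vector<TxOutcome> const& par)
{
    assert(seq.size() == par.size());
    std::vector<std::size_t> diffs;
    for (std::size_t i = 0; i < seq.size(); ++i)
    {
        if (seq[i].result != par[i].result || seq[i].meta != par[i].meta)
        {
            diffs.push_back(i);
        }
    }
    return diffs;
}
```

Calling `runParallel(n, 4, exec)` and then `diffOutcomes(runSequential(n, exec), ...)` loosely mirrors a run with a parallelism factor of 4; each index returned corresponds to a difference that would be logged as an error.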

Its behaviour is controlled by two environment variables:

- STELLAR_TEST_PARALLEL_EXECUTION must be set to a nonzero number, which will be used as the parallelism factor for the synthesized parallel phase. So setting STELLAR_TEST_PARALLEL_EXECUTION=4 will make and run a 4-way parallel phase on 4 threads.
- STELLAR_COMPARISON_TOLERANCE is an optional but recommended comma-separated list of difference types to tolerate and not report as errors. Currently I recommend running with STELLAR_COMPARISON_TOLERANCE=event_topics,fees, though other options are possible (browse the code). This is necessary because there are some small observable differences between p22 and p23 executions, arising both from minor protocol changes and from the very fact of running in parallel (e.g. fees go way down).
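A sketch of how the two environment variables might be interpreted. `parseParallelism` and `parseTolerances` are hypothetical helper names, not the functions used in the PR. The only tolerance names taken from the description are `event_topics` and `fees`; the sketch does not validate names against any list.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: interprets a STELLAR_TEST_PARALLEL_EXECUTION value.
// Returns the parallelism factor, or nullopt if unset, zero, or not a
// plain decimal number.
std::optional<unsigned long>
parseParallelism(char const* value)
{
    if (value == nullptr)
        return std::nullopt;
    std::string s(value);
    // Accept only digits, so "-1", " 4" or "4x" are rejected.
    if (s.empty() || s.find_first_not_of("0123456789") != std::string::npos)
        return std::nullopt;
    try
    {
        unsigned long n = std::stoul(s);
        if (n == 0)
            return std::nullopt;
        return n;
    }
    catch (std::out_of_range const&)
    {
        return std::nullopt; // too large for unsigned long
    }
}

// Hypothetical helper: splits a STELLAR_COMPARISON_TOLERANCE value such as
// "event_topics,fees" into a set of difference kinds, ignoring empty entries.
std::set<std::string>
parseTolerances(char const* value)
{
    std::set<std::string> kinds;
    if (value == nullptr)
        return kinds;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ','))
    {
        if (!item.empty())
            kinds.insert(item);
    }
    return kinds;
}
```

A caller would feed these from `std::getenv` (declared in `<cstdlib>`), e.g. `parseParallelism(std::getenv("STELLAR_TEST_PARALLEL_EXECUTION"))`, and skip reporting any difference whose kind is in the tolerance set.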

So overall, you probably want to run something like:

$ STELLAR_COMPARISON_TOLERANCE=event_topics,fees STELLAR_TEST_PARALLEL_EXECUTION=4 \
  ./src/stellar-core --conf ~/stellar-mainnet.cfg --console catchup current/1000

To help diagnose differences, it also writes them to a set of organized files in the filesystem, under the directory parallel-tx-diffs. For example, my run just wrote these files:

parallel-tx-diffs/ledger-58007058
parallel-tx-diffs/ledger-58007058/tx-81
parallel-tx-diffs/ledger-58007058/tx-81/tx-envelope.json
parallel-tx-diffs/ledger-58007058/tx-81/meta-parallel.json
parallel-tx-diffs/ledger-58007058/tx-81/summary.txt
parallel-tx-diffs/ledger-58007058/tx-81/meta-sequential.json
parallel-tx-diffs/ledger-58007058/tx-61
parallel-tx-diffs/ledger-58007058/tx-61/tx-envelope.json
parallel-tx-diffs/ledger-58007058/tx-61/meta-parallel.json
parallel-tx-diffs/ledger-58007058/tx-61/summary.txt
parallel-tx-diffs/ledger-58007058/tx-61/meta-sequential.json
...
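A sketch of producing the per-ledger, per-transaction layout listed above with `std::filesystem`. `diffDirFor` and `writeDiffFile` are hypothetical helpers for illustration; only the directory and file names (`ledger-<seq>`, `tx-<index>`, `summary.txt`, `tx-envelope.json`, `meta-parallel.json`, `meta-sequential.json`) come from the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: directory for one differing tx, following the layout
// <root>/ledger-<ledgerSeq>/tx-<txIndex>.
fs::path
diffDirFor(fs::path const& root, std::uint32_t ledgerSeq, std::size_t txIndex)
{
    return root / ("ledger-" + std::to_string(ledgerSeq)) /
           ("tx-" + std::to_string(txIndex));
}

// Hypothetical helper: writes one file of a diff bundle (e.g. summary.txt or
// meta-parallel.json), creating parent directories as needed.
// Returns false if the file could not be written.
bool
writeDiffFile(fs::path const& dir, std::string const& name,
              std::string const& contents)
{
    // `name` must be a bare file name, not a path.
    assert(!name.empty() && name.find('/') == std::string::npos);
    fs::create_directories(dir);
    std::ofstream out(dir / name);
    out << contents;
    out.close();
    return !out.fail();
}
```

Keying directories by ledger sequence and then by transaction index keeps every file for one differing transaction together, which matches the listing and makes it easy to compare meta-parallel.json against meta-sequential.json side by side.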

src/herder/ParallelTxSetBuilder.cpp

src/ledger/LedgerManagerImpl.cpp

dmkozh

This looks sensible overall, I think the main issue is weird/incorrect fee bump handling.

graydon force-pushed the re-exec branch from 9e442f3 to f885cbe July 5, 2025 08:21

dmkozh reviewed Jul 9, 2025


This comment was marked as outdated.


graydon force-pushed the re-exec branch 5 times, most recently from a866e06 to 7b2af24 July 14, 2025 22:21

Add test code for pre-running sequential soroban phases in parallel.

0cc9eac

graydon force-pushed the re-exec branch from fb41535 to 0cc9eac July 15, 2025 23:15

graydon changed the title from "DO NOT MERGE: sketch of parallel-tx re-execution for testing." to "DO NOT MERGE: synthetic parallel execution test framework" Jul 15, 2025

graydon added 6 commits July 16, 2025 12:31

Dump diagnostics of pre-exec diffs to filesystem

5a7fdef

dump all predecessor txs on diffs and also log seq/par time diff

8345778

Add simpler parallel tx stage formation code

47271b1

also dump tx results on diffs

0367ab1

Tolerate special case of par-passes-seq-runs-out-of-gas

9c8293e

add special case for kale tractor

e5058df

graydon mentioned this pull request Jul 18, 2025 in Parallel Soroban testing #4822 (Open, 6 tasks)

graydon added 10 commits July 21, 2025 17:13

Simplify parallel-txset building code

f3b4e31

filter out any balance diffs on accounts involved in fees

6ec7ddd

broaden tolerance for failures due to fee or resource differences

6207345

Add tolerance for off-by-one TTLs fixed in stellar#1560

9b2a8b9

Add back a slightly weaker kale tractor special case

32251d3

Add key-hash-file writing for debugging TTL bumps

334c8c8

Add code to trim duplicate RoTTLBumps from meta

0b91645

fix unguarded v1() call

10cce92

make roTTLBump skipping code handle feebumps

49f6dc4

dump metas and results for all predecessors on diffs

3e3b117

graydon mentioned this pull request Jul 29, 2025 in WIP: Tsan on rust #4850 (Draft)
