Add a MockS3Server plus an s3client and bulkdumpings3 simulation test #12279

Merged
merged 4 commits into from
Jul 29, 2025

Contributor

@saintstack saintstack commented Jul 25, 2025

Add a MockS3Server and register it at 127.0.0.1:8080 on the simulated network (the simulated network intercepts requests for 127.0.0.1:8080 and forwards them to the s3 handler). Add two workloads: one that exercises s3client against the mock 's3', and another that swaps in s3 as the backend for the BulkDumping test.

The Mock S3 Server is mostly generated (claude-sonnet-4) code. It supports:

  • Basic GET/PUT/DELETE/HEAD object operations
  • Multipart uploads (initiate, upload parts, complete, abort)
  • Object tagging (put/get tags)
  • In-memory storage with deterministic behavior
  • S3-compatible XML responses
  • fdbserver/workloads/BulkDumping.actor.cpp: Change this workload so it reads the transport to use from the configuration file, so it can be used to test both file-based and s3 bulk loading. If the transport is blobstore/s3, start up the mocks3server in _setup.

  • fdbserver/workloads/S3ClientWorkload.actor.cpp: Simple workload to exercise s3client. Starts up the mocks3server in _setup.

  • fdbclient/S3BlobStore.actor.cpp: Check BUGGIFY before messing w/ status codes.

  • fdbrpc/HTTP.actor.cpp: Add defines, and log when parsing the status line fails.
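For readers who want a feel for the "in-memory storage with deterministic behavior" bullet, here is a minimal standalone C++17 sketch of the basic GET/PUT/DELETE/HEAD operations. It is not the MockS3Server code from this PR: the class name `InMemoryObjectStore` and its methods are hypothetical, and it leaves out HTTP parsing, multipart uploads, tagging, and XML responses.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a mock S3 storage layer.
// std::map keeps (bucket, key) pairs ordered, so any iteration over the
// store visits objects in the same order on every run.
class InMemoryObjectStore {
public:
    // PUT: create or overwrite an object.
    void put(const std::string& bucket, const std::string& key, std::string body) {
        assert(!bucket.empty() && !key.empty());
        objects_[std::make_pair(bucket, key)] = std::move(body);
    }

    // GET: return the body, or nullopt if the object does not exist
    // (an HTTP layer would map nullopt to a 404 response).
    std::optional<std::string> get(const std::string& bucket, const std::string& key) const {
        auto it = objects_.find(std::make_pair(bucket, key));
        if (it == objects_.end()) return std::nullopt;
        return it->second;
    }

    // HEAD: return only the object size, never the body.
    std::optional<std::size_t> head(const std::string& bucket, const std::string& key) const {
        auto it = objects_.find(std::make_pair(bucket, key));
        if (it == objects_.end()) return std::nullopt;
        return it->second.size();
    }

    // DELETE: returns true if an object was actually removed.
    bool remove(const std::string& bucket, const std::string& key) {
        return objects_.erase(std::make_pair(bucket, key)) > 0;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> objects_;
};
```

Because everything lives in one ordered map with no clocks, threads, or disk I/O, two runs that issue the same requests see the same results, which is the property a simulation test needs.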


j start --tarball ~/build_output/packages/correctness-7.4.0.tar.gz  --max-run 100000  --env TH_ARCHIVE_LOGS_ON_FAILURE=true && j tail --errors
....

  20250725-194324-stack-d0bf81a817222ed4             compressed=True data_size=41362786 duration=5883808 ended=100000 env=TH_ARCHIVE_LOGS_ON_FAILURE=true fail=1 fail_fast=10 max_runs=100000 pass=99999 priority=100 remaining=0 runtime=0:57:29 sanity=False started=100000 stopped=20250725-204053 submitted=20250725-194324 timeout=5400 username=stack

And I think I fixed the one failure ... it's global buggify.

@saintstack saintstack changed the title Add a MockS3Server and an s3client and bulkdumpings3 simulationn test Add a MockS3Server and an s3client and bulkdumpings3 simulation test Jul 25, 2025
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 4783393
  • Duration 0:24:56
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 4783393
  • Duration 0:40:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 4783393
  • Duration 0:48:52
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 4783393
  • Duration 1:02:24
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 4783393
  • Duration 1:07:02
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 4783393
  • Duration 1:07:59
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 4783393
  • Duration 1:08:36
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: e23b25e
  • Duration 0:24:15
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: e23b25e
  • Duration 0:39:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: e23b25e
  • Duration 0:48:15
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: e23b25e
  • Duration 0:57:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: e23b25e
  • Duration 1:03:33
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: e23b25e
  • Duration 1:07:43
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: e23b25e
  • Duration 1:11:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack saintstack requested review from kakaiu and jzhou77 July 28, 2025 15:24
@saintstack saintstack changed the title Add a MockS3Server and an s3client and bulkdumpings3 simulation test Add a MockS3Server plus an s3client and bulkdumpings3 simulation test Jul 28, 2025
@@ -2028,6 +2028,8 @@ int main(int argc, char* argv[]) {
enableGeneralBuggify();
} else {
disableGeneralBuggify();
// When buggify is disabled, also disable global fault injection
opts.faultInjectionEnabled = false;
Member

Why do we want to disable fault injection? I think buggify is not exactly the same as fault injection. Please correct me if I'm wrong. Thanks!

Contributor Author

This is here because, although I have fault injection set to false in the two toml test files, fault injection was still happening in joshua. Let me undo this for now. I'll post new joshua run numbers and get an example of the fault injection I was seeing.

@@ -1186,12 +1186,13 @@ ACTOR Future<Reference<HTTP::IncomingResponse>> doRequest_impl(Reference<S3BlobS
}

Reference<HTTP::IncomingResponse> _r = wait(timeoutError(reqF, requestTimeout));
- if (g_network->isSimulated() && deterministicRandom()->random01() < 0.1) {
+ if (g_network->isSimulated() && BUGGIFY && deterministicRandom()->random01() < 0.1) {
Member

Why do we do Buggify here? Thanks!

Contributor Author

Only do the random if BUGGIFY is enabled. I want to turn off any variance in s3 client behavior for the moment, until we have stable determinism.
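To illustrate the gating idea, here is a rough standalone C++ sketch. It does not use the FoundationDB `BUGGIFY` macro or `deterministicRandom()`; `SimContext`, `maybeInjectError`, and the injected 503 status are all hypothetical stand-ins. The point is that the random draw only happens when both the simulation flag and the buggify flag are set, so a run with buggify off never consumes randomness on this path.

```cpp
#include <cassert>
#include <random>

// Hypothetical stand-in for simulator state; not a FoundationDB type.
struct SimContext {
    bool simulated;       // analogous to "running under simulation"
    bool buggifyEnabled;  // analogous to "buggify is switched on"
    std::mt19937 rng;     // seeded generator, so runs are repeatable
};

// Returns the status code the caller should see. With both flags set,
// the real status is replaced by an illustrative 503 with the given
// probability; otherwise the real status is returned and the generator
// is left untouched.
int maybeInjectError(SimContext& ctx, int realStatus, double probability = 0.1) {
    assert(probability >= 0.0 && probability <= 1.0);
    if (!ctx.simulated || !ctx.buggifyEnabled) {
        return realStatus;  // short-circuit: no random draw at all
    }
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(ctx.rng) < probability ? 503 : realStatus;
}
```

Two contexts created with the same seed and the same flags return the same sequence of status codes, which is what makes a failing seed reproducible.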

}

// Public Interface Implementation
ACTOR Future<Void> startMockS3Server(NetworkAddress listenAddress) {
Member

@kakaiu kakaiu Jul 28, 2025

When is this startMockS3Server used? Thanks!

Contributor Author

Not currently. I thought I would need it, until I tripped over the register http server thingy. It seems useful, though, if someone wants to start up a mock s3 instance.

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: be1d620
  • Duration 0:23:55
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: be1d620
  • Duration 0:40:36
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: be1d620
  • Duration 0:49:32
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: be1d620
  • Duration 1:03:03
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: be1d620
  • Duration 1:06:41
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: be1d620
  • Duration 1:17:50
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: be1d620
  • Duration 1:18:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack
Contributor Author

Here are 10k runs of BulkDumpingS3 only:

j start --tarball ~/BulkDumpingS3.tar.gz  --max-run 10000 --env TH_ARCHIVE_LOGS_ON_FAILURE=true --env TH_UNSEED_CHECK_RATIO=1.0
...
  20250728-202325-stack-06303b24331eafcc             compressed=True data_size=41406270 duration=422162 ended=10000 env=TH_ARCHIVE_LOGS_ON_FAILURE=true:TH_UNSEED_CHECK_RATIO=1.0 fail_fast=10 max_runs=10000 pass=10000 priority=100 remaining=0 runtime=0:17:15 sanity=False started=10000 stopped=20250728-204040 submitted=20250728-202325 timeout=5400 username=stack

and here are 100k runs of the general tests:

j start --tarball ~/build_output/packages/correctness-7.4.0.tar.gz  --max-run 100000  --env TH_ARCHIVE_LOGS_ON_FAILURE=true && j tail --errors
Note: Ensemble will complete after 10 failed results.
Note: Ensemble will complete after 100000 runs.
Starting ensemble
...
  20250728-204821-stack-d63f1d74060032c4             compressed=True data_size=41361024 duration=5864025 ended=100000 env=TH_ARCHIVE_LOGS_ON_FAILURE=true fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:57:58 sanity=False started=100000 stopped=20250728-214619 submitted=20250728-204821 timeout=5400 username=stack

Looks like it's fine w/o the global disabling of fault injection.


[[knobs]]
# Disable failure injection for deterministic behavior
bulkload_sim_failure_injection = false
Member

@kakaiu kakaiu Jul 29, 2025

Seems that this knob is irrelevant to this toml file test?

Contributor Author

ack

Contributor Author

Yes. Removed (was mirroring changes in BulkDumpingS3 to this file)

# When done, run:
# shutdown_weed ~/weed

# Disable all fault injection and network failures
Member

Why do we want to disable all fault injection? Will we enable those in the future? Thanks!

Contributor Author

Yeah. Currently, fault injection produces failures. In a follow-on, I would introduce faults.

# DETERMINISM FIX: Combined knobs section to avoid duplicates
[[knobs]]
# S3/Bulkload determinism fixes
bulkload_sim_failure_injection = false
Member

Can you try enabling the bulkload_sim_failure_injection knob? Since we have a mocked HTTP server, we should not have a reason to disable bulkload failure injection. Thanks!

Contributor Author

When I enable failure injection (that is, drop this `= false` setting), I get failures:

20250729-152416-stack-b70ff5b181db9da0 compressed=True data_size=41406205 duration=898424 ended=10000 env=TH_ARCHIVE_LOGS_ON_FAILURE=true:TH_UNSEED_CHECK_RATIO=1.0 fail=2 fail_fast=10 max_runs=10000 pass=9998 priority=100 remaining=0 runtime=0:22:44 sanity=False started=10000 stopped=20250729-154700 submitted=20250729-152416 timeout=5400 username=stack

Member

Ok. We can set the knob here and get the PR merged. We will remove the bulkload_sim_failure_injection later. Maybe better to add a "TODO" here.

# Simple network configuration - single region, no satellites
minimumRegions = 1
# Use ssd-2 storage engine instead of rocksdb to avoid RocksDB bulk loading crashes
# The RocksDB assertion failure indicates incompatibility with current bulk loading implementation
Member

Which assertion failure? The current bulkload implementation should be compatible with RocksDB. Please let me know more details about this. Thanks!

Contributor Author

This is old, and was added at your suggestion IIRC. Can we look at this in a follow-on?

// Register MockS3Server with IP address - simulation environment doesn't support hostname resolution
// See in HTTPServer.actor.cpp how the MockS3RequestHandler is implemented. Client connects to
// connect("127.0.0.1", "8080") and then simulation network routes it to MockS3Server.
wait(g_simulator->registerSimHTTPServer("127.0.0.1", "8080", makeReference<MockS3RequestHandler>()));
Member

nice try!
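For anyone reading this thread later, the routing described in the code comment above can be pictured with a small standalone C++17 sketch. It is not the simulator's implementation: `SimNetwork`, `registerHTTPServer`, `send`, `Request`, and `Response` are hypothetical names, and the real `registerSimHTTPServer` is asynchronous and takes a handler object rather than a function.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical, heavily simplified request and response types.
struct Request {
    std::string verb;
    std::string resource;
};
struct Response {
    int code;
    std::string body;
};
using Handler = std::function<Response(const Request&)>;

// Toy simulated network: a table from (host, port) to an in-process
// handler. A "connection" to a registered address never leaves the
// process; the request is handed straight to the handler.
class SimNetwork {
public:
    void registerHTTPServer(const std::string& host, const std::string& port, Handler handler) {
        assert(handler != nullptr);
        handlers_[std::make_pair(host, port)] = std::move(handler);
    }

    // Returns nullopt when nothing is registered at the address, which
    // stands in for a failed connection.
    std::optional<Response> send(const std::string& host,
                                 const std::string& port,
                                 const Request& request) const {
        auto it = handlers_.find(std::make_pair(host, port));
        if (it == handlers_.end()) return std::nullopt;
        return it->second(request);
    }

private:
    std::map<std::pair<std::string, std::string>, Handler> handlers_;
};
```

Keying the table on the literal IP string mirrors the comment's point that the simulation environment does not resolve hostnames: the client has to connect to exactly the address that was registered.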

Member

@kakaiu kakaiu left a comment

LGTM. Only a few nits. Thank you for the nice PR!

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 25ea3d0
  • Duration 0:25:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 25ea3d0
  • Duration 0:39:49
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 25ea3d0
  • Duration 0:48:07
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 25ea3d0
  • Duration 1:02:20
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 25ea3d0
  • Duration 1:13:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@kakaiu kakaiu self-requested a review July 29, 2025 17:55
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 25ea3d0
  • Duration 1:52:38
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 25ea3d0
  • Duration 1:56:20
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@saintstack saintstack merged commit d45a17b into apple:main Jul 29, 2025
7 checks passed
@saintstack
Contributor Author

Thanks for the review @kakaiu
