feat(benchmark): add benchmark_test test type
#1945
Conversation
Force-pushed from 641036c to af00ec2.
Force-pushed from af00ec2 to de7f485.
There are some issues in generating the fixture. I compared the newly created fixture with the original one, and its size is much larger. This should not happen: the content should be the same, so the size should be the same. But this is not a big problem for now. The major issue is resolving the failing test in CI, which I cannot reproduce locally.
This can come in handy for benchmark tests, since they basically force the consumption of all the available gas. That condition forces us to implement padding techniques to consume EXACTLY all the gas available in a block, when in reality, for a benchmark, we don't care about this at all.
@CPerezz I think this is still necessary for the Nethermind team (increasing the gas limit) and the zkEVM team (proving the entire block)? For gas-limit testing, I am not sure they can run just one tx and then derive the entire block's execution time from it.
But you can emit a warning if needed. Why does it need to be a failure to not spend ALL the gas exactly? I agree it has to be within a bound, sure. But precision down to the unit is really different, especially when you have to account for memory expansion and other costs; it's almost impossible to avoid needing padding. I'm not advocating removing this completely, but maybe relaxing it. Or at least, it would be useful to know why it needs to fail specifically. When and why was this introduced?
@CPerezz Thank you for the explanation, it is very clear! I will review the included features again and discuss with the team. As you can see, this is still a draft and we welcome any feedback. We also want to know what the stateless client team needs for benchmarking: what are your considerations when benchmarking?
@LouisTsai-Csie I'm speaking only with regard to the "State bottlenecks" project, which is within the stateless-consensus team. Our goal is to measure how different client implementations behave under heavy load and with different state sizes, among other things. For that, we need these kinds of benchmarks. But matching the gas spent perfectly turns out to be quite tricky, and it is not required at all. 1% of wiggle room is enough to consider the benchmark useful even if it doesn't spend all the gas of the block.
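A relaxed check along those lines could look like the sketch below. It is a minimal illustration of the 1%-tolerance idea from this discussion; the function name, signature, and default are assumptions, not the PR's actual code.

```python
def check_benchmark_gas_usage(
    gas_used: int, block_gas_limit: int, tolerance: float = 0.01
) -> None:
    """Require the workload to consume the block's gas budget within a
    relative tolerance instead of demanding an exact match.

    `tolerance=0.01` encodes the ~1% wiggle room discussed above; both
    the name and the exact policy are illustrative assumptions.
    """
    shortfall = block_gas_limit - gas_used
    if shortfall < 0:
        raise ValueError(f"gas_used ({gas_used}) exceeds the block gas limit")
    if shortfall > tolerance * block_gas_limit:
        raise ValueError(
            f"benchmark left {shortfall} gas unused, more than "
            f"{tolerance:.0%} of the {block_gas_limit} limit"
        )


# A 36M-gas block filled to 35.9M passes; one filled to 30M would fail.
check_benchmark_gas_usage(gas_used=35_900_000, block_gas_limit=36_000_000)
```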
Force-pushed from de7f485 to 688e861.
After going through the current implementation and thinking about it, I think this PR is mostly on the right track.
My suggestions would be (a rough sketch of this shape follows the list):
- We have a single new spec, `benchmark_test`, that receives `setup_txs` and `workload_txs`, or a `generator`.
- We have multiple generator subclasses, all of which subclass `BenchmarkCodeGenerator` and implement `generate_setup_txs` and `generate_workload_txs` (and perhaps `deploy_contracts`).
- Internally, `benchmark_test` takes `setup_txs` (or calls `generator.generate_setup_txs()`) and, if any, generates a first setup block; it then takes `workload_txs` (or calls `generator.generate_workload_txs()`) and puts them in a different block.
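A minimal Python sketch of that shape, for concreteness. The class and method names come from the suggestion itself; the placeholder `Transaction` type and the block-building details are assumptions.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class Transaction:
    """Stand-in for the framework's real transaction type."""


class BenchmarkCodeGenerator(ABC):
    """Base class for benchmark workload generators (names per the
    suggestion above; signatures are illustrative assumptions)."""

    @abstractmethod
    def generate_setup_txs(self) -> List[Transaction]:
        """Transactions that deploy contracts / prepare state."""

    @abstractmethod
    def generate_workload_txs(self) -> List[Transaction]:
        """Transactions that exercise the behavior being measured."""


def benchmark_test(
    *,
    setup_txs: Optional[List[Transaction]] = None,
    workload_txs: Optional[List[Transaction]] = None,
    generator: Optional[BenchmarkCodeGenerator] = None,
) -> List[dict]:
    """Accept explicit tx lists or a generator; put setup txs (if any)
    in a first block and workload txs in a separate block."""
    if generator is not None:
        setup_txs = generator.generate_setup_txs()
        workload_txs = generator.generate_workload_txs()
    blocks: List[dict] = []
    if setup_txs:
        blocks.append({"txs": setup_txs})  # setup block
    blocks.append({"txs": workload_txs or []})  # workload block
    return blocks
```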
Force-pushed from f99f318 to 30b4f76.
I refactored the helper functions and added the context-manager feature. During the update, some questions and TODOs came to mind:
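The context manager itself isn't shown in this thread; purely as a hypothetical illustration, a phase-scoped context manager could look like this (all names invented, not the PR's real API):

```python
from contextlib import contextmanager
from typing import Iterator, List


@contextmanager
def benchmark_phase(name: str) -> Iterator[List[str]]:
    """Hypothetical phase scope: collect the transactions created inside
    the `with` block under one named phase (e.g. setup vs. workload).
    This only illustrates the pattern, not the PR's implementation."""
    txs: List[str] = []
    try:
        yield txs  # test code appends transactions to this list
    finally:
        print(f"phase {name!r} collected {len(txs)} transaction(s)")


with benchmark_phase("setup") as setup_txs:
    setup_txs.append("deploy_contract_tx")  # placeholder payload
```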
Regarding the questions you have:

> Maybe we could move the abstract class

Sgtm; I'm still open to being convinced that we indeed need it.

That might be out of scope for this PR, and we should leave it for the PR that touches the
marioevz left a comment
Looking really good! I think the code generators are fantastic, and the only part I feel we should take out and move into another PR is the `BenchmarkManager`.
Force-pushed from e803c98 to 1da86fa.
Unsure if this is somehow related, but JIC mentioning it here: in #2090 we arrived at the following conclusion:

Therefore, the way I found to profit off of this dual mode is to allow fix-mode to take care of the pre-state deployment/generation (making sure it doesn't run in case it identifies that the state it is about to deploy already exists). LMK what you think @LouisTsai-Csie @fselmo. If this approach doesn't make sense, could you let me know what's the best way to bring all the Bloatnet benchmarks into EEST?
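The "don't re-deploy pre-state that already exists" check described above could be sketched like this. The `state` mapping and function name are toy stand-ins for whatever state-access API the tool actually exposes; only the skip-if-present logic is the point.

```python
from typing import Dict


def needs_deployment(address: str, expected_code: bytes,
                     state: Dict[str, bytes]) -> bool:
    """Return True if a setup/deployment tx is still needed.

    `state` is a toy address -> code mapping standing in for real
    client state access; names here are assumptions for illustration.
    """
    existing = state.get(address)
    if existing == expected_code:
        return False  # pre-state already in place: skip deployment
    if existing:
        raise ValueError(f"{address} holds unexpected code")
    return True


state = {"0xabc": b"\x60\x00"}
assert needs_deployment("0xabc", b"\x60\x00", state) is False  # skip
assert needs_deployment("0xdef", b"\x60\x00", state) is True   # deploy
```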
fselmo left a comment
I honestly don't have a lot to add here; this looks amazing 🔥. Really elegant approach. I added a lot of words (that's just my nature 😆), but there are really just some minor miscalculations that we should have sanity checks for anyhow. Otherwise this is looking excellent!
The major question I have is whether this will all still work if we rip out the phase manager and leave it for another PR. I think we can... is there a reason to keep it?
Force-pushed from 1da86fa to 50ec823.
Force-pushed from 96b5b42 to a0e1038.
@LouisTsai-Csie this looks good; I've approved and marked all comments as resolved. Please add a CHANGELOG entry for this before merging. We should add some documentation for this as well. Is it important to get this in quickly (deferring docs to a follow-up PR), or should we do that here before merge?
fselmo left a comment
Approving again 🙂. See my previous comment about documentation: are you willing to add it here, or should we merge this and address it separately? We should also add a CHANGELOG entry here.
Force-pushed from a0e1038 to e2f462b.
* feat: wrap blockchain test for benchmark
* feat: wrap state test for benchmark
* feat(benchmark): add code generator to generate transactions
* fix: resolve typing issue
* refactor: update benchmark code generator and test wrapper
* fix: update example changes
* refactor: resolve typing and update func interface
* refactor: remove benchmark state test wrapper
* fix: pydantic model validation for benchmark manager
* refactor: syntax and parameters
* refactor: remove benchmark manager feature
* refactor: update logic and add benchmark tests
* refactor: enforce single property requirement in blockchain test generation
* refactor: update Bytecode serialization schema to use format_ser_schema
* refactor: update import paths
* refactor: update serialization schema
* refactor: remove unused parameters
* doc: add changelog entry
* fix: typo
🗒️ Description
As EIP-7825 is introduced in the Fusaka upgrade, most of the legacy test cases would fail. This PR adds two test wrappers, `benchmark_test` and `benchmark_state_test`, to replace the plain `blockchain_test` and `state_test` test types.
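To make the motivation concrete: EIP-7825 caps the gas a single transaction may consume, so a benchmark that used to fill a block with one huge transaction now has to split its workload across several. A minimal sketch, assuming the 2**24 cap specified by EIP-7825 (the helper itself is hypothetical):

```python
TX_GAS_CAP = 2**24  # per-transaction gas cap from EIP-7825 (16,777,216)


def split_workload(total_gas: int, cap: int = TX_GAS_CAP) -> list:
    """Split a benchmark's gas budget into per-transaction budgets that
    respect the cap: a single tx can no longer burn a whole large block."""
    budgets = []
    remaining = total_gas
    while remaining > 0:
        chunk = min(cap, remaining)
        budgets.append(chunk)
        remaining -= chunk
    return budgets


# A 36M-gas block now needs at least three transactions under the cap.
assert len(split_workload(36_000_000)) == 3
```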
🔗 Related Issues or PRs
Issue #1896
✅ Checklist
- Ran `tox` checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks: `uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint`
- PR title adheres to the repo standard and starts with `type(scope):`.