- 
                Notifications
    You must be signed in to change notification settings 
- Fork 274
use Kurtosis instead of the run script #649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @aaronbuchwald Hey! I took over Tedi's project to replace the script with Kurtosis as he stopped working part time a few days ago. | 
        
          
                tests/utils/node_launcher.go
              
                Outdated
          
        
      | firstNodeId = "node-0" | ||
| ) | ||
|  | ||
| func SpinupAvalancheNode() (string, func(), error) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would it take to change this from SpinupAvalancheNode to spin up a network of N nodes and how would you specify configs of each individual node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To run multiple nodes; we'll have to change the args to include node_count like
forceExposeOn9650        = `{"test_mode": true, "node_count": 5}`
This would run 5 different nodes. Note Kurtosis by default tries to expose ephemeral ports but we're using the "test_mode": true to force public ports to be a certain number(first node rpc gets 9650, second gets 9652 and so forth). Ideally we remove the test_mode and pass around the ephemeral port generated inside these tests.
Re config - At the moment all nodes start with the same config except for bootstrap information. The second bootstraps from the first one; the third from the first two. What other config would you like to tweak? It shouldn't be too hard to set it; but I wanted to get something minimal out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool that makes sense. AvalancheGo requires uses two ports the HTTP port (https://docs.avax.network/nodes/maintain/avalanchego-config-flags#--http-port-int) and the Staking port (https://docs.avax.network/nodes/maintain/avalanchego-config-flags#--staking-port-int).
If it's easier for Kurtosis to use ephemeral ports, you may just want to assign those two ports via these flags.
As in the documentation, the staking port is where the node listens for incoming p2p connections and the http port is used to run an API server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we have one container per Avalanche Node inside the package. Within the Docker network the ports are set to the defaults 9650 and 9651; Kurtosis then produces (by default) an ephemeral port on localhost. That is the port that the tests interact with; with test_mode true I am forcing Kurtosis to produce local port mappings that are expected
So 9650,9651 on the first node will map to 9650 and 9651 on localhost
For the second avalanche node they map to 9652 and 9653
.. so on and so forth!
I am using test_mode as the tests seem to hardcode
var (
	DefaultLocalNodeURI = "http://127.0.0.1:9650"
	NodeURIs            = []string{DefaultLocalNodeURI, "http://127.0.0.1:9652", "http://127.0.0.1:9654", "http://127.0.0.1:9656", "http://127.0.0.1:9658"}
)
and I didn't want to change your code base too much! There's not much extra effort for Kurtosis here!
Perhaps I can make the load test a multi node load test based on what I see here
tests/load/load_test.go
56:		rpcEndpoints := make([]string, 0, len(utils.NodeURIs))
57:		for _, uri := range []string{utils.DefaultLocalNodeURI} { // TODO: use NodeURIs instead, hack until fixing multi node in a network behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the load test slightly and now I pass utils.NodeURIs & the test seems to pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reverted the multi node load test as it flaked on one run - https://github.com/h4ck3rk3y/subnet-evm/actions/runs/5093958012/jobs/9157181604
## Description: Currently Starlark run remote package, run package and run script blocking calls on the SDK have the following behavior: 1. If an error happens on the APIC side, we return `(*StarlarkRunResult == nil, error != nil)` 2. If an error happens on the Starlark side, we return `(*StarlarkRunResult != nil, error == nil)`, with either `StarlarkRunResult.[InterpretationError, ValidationErrors, ExecutionError] != nil` 3. If no errors happen on the Starlark side, we return `(*StarlarkRunResult != nil, error == nil)` with all `StarlarkRunResult.[InterpretationError, ValidationErrors, ExecutionError] == nil` This behavior is not very ergonomic, given that most people are only interested if the Starlark succeeded or not, they should be able to do this with a simple `err != nil` check, which is the Go idiomatic way of dong it. If the user wants to investigate further the interpreted instructions, the breakdown of which phase failed, etc, this will still be accessible via the `StarlarkRunResult`. ## Is this change user facing? YES <!-- If yes, please add the "user facing" label to the PR --> <!-- If yes, don't forget to include docs changes where relevant --> ## References (if applicable): <!-- Add relevant Github Issues, Discord threads, or other helpful information. --> ava-labs/subnet-evm#649
| fmt.Println(fmt.Printf("Destroying enclave with id '%v'", enclaveId)) | ||
| if err = kurtosisCtx.StopEnclave(ctx, enclaveId); err != nil { | ||
| fmt.Printf("An error occurred while stopping the enclave with id '%v'\n", enclaveId) | ||
| } | ||
| if err = kurtosisCtx.DestroyEnclave(ctx, enclaveId); err != nil { | ||
| fmt.Printf("An error occurred while cleaning up the enclave with id '%v'\n", enclaveId) | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should these errors be propagated somewhere?
I think failure to teardown should be treated as unexpected behavior and cause the test to fail. If this causes tests to flake intermittently, then I'd consider it a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I am now asserting that the teardown has no errors.
| const ( | ||
| isPartitioningEnabled = false | ||
| enclaveIdPrefix = "test" | ||
| avalancheStarlarkPackage = "github.com/kurtosis-tech/avalanche-package" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this running a static version of AvalancheGo/Subnet-EVM? For CI, we would want to build a Docker image of AvalancheGo that has the VM binary present in the plugin directory and launch a node running that otherwise it seems that this will not actually be testing the latest code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added image building as a step in the CI test and I pass the built image in the avalanche package now using the :test tag via the avalanchego_image argument
|  | ||
| # This script assumes that an AvalancheGo and Subnet-EVM binaries are available in the standard location | ||
| # within the $GOPATH | ||
| # The AvalancheGo and PluginDir paths can be specified via the environment variables used in ./scripts/run.sh. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this comes with new requirements (install kurtosis cli), would it be possible to add or link instructions that are necessary to run the e2e tests with this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added instructions about installing Kurtosis & restarting the engine. Let me know what you think and if there's a better spot to put this apart from this script.
Signed-off-by: Gyanendra Mishra <[email protected]>
Signed-off-by: Gyanendra Mishra <[email protected]>
Co-authored-by: aaronbuchwald <[email protected]> Signed-off-by: Gyanendra Mishra <[email protected]>
Signed-off-by: Gyanendra Mishra <[email protected]>
Signed-off-by: Gyanendra Mishra <[email protected]>
| Closing in favor of migrating to https://github.com/ava-labs/avalanchego/tree/master/tests/e2e#avalanche-e2e-test-suites | 
* add regression test * apply fix * use cmp * reorder equal params * improve readability by removing extra var (#650)
Why this should be merged
This replaces the current run script with Kurtosis which is dockerized & cleans up after itself.
How this works
Installs Kurtosis in the CI and then runs test using Kurtosis. This forces the node thats spinning up to expose ports on 9650, so that we don't have to edit the existing testsuite too much
How this was tested
This is a test; CI runs fine on my fork.
How is this documented
Not sure where we document such things