github-actions[bot]
Contributor

These files are used for picking the starting (pre-upgrade) or ending (post-upgrade) agent versions in upgrade integration tests.

The content is based on responses from https://www.elastic.co/api/product_versions and https://snapshots.elastic.co

The current update is generated based on the following requirements:

Package version: 9.2.0

```json
{
  "UpgradeToVersion": "9.2.0",
  "CurrentMajors": 1,
  "PreviousMajors": 1,
  "PreviousMinors": 2,
  "SnapshotBranches": [
    "9.1",
    "9.0",
    "8.19",
    "7.17"
  ]
}
```
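If you want to consume these requirements programmatically, here is a minimal Go sketch that decodes the JSON above into a struct. The type and function names are hypothetical. The sketch only shows the shape of the data. It does not reproduce how the version-update tooling turns these fields into concrete start and end versions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// upgradeRequirements mirrors the requirements JSON from the PR description.
// Hypothetical type: not the one used by the elastic-agent tooling.
type upgradeRequirements struct {
	UpgradeToVersion string   `json:"UpgradeToVersion"`
	CurrentMajors    int      `json:"CurrentMajors"`
	PreviousMajors   int      `json:"PreviousMajors"`
	PreviousMinors   int      `json:"PreviousMinors"`
	SnapshotBranches []string `json:"SnapshotBranches"`
}

// requirementsJSON carries the same values as the PR description.
const requirementsJSON = `{
  "UpgradeToVersion": "9.2.0",
  "CurrentMajors": 1,
  "PreviousMajors": 1,
  "PreviousMinors": 2,
  "SnapshotBranches": ["9.1", "9.0", "8.19", "7.17"]
}`

// parseRequirements decodes a requirements document and rejects unknown keys,
// so a misspelled field name fails loudly instead of being silently ignored.
func parseRequirements(data []byte) (upgradeRequirements, error) {
	var r upgradeRequirements
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&r); err != nil {
		return upgradeRequirements{}, fmt.Errorf("decoding upgrade requirements: %w", err)
	}
	return r, nil
}

func main() {
	r, err := parseRequirements([]byte(requirementsJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println(r.UpgradeToVersion, r.PreviousMinors, r.SnapshotBranches)
}
```

Calling `DisallowUnknownFields` is a design choice for the sketch. It is stricter than plain `json.Unmarshal`, which silently drops unrecognized keys.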
@github-actions github-actions bot requested a review from a team as a code owner July 30, 2025 00:34
@github-actions github-actions bot requested review from michalpristas and pchila July 30, 2025 00:34
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@pkoutsovasilis
Contributor

@pchila I had a look at these failures, and it seems that the tests introduced by PR #8407 are failing. Please do have a look 🙂

@pchila
Member

pchila commented Jul 30, 2025

> @pchila I had a look at these failures and it seems that the tests introduced by this PR #8407 are failing, please do have a look 🙂

The version check for the rollback reason is pointing at `>= 9.1.0-SNAPSHOT`, but the feature introduced with #8407 didn't make it into 9.1.0. I'll change the version and retest.
Draft #9181
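The mismatch described above comes down to version precedence. Under semver-style ordering, a `-SNAPSHOT` pre-release sorts before the release with the same core version. That means the 9.1.0 release satisfies `>= 9.1.0-SNAPSHOT` even though the feature never shipped in it. The minimal Go sketch below illustrates this with a hand-rolled comparison. It is not the version package used in elastic-agent, and it assumes the corrected floor is `9.2.0-SNAPSHOT`, which the comment does not state explicitly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a minimal, hypothetical model of versions like "9.2.0" and
// "9.2.0-SNAPSHOT"; it is not the parser used in the elastic-agent repo.
type version struct {
	major, minor, patch int
	snapshot            bool
}

// parseVersion accepts "MAJOR.MINOR.PATCH" with an optional "-SNAPSHOT" suffix.
func parseVersion(s string) (version, error) {
	var v version
	core, suffix, hasSuffix := strings.Cut(s, "-")
	if hasSuffix {
		if suffix != "SNAPSHOT" {
			return version{}, fmt.Errorf("unsupported suffix %q in %q", suffix, s)
		}
		v.snapshot = true
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", core)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("invalid component %q in %q: %w", p, s, err)
		}
		nums[i] = n
	}
	v.major, v.minor, v.patch = nums[0], nums[1], nums[2]
	return v, nil
}

// less reports whether a has lower precedence than b. As with semver
// pre-releases, X.Y.Z-SNAPSHOT sorts before the X.Y.Z release.
func less(a, b version) bool {
	if a.major != b.major {
		return a.major < b.major
	}
	if a.minor != b.minor {
		return a.minor < b.minor
	}
	if a.patch != b.patch {
		return a.patch < b.patch
	}
	return a.snapshot && !b.snapshot
}

// atLeast reports whether v >= floor.
func atLeast(v, floor version) bool { return !less(v, floor) }

func main() {
	oldFloor, _ := parseVersion("9.1.0-SNAPSHOT") // check as originally written
	newFloor, _ := parseVersion("9.2.0-SNAPSHOT") // assumed corrected check
	for _, s := range []string{"9.1.0", "9.2.0-SNAPSHOT", "9.2.0"} {
		v, err := parseVersion(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-15s >= 9.1.0-SNAPSHOT: %-5v >= 9.2.0-SNAPSHOT: %v\n",
			s, atLeast(v, oldFloor), atLeast(v, newFloor))
	}
}
```

With the original floor, the 9.1.0 release passes the gate. With the assumed 9.2.0-SNAPSHOT floor, it is excluded, while both 9.2.0 builds still pass.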

@pchila pchila mentioned this pull request Jul 30, 2025

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@elasticmachine
Collaborator

elasticmachine commented Jul 30, 2025

💔 Build Failed

Failed CI Steps

History

@pkoutsovasilis
Contributor

@ycombinator 👋 do we have a way to debug why this fails? 🙂

@pchila
Member

pchila commented Jul 31, 2025

> @ycombinator 👋 do we have a way to debug why this fails? 🙂

It seems we have the same problem in the 9.1 branch: #9207

@ebeahan
Member

ebeahan commented Jul 31, 2025

Could it have something to do with the branch not being bumped to 9.1.1 yet (#8695)? Never mind, I got my PRs mixed up.

cc @michel-laterman @ycombinator for the FIPS upgrade test failure.

@ycombinator
Contributor

CI is failing on this PR like so:

```
=== Failed
=== FAIL: testing/integration/ess TestUpgradeIntegrationsServer/8.19.0_to_9.2.0-SNAPSHOT (848.88s)
upgrade_integrations_server_test.go:66: Creating ECH deployment with version [8.19.0] in region [gcp-us-west2]
upgrade_integrations_server_test.go:93: Waiting for ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] to be ready and healthy after creation
upgrade_integrations_server_test.go:98: Upgrading ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] from version [8.19.0] to [9.2.0-SNAPSHOT]
upgrade_integrations_server_test.go:104: Waiting for ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] to be ready and healthy after upgrade
upgrade_integrations_server_test.go:106:
Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-aws-1753880681521799115/elastic/elastic-agent/testing/integration/ess/upgrade_integrations_server_test.go:106
Error:      	Received unexpected error:
failed to check for cloud 9.2.0-SNAPSHOT [stack_id: it-upgrade-integrations-server, deployment_id: 58a47798f1b74196afb9686c08bd5f1e] to be ready: context deadline exceeded
Test:       	TestUpgradeIntegrationsServer/8.19.0_to_9.2.0-SNAPSHOT
upgrade_integrations_server_test.go:82: Cleaning up ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] after [1m0s]
--- FAIL: TestUpgradeIntegrationsServer/8.19.0_to_9.2.0-SNAPSHOT (848.88s)

=== FAIL: testing/integration/ess TestUpgradeIntegrationsServer/9.1.0_to_9.2.0-SNAPSHOT (848.96s)
upgrade_integrations_server_test.go:66: Creating ECH deployment with version [9.1.0] in region [gcp-us-west2]
upgrade_integrations_server_test.go:93: Waiting for ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] to be ready and healthy after creation
upgrade_integrations_server_test.go:98: Upgrading ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] from version [9.1.0] to [9.2.0-SNAPSHOT]
upgrade_integrations_server_test.go:104: Waiting for ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] to be ready and healthy after upgrade
upgrade_integrations_server_test.go:106:
Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-aws-1753880681521799115/elastic/elastic-agent/testing/integration/ess/upgrade_integrations_server_test.go:106
Error:      	Received unexpected error:
failed to check for cloud 9.2.0-SNAPSHOT [stack_id: it-upgrade-integrations-server, deployment_id: cb3120e0a4574c84b5fc15b9568277e6] to be ready: context deadline exceeded
Test:       	TestUpgradeIntegrationsServer/9.1.0_to_9.2.0-SNAPSHOT
upgrade_integrations_server_test.go:82: Cleaning up ECH deployment [it-upgrade-integrations-server] in region [gcp-us-west2] after [1m0s]
--- FAIL: TestUpgradeIntegrationsServer/9.1.0_to_9.2.0-SNAPSHOT (848.96s)
```

In both cases, the error seems to be that the test is timing out waiting for the 9.2.0-SNAPSHOT deployment to be ready. It's possible there was something wrong with the stack pack for that version when the test ran, so let me try to manually create a deployment in the same region (Production ESS CFT) with the same version now and see if that succeeds first. If it does, we can retry the test.

@ycombinator
Contributor

> In both cases, the error seems to be that the test is timing out waiting for the 9.2.0-SNAPSHOT deployment to be ready. It's possible there was something wrong with the stack pack for that version when the test ran, so let me try to manually create a deployment in the same region (Production ESS CFT) with the same version now and see if that succeeds first. If it does, we can retry the test.

I was able to manually create a 9.2.0-SNAPSHOT deployment in the Production ESS CFT region just now. Retrying the failed test step in CI...

@ycombinator
Contributor

Okay, progress, sort of. CI is now failing on a different error:

```
=== FAIL: testing/integration/ess TestUpgradeIntegrationsServer/8.19.0_to_9.2.0-SNAPSHOT (0.67s)
upgrade_integrations_server_test.go:66: Creating ECH deployment with version [8.19.0] in region [gcp-us-west2]
upgrade_integrations_server_test.go:73:
Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-aws-1754412855533259379/elastic/elastic-agent/testing/integration/ess/upgrade_integrations_server_test.go:73
Error:      	Received unexpected error:
failed to create ESS cloud for version 8.19.0: failed to create: (clusters.cluster_invalid_plan) Version [8.19.0] is not available. Please use one of the available versions
Test:       	TestUpgradeIntegrationsServer/8.19.0_to_9.2.0-SNAPSHOT
--- FAIL: TestUpgradeIntegrationsServer/8.19.0_to_9.2.0-SNAPSHOT (0.67s)

=== FAIL: testing/integration/ess TestUpgradeIntegrationsServer/9.1.0_to_9.2.0-SNAPSHOT (0.85s)
upgrade_integrations_server_test.go:66: Creating ECH deployment with version [9.1.0] in region [gcp-us-west2]
upgrade_integrations_server_test.go:73:
Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-aws-1754412855533259379/elastic/elastic-agent/testing/integration/ess/upgrade_integrations_server_test.go:73
Error:      	Received unexpected error:
failed to create ESS cloud for version 9.1.0: failed to create: (clusters.cluster_invalid_plan) Version [9.1.0] is not available. Please use one of the available versions
Test:       	TestUpgradeIntegrationsServer/9.1.0_to_9.2.0-SNAPSHOT
--- FAIL: TestUpgradeIntegrationsServer/9.1.0_to_9.2.0-SNAPSHOT (0.85s)
```

So now the issue seems to be the same as #9207 (comment). I've proposed a solution in that comment; let's continue the conversation on that PR so it's all in one place, and then apply whatever we decide to both PRs.

@pkoutsovasilis
Contributor

I did leave a reply here @ycombinator, and I understand this will require some tweaking on our side.

In the meantime, we have PR #9048 that we’d ideally like to merge today to trigger the update‑versions automation.

How about the following approach:

  1. Close this PR to avoid the repeated CI failures.
  2. Merge ci: build agent from snapshot DRA #9048 and invoke the automation.
  3. Temporarily run without the GitHub label that triggers these ESS upgrade tests until we implement the solution for the missing versions.

This way we can keep things unblocked while still moving towards the permanent fix.

@ycombinator
Contributor

> I did leave a reply here @ycombinator, and I understand this will require some tweaking on our side.
>
> In the meantime, we have PR #9048 that we’d ideally like to merge today to trigger the update‑versions automation.
>
> How about the following approach:
>
>   1. Close this PR to avoid the repeated CI failures.
>   2. Merge ci: build agent from snapshot DRA #9048 and invoke the automation.

When you say, "invoke the automation", I assume you mean the automation that will create a new version of this PR here (#9172), correct?

>   3. Temporarily run without the GitHub label that triggers these ESS upgrade tests until we implement the solution for the missing versions.
>
> This way we can keep things unblocked while still moving towards the permanent fix.

++ to this approach, pending the one question above.

@pkoutsovasilis
Contributor

> When you say, "invoke the automation", I assume you mean the automation that will create a new version of this PR here (#9172), correct?

Yes, that's what I'm referring to. So I think we are aligned and in agreement.

@pkoutsovasilis
Contributor

Closing this PR; it will be "re-created" shortly.
