[8.19] (backport #10240) Use a random port for otel collector monitoring endpoint #10520

mergify · 2025-10-13T17:08:44Z

What does this PR do?

Uses a random port for the otel collector's Prometheus endpoint. Currently the port is hardcoded to the default.

Normally, this port needs to be known during config generation (so we can add the right self-monitoring configuration), so it has to be determined before the coordinator starts. In this PR, we instead put it into an environment variable and use the otel collector's ability to load configuration values from environment variables. As a result, we can defer choosing the actual port until as late as possible, and even use a different port on each configuration reload, which lets us recover from port binding conflicts.

This becomes a bit awkward with the embedded collector, where we need to call os.Setenv in-process, which is always an anti-pattern. But we expect to retire that mode of execution eventually, and it's no longer the default, so it should be fine.

In the process, I'm also allowing both the metrics port and the healthcheck port to be passed into the otel manager as parameters. This is preparation for making them configurable in a follow-up.
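Since two ports are now chosen (metrics and healthcheck), they must not collide with each other. One way to guarantee that, sketched here under the assumption that both endpoints bind on localhost, is to keep every probe listener open until all ports have been picked. The helper name `uniqueFreePorts` is hypothetical and not taken from this PR.

```go
package main

import (
	"fmt"
	"net"
)

// uniqueFreePorts returns n distinct unused TCP ports on localhost.
// All listeners stay open until every port has been chosen, so the kernel
// cannot hand out the same port twice within one call.
func uniqueFreePorts(n int) ([]int, error) {
	listeners := make([]net.Listener, 0, n)
	defer func() {
		// Release the ports only after all of them are known.
		for _, l := range listeners {
			l.Close()
		}
	}()

	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return nil, fmt.Errorf("finding free port: %w", err)
		}
		listeners = append(listeners, l)
		ports = append(ports, l.Addr().(*net.TCPAddr).Port)
	}
	return ports, nil
}

func main() {
	ports, err := uniqueFreePorts(2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	metricsPort, healthCheckPort := ports[0], ports[1]
	fmt.Println("metrics:", metricsPort, "healthcheck:", healthCheckPort)
}
```

Closing each listener before opening the next would also work most of the time, but it leaves a window in which the kernel may return the same port again.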

Why is it important?

The port shouldn't be hardcoded. If another process is already bound to it, the otel collector can't start.

Checklist

I have read and understood the pull request guidelines of this project.
My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding changes to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
~~[ ] I have added an entry in ./changelog/fragments using the changelog tool~~
~~[ ] I have added an integration test or an E2E test~~

How to test this PR locally

Build the agent locally, run it with otel self-monitoring, generate diagnostics, then check the configuration.

Related issues

Closes https://github.com/elastic/ingest-dev/issues/6175

This is an automatic backport of pull request #10240 done by [Mergify](https://mergify.com).

* Use a random port for otel collector monitoring endpoint
* mage notice
* Use an env variable
* Fix linter warnings
* Fix random port determination for the embedded otel collector
* Drop the ports functions from the utils package
* fixup! Fix random port determination for the embedded otel collector
* Ensure no port conflicts
* Clean up port assignment
* Verify that returned ports are unique
* Add port conflict test
* Fix docstring typo
* More comments
* Add comments explaining the port conflict test
* Update internal/pkg/otel/manager/execution_subprocess.go

Co-authored-by: Blake Rouse <[email protected]>

---------

Co-authored-by: Blake Rouse <[email protected]>
(cherry picked from commit 5cb8c31)

# Conflicts:
#	NOTICE-fips.txt
#	NOTICE.txt
#	go.mod
#	internal/pkg/otel/manager/execution_embedded.go
#	internal/pkg/otel/manager/manager.go
#	internal/pkg/otel/manager/manager_test.go

mergify · 2025-10-13T17:08:47Z

Cherry-pick of 5cb8c31 has failed:

On branch mergify/bp/8.19/pr-10240
Your branch is up to date with 'origin/8.19'.

You are currently cherry-picking commit 5cb8c31da.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   internal/pkg/agent/application/application.go
	modified:   internal/pkg/agent/application/monitoring/component/v1_monitor.go
	modified:   internal/pkg/agent/application/monitoring/component/v1_monitor_test.go
	modified:   internal/pkg/agent/cmd/inspect.go
	modified:   internal/pkg/otel/manager/common.go
	modified:   internal/pkg/otel/manager/common_test.go
	modified:   internal/pkg/otel/manager/execution_subprocess.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   NOTICE-fips.txt
	both modified:   NOTICE.txt
	both modified:   go.mod
	both modified:   internal/pkg/otel/manager/execution_embedded.go
	both modified:   internal/pkg/otel/manager/manager.go
	both modified:   internal/pkg/otel/manager/manager_test.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

elasticmachine · 2025-10-13T17:09:00Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

elasticmachine · 2025-10-13T20:32:21Z

⏳ Build in-progress, with failures

Buildkite Build
Commit: 775a9ad

Failed CI Steps

History

💔 Build #28573 failed ef1e82b

cc @swiatekm

pierrehilbert · 2025-10-14T06:38:00Z

Unrelated failure, I'm forcing the merge.

mergify bot requested a review from a team as a code owner October 13, 2025 17:08

mergify bot added the backport and conflicts labels Oct 13, 2025

mergify bot requested review from swiatekm and ycombinator and removed request for a team October 13, 2025 17:08

mergify bot assigned swiatekm Oct 13, 2025

github-actions bot added the bug, Team:Elastic-Agent-Control-Plane, and skip-changelog labels Oct 13, 2025

Fix conflicts

775a9ad

swiatekm approved these changes Oct 13, 2025

swiatekm enabled auto-merge (squash) October 13, 2025 19:13

pierrehilbert disabled auto-merge October 14, 2025 06:38

pierrehilbert merged commit 3a8e008 into 8.19 Oct 14, 2025
16 of 17 checks passed

pierrehilbert deleted the mergify/bp/8.19/pr-10240 branch October 14, 2025 06:38
