
mergify[bot]

@mergify mergify bot commented Oct 13, 2025

What does this PR do?

Uses a random port for the otel collector's Prometheus endpoint, which is currently hardcoded to the default port.
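
A common way to obtain a random free port in Go is to listen on port 0 and read back the port the OS assigned. The sketch below is illustrative only: `findFreeTCPPort` is a hypothetical helper, not the function this PR adds, and it assumes a loopback-only endpoint.

```go
package main

import (
	"fmt"
	"net"
)

// findFreeTCPPort is a hypothetical helper: it asks the OS for an unused TCP
// port by listening on port 0, reads back the port the OS assigned, and then
// releases it. The port is only known to be free at the moment of the check;
// another process could still take it before the collector binds it.
func findFreeTCPPort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := findFreeTCPPort()
	if err != nil {
		panic(err)
	}
	fmt.Println("collector metrics port:", port)
}
```

Because the listener is closed before the collector starts, there is a small window for a race. This is one reason choosing the port as late as possible, and being able to pick a new one on reload, matters.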

Normally, this port must be known during config generation (so we can add the right self-monitoring configuration), which means it has to be determined before the coordinator starts. In this PR, we instead put it into an environment variable and use the otel collector's ability to load configuration values from environment variables. As a result, we can defer choosing the actual port as late as possible, and even use a different port on each configuration reload, which lets us recover from port binding conflicts.
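
A minimal sketch of the idea for the subprocess case, assuming the generated config refers to the port through the collector's `${env:NAME}` expansion syntax. The variable name `EXAMPLE_OTEL_METRICS_PORT`, the `otelcol` binary name, and both helper functions are hypothetical; the exact config key that holds the address is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// metricsPortEnvVar is a hypothetical variable name used only in this sketch.
const metricsPortEnvVar = "EXAMPLE_OTEL_METRICS_PORT"

// metricsAddress returns the address string written into the generated
// collector config. The port appears only as an ${env:...} reference, so the
// generated config does not depend on which port is chosen later.
func metricsAddress() string {
	return "localhost:${env:" + metricsPortEnvVar + "}"
}

// collectorCommand prepares (but does not start) the collector subprocess and
// injects the concrete port through the child's environment. On a config
// reload it can simply be called again with a different port.
func collectorCommand(binary string, port int, args ...string) *exec.Cmd {
	cmd := exec.Command(binary, args...)
	cmd.Env = append(os.Environ(), metricsPortEnvVar+"="+strconv.Itoa(port))
	return cmd
}

func main() {
	fmt.Println("address in config:", metricsAddress())
	cmd := collectorCommand("otelcol", 8888, "--config", "config.yaml")
	fmt.Println("child env entry:", cmd.Env[len(cmd.Env)-1])
}
```

The design point is that the config text and the port value travel separately: the coordinator can generate the config early, while the port is bound into the child's environment only when the process is actually launched.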

This is a bit awkward with the embedded collector, where we need to call SetEnv, which is always an anti-pattern. However, we expect to retire that execution mode eventually, and it is no longer the default, so this should be acceptable.

In the process, I'm also allowing both the metrics port and the healthcheck port to be passed into the otel manager as parameters. This is preparation for making them configurable in a follow-up.
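
One way to accept both ports as parameters while still defaulting to random ones is to treat zero as "pick a free port" and to keep the temporary listeners open until all ports are chosen, so two random picks cannot collide. This is a sketch under those assumptions; `collectorPorts` and `resolvePorts` are hypothetical names, not the otel manager's actual API.

```go
package main

import (
	"fmt"
	"net"
)

// collectorPorts is a hypothetical stand-in for the ports passed into the otel
// manager. A zero value means "choose a free port at startup".
type collectorPorts struct {
	Metrics     int
	HealthCheck int
}

// resolvePorts replaces zero-valued ports with free ones. Temporary listeners
// stay open until every port has been picked, so two randomly chosen ports
// can never be the same; explicitly requested ports are used as-is.
func resolvePorts(p collectorPorts) (collectorPorts, error) {
	var listeners []net.Listener
	defer func() {
		for _, l := range listeners {
			l.Close()
		}
	}()

	pick := func(requested int) (int, error) {
		if requested != 0 {
			return requested, nil
		}
		l, err := net.Listen("tcp", "127.0.0.1:0")
		if err != nil {
			return 0, err
		}
		listeners = append(listeners, l)
		return l.Addr().(*net.TCPAddr).Port, nil
	}

	var err error
	if p.Metrics, err = pick(p.Metrics); err != nil {
		return collectorPorts{}, err
	}
	if p.HealthCheck, err = pick(p.HealthCheck); err != nil {
		return collectorPorts{}, err
	}
	// Guards against an explicit port that equals the other one.
	if p.Metrics == p.HealthCheck {
		return collectorPorts{}, fmt.Errorf("metrics and health check ports must differ, both are %d", p.Metrics)
	}
	return p, nil
}

func main() {
	p, err := resolvePorts(collectorPorts{}) // both chosen randomly
	if err != nil {
		panic(err)
	}
	fmt.Printf("metrics=%d healthcheck=%d\n", p.Metrics, p.HealthCheck)
}
```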

Why is it important?

The port shouldn't be hardcoded: if another process is already bound to it, the otel collector can't start.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

How to test this PR locally

Build the agent locally, run it with otel self-monitoring, generate diagnostics, then check the configuration.

Related issues


This is an automatic backport of pull request #10240 done by [Mergify](https://mergify.com).

* Use a random port for otel collector monitoring endpoint

* mage notice

* Use an env variable

* Fix linter warnings

* Fix random port determination for the embedded otel collector

* Drop the ports functions from the utils package

* fixup! Fix random port determination for the embedded otel collector

* Ensure no port conflicts

* Clean up port assignment

* Verify that returned ports are unique

* Add port conflict test

* Fix docstring typo

* More comments

* Add comments explaining the port conflict test

* Update internal/pkg/otel/manager/execution_subprocess.go

Co-authored-by: Blake Rouse <[email protected]>

---------

Co-authored-by: Blake Rouse <[email protected]>
(cherry picked from commit 5cb8c31)

# Conflicts:
#	NOTICE-fips.txt
#	NOTICE.txt
#	go.mod
#	internal/pkg/otel/manager/execution_embedded.go
#	internal/pkg/otel/manager/manager.go
#	internal/pkg/otel/manager/manager_test.go
@mergify mergify bot requested a review from a team as a code owner October 13, 2025 17:08
@mergify mergify bot added backport conflicts There is a conflict in the backported pull request labels Oct 13, 2025
@mergify mergify bot requested review from swiatekm and ycombinator and removed request for a team October 13, 2025 17:08

mergify bot commented Oct 13, 2025

Cherry-pick of 5cb8c31 has failed:

On branch mergify/bp/8.19/pr-10240
Your branch is up to date with 'origin/8.19'.

You are currently cherry-picking commit 5cb8c31da.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   internal/pkg/agent/application/application.go
	modified:   internal/pkg/agent/application/monitoring/component/v1_monitor.go
	modified:   internal/pkg/agent/application/monitoring/component/v1_monitor_test.go
	modified:   internal/pkg/agent/cmd/inspect.go
	modified:   internal/pkg/otel/manager/common.go
	modified:   internal/pkg/otel/manager/common_test.go
	modified:   internal/pkg/otel/manager/execution_subprocess.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   NOTICE-fips.txt
	both modified:   NOTICE.txt
	both modified:   go.mod
	both modified:   internal/pkg/otel/manager/execution_embedded.go
	both modified:   internal/pkg/otel/manager/manager.go
	both modified:   internal/pkg/otel/manager/manager_test.go

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@github-actions github-actions bot added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team skip-changelog labels Oct 13, 2025
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@swiatekm swiatekm enabled auto-merge (squash) October 13, 2025 19:13
@elasticmachine

⏳ Build in-progress, with failures

cc @swiatekm

@pierrehilbert

Unrelated failure, I'm forcing the merge.

@pierrehilbert pierrehilbert merged commit 3a8e008 into 8.19 Oct 14, 2025
16 of 17 checks passed
@pierrehilbert pierrehilbert deleted the mergify/bp/8.19/pr-10240 branch October 14, 2025 06:38

Labels

backport bug Something isn't working conflicts There is a conflict in the backported pull request skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

3 participants