-
Notifications
You must be signed in to change notification settings - Fork 197
Use a random port for otel collector monitoring endpoint #10240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request does not have a backport label. Could you fix it @swiatekm? 🙏
|
internal/pkg/agent/application/monitoring/component/v1_monitor.go
Outdated
Show resolved
Hide resolved
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
I'm probably paranoid from the npipe permission issue, but do we have a test that starts and stops the collector multiple times in rapid succession? Since we are re-using the same port number it might be worth double checking. |
I'm going to try an approach using an environment variable, so I can inject this much later in the pipeline. Will undraft once I know if that's feasible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending resolution of #10240 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, glad the environment variable substitution worked out.
Co-authored-by: Blake Rouse <[email protected]>
dab13dd
💚 Build Succeeded
History
cc @swiatekm |
* Use a random port for otel collector monitoring endpoint * mage notice * Use an env variable * Fix linter warnings * Fix random port determination for the embedded otel collector * Drop the ports functions from the utils package * fixup! Fix random port determination for the embedded otel collector * Ensure no port conflicts * Clean up port assignment * Verify that returned ports are unique * Add port conflict test * Fix docstring typo * More comments * Add comments explaining the port conflict test * Update internal/pkg/otel/manager/execution_subprocess.go Co-authored-by: Blake Rouse <[email protected]> --------- Co-authored-by: Blake Rouse <[email protected]> (cherry picked from commit 5cb8c31)
…0390) * Use a random port for otel collector monitoring endpoint * mage notice * Use an env variable * Fix linter warnings * Fix random port determination for the embedded otel collector * Drop the ports functions from the utils package * fixup! Fix random port determination for the embedded otel collector * Ensure no port conflicts * Clean up port assignment * Verify that returned ports are unique * Add port conflict test * Fix docstring typo * More comments * Add comments explaining the port conflict test * Update internal/pkg/otel/manager/execution_subprocess.go --------- (cherry picked from commit 5cb8c31) Co-authored-by: Mikołaj Świątek <[email protected]> Co-authored-by: Blake Rouse <[email protected]>
@swiatekm there's a question about backporting this to |
* Use a random port for otel collector monitoring endpoint * mage notice * Use an env variable * Fix linter warnings * Fix random port determination for the embedded otel collector * Drop the ports functions from the utils package * fixup! Fix random port determination for the embedded otel collector * Ensure no port conflicts * Clean up port assignment * Verify that returned ports are unique * Add port conflict test * Fix docstring typo * More comments * Add comments explaining the port conflict test * Update internal/pkg/otel/manager/execution_subprocess.go Co-authored-by: Blake Rouse <[email protected]> --------- Co-authored-by: Blake Rouse <[email protected]> (cherry picked from commit 5cb8c31) # Conflicts: # NOTICE-fips.txt # NOTICE.txt # go.mod # internal/pkg/otel/manager/execution_embedded.go # internal/pkg/otel/manager/manager.go # internal/pkg/otel/manager/manager_test.go
…ing endpoint (#10520) * Use a random port for otel collector monitoring endpoint (#10240) * Use a random port for otel collector monitoring endpoint * mage notice * Use an env variable * Fix linter warnings * Fix random port determination for the embedded otel collector * Drop the ports functions from the utils package * fixup! Fix random port determination for the embedded otel collector * Ensure no port conflicts * Clean up port assignment * Verify that returned ports are unique * Add port conflict test * Fix docstring typo * More comments * Add comments explaining the port conflict test * Update internal/pkg/otel/manager/execution_subprocess.go Co-authored-by: Blake Rouse <[email protected]> --------- Co-authored-by: Blake Rouse <[email protected]> (cherry picked from commit 5cb8c31) # Conflicts: # NOTICE-fips.txt # NOTICE.txt # go.mod # internal/pkg/otel/manager/execution_embedded.go # internal/pkg/otel/manager/manager.go # internal/pkg/otel/manager/manager_test.go * Fix conflicts --------- Co-authored-by: Mikołaj Świątek <[email protected]>
What does this PR do?
Uses a random port for the otel collector's Prometheus endpoint. Right now this is hardcoded at the default.
Normally, this port needs to be known during config generation (so we can add the right self-monitoring configuration), so it needs to be determined before the coordinator starts. In this PR, we instead put it into an environment variable and use the otel collector's ability to load configuration values from the variable. As a result, we can defer determining the actual port as late as possible, and even use a different port on each configuration reload, allowing us to recover from port binding conflicts.
This becomes a bit awkward with the embedded collector, where we need to call
SetEnv
, always an anti-pattern. But we eventually expect to retire that mode of execution, and it's not the default anymore, so it should be fine.In the process, I'm also allowing both the metrics port and the healthcheck port to be passed into the otel manager as parameters. This is preparation for making them configurable in a follow-up.
Why is it important?
The port shouldn't be hardcoded. In the event of a conflict, the otel collector can't start.
Checklist
[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration files[ ] I have added an entry in./changelog/fragments
using the changelog tool[ ] I have added an integration test or an E2E testHow to test this PR locally
Build the agent locally, run it with otel self-monitoring, generate diagnostics, then check the configuration.
Related issues