feat: utilise continue_on_err in beatsauthextension #10343
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
This change looks good to me. On the call, it was decided to merge this together with the otel bump. CI is green!
We should also bump the version in beats, though. |
@swiatekm does this need to happen before this PR, or after? How do these get bumped in beats today?
You bump in opentelemetry-collector-components, then in beats, and then in agent. I think it's fine to merge this PR, but if you want to be thorough, you should also bump component versions to ones which match the otel core version. Does that make sense? |
The changes look good to me, but I'd really like to have a test verifying that the basic function of this PR works - that is, if we provide invalid TLS configuration to an elasticsearch output used by a beats receiver, the collector still starts and we get the right status. I think an integration test is the right way to do this, but I'd also accept a unit test for the otel manager. |
I can’t confidently own that bump chain right now. I’ll move this PR to Draft and keep it scoped to the minimal change, letting the existing version-bump mechanisms land first. Once those are in, I’ll rebase and flip back to Ready for Review.
Agreed. I’ll add a test for that |
This pull request is now in conflicts. Could you fix it? 🙏
💛 Build succeeded, but was flaky
* feat: rework elasticsearch output translation to otel config to exclude validation errors * ci: add integration test (cherry picked from commit 0c0dada) # Conflicts: # internal/pkg/otel/translate/otelconfig.go
* feat: rework elasticsearch output translation to otel config to exclude validation errors * ci: add integration test (cherry picked from commit 0c0dada)
* feat: rework elasticsearch output translation to otel config to exclude validation errors * ci: add integration test (cherry picked from commit 0c0dada) Co-authored-by: Panos Koutsovasilis <[email protected]>
…tension (#10443) * feat: utilise continue_on_err in beatsauthextension (#10343) * feat: rework elasticsearch output translation to otel config to exclude validation errors * ci: add integration test (cherry picked from commit 0c0dada) # Conflicts: # internal/pkg/otel/translate/otelconfig.go * fix: resolve conflicts --------- Co-authored-by: Panos Koutsovasilis <[email protected]>
What does this PR do?
This PR improves error handling for Elasticsearch output configurations in the Hybrid Elastic Agent by:
- Partially moving configuration translation ownership: Relocates some of the Elasticsearch output translation logic from the beats library (libbeat/otelbeat/oteltranslate/outputs/elasticsearch) into the elastic-agent package (internal/pkg/otel/translate/output_elasticsearch.go). In the future we should complete the transition to the elastic-agent repo, as this gives elastic-agent full control over the translation.
- Enabling graceful error handling: Adds continue_on_error: true to the beatsauth extension configuration in getBeatsAuthExtensionConfig(). This prevents the OpenTelemetry collector from exiting on startup when it encounters invalid SSL configurations (e.g., missing certificate files); see the respective PR.
Why is it important?
When an Elasticsearch output has invalid configuration (like a missing SSL certificate), the collector exits with a vague error message that doesn't identify which output caused the failure:
Benefits of this PR:
Screenshot shows the intended behavior: collector continues running and errors are properly surfaced at the exporter level.
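To make the continue_on_error behavior described above concrete, here is a minimal Go sketch. It is not the code from this PR: beatsAuthConfig, checkCertificate and start are hypothetical names, the config is reduced to a single key, and the real extension's startup logic may differ. The sketch only illustrates the idea that an invalid certificate becomes a surfaced error rather than a fatal one when the flag is set.

```go
package main

import (
	"fmt"
	"os"
)

// beatsAuthConfig is a hypothetical stand-in for getBeatsAuthExtensionConfig().
// It only shows the one key this PR adds; the real function builds more.
func beatsAuthConfig() map[string]any {
	return map[string]any{
		"continue_on_error": true,
	}
}

// checkCertificate is a hypothetical validation step: it fails when the
// configured certificate file does not exist or cannot be inspected.
func checkCertificate(path string) error {
	if _, err := os.Stat(path); err != nil {
		return fmt.Errorf("certificate %q: %w", path, err)
	}
	return nil
}

// start mimics extension startup under the assumption stated above.
// With continue_on_error set, a validation failure is returned as a
// surfaced (non-fatal) error; without it, the failure is fatal.
func start(cfg map[string]any, certPath string) (surfaced, fatal error) {
	err := checkCertificate(certPath)
	if err == nil {
		return nil, nil
	}
	if cont, _ := cfg["continue_on_error"].(bool); cont {
		return err, nil
	}
	return nil, err
}

func main() {
	cfg := beatsAuthConfig()
	surfaced, fatal := start(cfg, "/nonexistent/path/ca.pem")
	fmt.Println("fatal error:", fatal)
	fmt.Println("surfaced error present:", surfaced != nil)
}
```

In this sketch the caller decides what to do with the surfaced error, which loosely mirrors the intent stated in this PR: the collector keeps running and the failure is reported at the exporter level.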
Checklist
./changelog/fragments using the changelog tool
Disruptive User Impact
No disruptive user impact expected.
How to test this PR locally
Build and install elastic-agent from this branch with the following configuration
Related issues
N/A