LOG-7896: Add alert when forwarder sink is generating errors #3137
base: master
|
@jcantrill: This pull request references LOG-7896 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.8.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/hold |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jcantrill The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Hello @jcantrill, About your questions: Question 1: does it make sense for the threshold to be zero? Question 2: does it make sense to have one alert for each collector pod, or to collapse them into a single alert per CLF definition? Then, yes, it makes sense for the alerts to be individual per collector pod. |
My original implementation was an alert per CLF, but I modified it to alert on the pods in a CLF for exactly this reason. I was thinking of the case where there is an issue with a single node. |
|
Hello @jcantrill , Then, for generating a lot of fake errors, we need to do it by CLF as an output is defined, we should expect that this output should receive some logs from any of the outputs |
|
I believe the way this alert triggers satisfies what is needed: it fires an alert that identifies the specific pod for a given CLF namespace/name, which should cover all scenarios. I can imagine the tedious part would be on a cluster with a significant number of nodes, generating one alert for each; that may be the deal breaker here, as it would be a significant number of alerts to dismiss. The current implementation allows identification of "the one OCP node" with problems but does not handle well the case where all of them have issues at once. My ideal implementation would be a single alert that identifies all the pods exhibiting the same issue. |
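To make the trade-off above concrete, here is a minimal Python sketch (not the PR's implementation) that contrasts one alert per collector pod with a single collapsed alert per CLF listing every affected pod. The `PodErrorSample` type, its field names, the pod names, and the threshold default of zero are all hypothetical and chosen only for illustration; the real alert would be expressed as a rule over the collector's metrics.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class PodErrorSample(NamedTuple):
    """Hypothetical per-pod observation of sink errors for one CLF."""
    clf_namespace: str
    clf_name: str
    pod: str
    error_rate: float  # errors per second seen by this collector pod


def per_pod_alerts(samples: List[PodErrorSample],
                   threshold: float = 0.0) -> List[PodErrorSample]:
    """One alert per collector pod whose error rate exceeds the threshold."""
    return [s for s in samples if s.error_rate > threshold]


def collapsed_alerts(samples: List[PodErrorSample],
                     threshold: float = 0.0) -> Dict[Tuple[str, str], List[str]]:
    """One alert per CLF (namespace, name), listing every affected pod."""
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for s in samples:
        if s.error_rate > threshold:
            grouped[(s.clf_namespace, s.clf_name)].append(s.pod)
    # Sort pod names so the alert content is stable between evaluations.
    return {clf: sorted(pods) for clf, pods in grouped.items()}


# Illustrative data only: two of three collector pods report sink errors.
samples = [
    PodErrorSample("openshift-logging", "instance", "collector-a", 2.5),
    PodErrorSample("openshift-logging", "instance", "collector-b", 0.0),
    PodErrorSample("openshift-logging", "instance", "collector-c", 0.1),
]

individual = per_pod_alerts(samples)
collapsed = collapsed_alerts(samples)
```

With this data the per-pod form yields two alerts to dismiss, while the collapsed form yields one entry for the CLF that still names both affected pods, so the "one node with problems" case stays identifiable.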
241cfcf to ad34141
|
/test functional-target |
|
@jcantrill: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |


Description
This PR:
Links
https://issues.redhat.com/browse/LOG-7896
@xperimental @r2d2rnd