-
Notifications
You must be signed in to change notification settings - Fork 277
fix: show actual event kinds in Queue mode #706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8c48270
1d949ef
d1fe29d
2aaf71b
de453e1
38a9d7a
7e74679
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -196,13 +196,13 @@ func main() { | |
|
|
||
| for _, fn := range monitoringFns { | ||
| go func(monitor monitor.Monitor) { | ||
| log.Info().Str("event_type", monitor.Kind()).Msg("Started monitoring for events") | ||
| log.Info().Str("monitor_type", monitor.Kind()).Msg("Started monitoring for events") | ||
| var previousErr error | ||
| var duplicateErrCount int | ||
| for range time.Tick(time.Second * 2) { | ||
| err := monitor.Monitor() | ||
| if err != nil { | ||
| log.Warn().Str("event_type", monitor.Kind()).Err(err).Msg("There was a problem monitoring for events") | ||
| log.Warn().Str("monitor_type", monitor.Kind()).Err(err).Msg("There was a problem monitoring for events") | ||
| metrics.ErrorEventsInc(monitor.Kind()) | ||
| recorder.Emit(nthConfig.NodeName, observability.Warning, observability.MonitorErrReason, observability.MonitorErrMsgFmt, monitor.Kind()) | ||
| if previousErr != nil && err.Error() == previousErr.Error() { | ||
|
|
@@ -242,6 +242,7 @@ func main() { | |
| log.Info(). | ||
| Str("event-id", event.EventID). | ||
| Str("kind", event.Kind). | ||
| Str("monitor", event.Monitor). | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's call this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd just call this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. Let's go with |
||
| Str("node-name", event.NodeName). | ||
| Str("instance-id", event.InstanceID). | ||
| Str("provider-id", event.ProviderID). | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. On line 252 when we emit the Kubernetes event, we need to keep the "reason" argument the same as it was before ( However, let's add another argument at the end, after There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do, but I have to respectfully say I'm against this one. In all other cases the old and wrong field, data, log line or whatever can be kept and coexist with a new and fixed one. In this particular case like you pointed out there's only one reason, so either we use the right or the wrong reason. Kubernetes event reasons are the cornerstone of event-based Kubernetes monitoring, and we're choosing to leave it wrong. I understand backwards compatibility but in this case here we're not talking about an enhancement or a new feature but a fix, and I think not fixing things that are known to be wrong is not the best way to go. That said, I think the best option is to leave the or maybe WDYT? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you also intending to keep the return values of I like your first suggestion ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, if we are to keep the old and wrong reason there is no other option than reverting the changes in |
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -34,8 +34,8 @@ import ( | |
| ) | ||
|
|
||
| const ( | ||
| // SQSTerminateKind is a const to define an SQS termination kind of interruption event | ||
| SQSTerminateKind = "SQS_TERMINATE" | ||
| // SQSMonitorKind is a const to define this monitor kind | ||
| SQSMonitorKind = "SQS_MONITOR" | ||
| // ASGTagName is the name of the instance tag whose value is the AutoScaling group name | ||
| ASGTagName = "aws:autoscaling:groupName" | ||
| ) | ||
|
|
@@ -71,9 +71,9 @@ func (s skip) Unwrap() error { | |
| return s.err | ||
| } | ||
|
|
||
| // Kind denotes the kind of event that is processed | ||
| // Kind denotes the kind of monitor | ||
| func (m SQSMonitor) Kind() string { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm opening to renaming this method to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, sounds good. Will do. |
||
| return SQSTerminateKind | ||
| return SQSMonitorKind | ||
| } | ||
|
|
||
| // Monitor continuously monitors SQS for events and coordinates processing of the events | ||
|
|
@@ -219,9 +219,9 @@ func (m SQSMonitor) processInterruptionEvents(interruptionEventWrappers []Interr | |
| log.Debug().Str("instance-id", eventWrapper.InterruptionEvent.InstanceID).Msg("dropping interruption event for unmanaged node") | ||
| dropMessageSuggestionCount++ | ||
|
|
||
| case eventWrapper.InterruptionEvent.Kind == SQSTerminateKind: | ||
| // Successfully processed SQS message into a SQSTerminateKind interruption event | ||
| log.Debug().Msgf("Sending %s interruption event to the interruption channel", SQSTerminateKind) | ||
| case eventWrapper.InterruptionEvent.Monitor == SQSMonitorKind: | ||
| // Successfully processed SQS message into a eventWrapper.InterruptionEvent.Kind interruption event | ||
| log.Debug().Msgf("Sending %s interruption event to the interruption channel", eventWrapper.InterruptionEvent.Kind) | ||
| m.InterruptionChan <- *eventWrapper.InterruptionEvent | ||
|
|
||
| default: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the original field name
event_typeto preserve backwards compatibility, even though that's technically not correct. I'm open to adding another field namedmonitor_typeas well, to make the meaning more clear, but we need to keep the existing field name.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you want to keep the
event_typefield but change its value? Old values here were consts likeRebalanceRecommendationKind = "REBALANCE_RECOMMENDATION"while new ones are consts likeRebalanceRecommendationMonitorKind = "REBALANCE_RECOMMENDATION_MONITOR". With that we are keeping the field name but changing its value so we're effectively introducing breaking changes anyway when it comes to people parsing such fields (I guess they are also parsing the field value, not just the mere presence of the field with any value).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I meant that we should keep the original value, too. So
event_typeshould beREBALANCE_RECOMMENDATION,SCHEDULED_EVENT,SPOT_ITN, orSQS_TERMINATE, rather than the newer*_MONITORvalues returned byMonitorKind().