feat(nodeScaleDownTime): add a new metric to track unprocessed nodes during scaleDown #8614
Conversation
Welcome @shaikenov!
Hi @shaikenov. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
very good
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: kada2004, shaikenov. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
var longestTime time.Duration
// if nodeNames is nil it means that all nodes were processed
if nodeNames == nil {
	// if l.minimumTime is 0, then in previous iteration we also processed all the nodes, so the longest time is 0
Why do we require all nodes to be processed twice before resetting the metric?
The first time, we might have some leftovers from previous simulations (as in the next comment), so we want to calculate the time for these nodes and reset the node map. If we get here a second time, we report 0. I refactored this part to fix the issue I had; I hope it is more correct and clear now.
This sounds like an implementation detail leading to surprising results. Can this be implemented in a way that doesn't require this?
I added this condition for the case when we do not have any unprocessed nodes at all. Without this check, if we have something like this:
func (l *LongestNodeScaleDownEvalTime) Update(nodeNames []string, currentTime time.Time) time.Duration {
minimumTime := l.getMin()
newNodes := make(map[string]time.Time)
for _, nodeName := range nodeNames {
newNodes[nodeName] = l.get(nodeName)
}
l.NodeNamesWithTimeStamps = newNodes
longestTime := currentTime.Sub(minimumTime)
l.lastEvalTime = currentTime
metrics.ObserveLongestUnneededNodeScaleDownEvalDurationSeconds(longestTime)
return longestTime
}
we will report the time between simulations (currentTime - lastEvalTime).
In this case, IIUC, we want to report 0 because there are no unprocessed nodes, and from the metric definition (the longest time during which a node was not processed during ScaleDown) this metric is about the skipped nodes. The snippet above is fine by me, but I think it diverges a bit from the metric definition when there are no unprocessed nodes. WDYT, @x13n?
Maybe I'm misunderstanding the semantics of the metric, but if you want to report the longest time a node was not processed, I expected something like this:
func (l *LongestNodeScaleDownEvalTime) Update(nodeNames []string, currentTime time.Time) time.Duration {
newNodes := make(map[string]time.Time)
for _, nodeName := range nodeNames {
newNodes[nodeName] = l.get(nodeName)
}
l.NodeNamesWithTimeStamps = newNodes
minimumTime := l.getMin()
longestDuration := currentTime.Sub(minimumTime)
l.lastEvalTime = currentTime
metrics.ObserveLongestUnneededNodeScaleDownEvalDurationSeconds(longestDuration)
return longestDuration
}
I suppose the distinction is that when this is called with currentTime equal to some times t0, t1 and t2, only the nodes unprocessed at a given timestamp are considered by the metric at that instant. So, say at t0 everything was processed, at t1 node A was skipped and at t2 node B was skipped. The metric is then set to 0 at t0, t1-t0 at t1 and t2-t1 at t2. If then at t3 everything is processed again, the metric drops back to 0. Does that make sense? Or did I misunderstand the metric definition?
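For reference, the get and getMin helpers used in both snippets are not shown in this thread. Below is a minimal sketch of what they might look like, reconstructed from the struct fields visible in the diff (lastEvalTime, NodeNamesWithTimeStamps); this is an assumption, not the PR's actual code.

type LongestNodeScaleDownEvalTime struct {
	// lastEvalTime is the time of the previous evaluation.
	lastEvalTime time.Time
	// NodeNamesWithTimeStamps maps each currently unprocessed node to the last time it was processed.
	NodeNamesWithTimeStamps map[string]time.Time
}

// get returns the timestamp already tracked for nodeName, or the time of the
// previous evaluation if the node was processed last time.
func (l *LongestNodeScaleDownEvalTime) get(nodeName string) time.Time {
	if t, ok := l.NodeNamesWithTimeStamps[nodeName]; ok {
		return t
	}
	return l.lastEvalTime
}

// getMin returns the oldest tracked timestamp, falling back to the time of the
// previous evaluation when no node is currently tracked.
func (l *LongestNodeScaleDownEvalTime) getMin() time.Time {
	min := l.lastEvalTime
	for _, t := range l.NodeNamesWithTimeStamps {
		if t.Before(min) {
			min = t
		}
	}
	return min
}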
	wantLongestScaleDownEvalTime []time.Duration
}
start := time.Now()
testCases := []testCase{
Can you add a test case where the set of skipped nodes is different every time? Wouldn't it make more sense for the metric not to increase in such a scenario?
Added this test case. There was a small bug with an incorrect duration calculation, but it should be fine now.
In the map we store the last time a node was not skipped. If the unprocessed node is different on every iteration and the interval between iterations is 1 sec, we will report the following (a sketch reproducing this trace follows the list):
- initialization
- {n1} -> getMin() returns 0 sec, we store map:{n1: 0 sec} and we return 1 sec
- {n2} -> getMin() returns 0 sec, we store map:{n2: 1 sec}, because 1 sec is the last time when n2 was not skipped and we return 2 sec
- {n3} -> ... we return 2 sec
- {n4} -> ... we return 2 sec
- {} -> we still have leftovers for node n4 in the map and getMin() returns 3 sec and we return 2 sec
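A small sketch reproducing this trace, using the struct and helpers sketched after the earlier comment together with the Update function from the first snippet (with the metrics call dropped). This is assumed code, not the PR's actual implementation; expected return values are in the comments.

package main

import (
	"fmt"
	"time"
)

func main() {
	start := time.Now()
	l := &LongestNodeScaleDownEvalTime{
		lastEvalTime:            start,
		NodeNamesWithTimeStamps: map[string]time.Time{},
	}
	// A different unprocessed node on every iteration, one second apart,
	// followed by an iteration with no unprocessed nodes at all.
	fmt.Println(l.Update([]string{"n1"}, start.Add(1*time.Second))) // 1s
	fmt.Println(l.Update([]string{"n2"}, start.Add(2*time.Second))) // 2s
	fmt.Println(l.Update([]string{"n3"}, start.Add(3*time.Second))) // 2s
	fmt.Println(l.Update([]string{"n4"}, start.Add(4*time.Second))) // 2s
	fmt.Println(l.Update(nil, start.Add(5*time.Second)))            // 2s
}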
As in the other comment, this sounds like an artifact of the current implementation rather than desired behavior. Why would we want the metric to be different for a single unprocessed node based on whether the previous iteration had any unprocessed nodes or not?
/ok-to-test
@x13n: /release-note-edit must be used with a single release note block. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Ah, looks like you removed the release note block entirely, please bring it back in the first comment.
/release-note-edit |
// handleUnprocessedNodes is used to track the longest time it takes for a node to be evaluated as removable or not
func (p *Planner) handleUnprocessedNodes(unprocessedNodeNames []string) {
	// if p.longestNodeScaleDownEvalTime is not set (flag is disabled) or endedPrematurely is already true (nodes were already reported in this iteration) do not do anything
Please update the comment as well.
// lastEvalTime is the time of the previous currentlyUnneededNodeNames parsing
lastEvalTime time.Time
// NodeNamesWithTimeStamps maps nodeNames to the time of their last successful evaluation
NodeNamesWithTimeStamps map[string]time.Time
Why is this public? This should be an implementation detail. Ideally tests would just evaluate whether the return value from Update is correct.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds a new metric to the exported Prometheus metrics list: LongestNodeScaleDownTime.
We want to track all the nodes that were marked as unneeded but left unprocessed during ScaleDown. If a node is unneeded but unprocessed multiple times consecutively, we store only the earliest time that happened. The difference between the current time and the earliest such time among all unprocessed nodes gives the longest time. This value can indicate possible throttling and helps us better monitor what happens during ScaleDown.
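A minimal sketch of the computation described above (hypothetical names, not the code in this PR): the reported value is the age of the oldest first-seen timestamp among the currently unprocessed nodes, i.e. max(now - earliest) over nodes, which equals now - min(earliest).

// longestUnprocessedDuration is a hypothetical illustration of the metric value.
// firstSkipped holds, for each currently unprocessed node, the earliest time it
// was seen as unneeded but unprocessed.
func longestUnprocessedDuration(firstSkipped map[string]time.Time, now time.Time) time.Duration {
	var longest time.Duration
	for _, since := range firstSkipped {
		if d := now.Sub(since); d > longest {
			longest = d
		}
	}
	return longest
}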
Which issue(s) this PR fixes:
None
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: