Commit 12f6941

core: limit job GC batch size to match other GC batches (#26974)
In the core scheduler we have several object types that we can delete by ID. We batch up to 7281 UUIDs because this works out to about 256 KiB per request, which is well below the maximum Raft log entry size we want to have.

When we GC jobs we use this same constant to size the batch, but the request body is not a list of UUIDs; it is a map of namespaced job IDs to a pointer to a struct. This pushes the batch size to roughly 746 KiB (assuming UUID-sized job names), which is going to impact performance if GC happens during large volumes of short-lived dispatch work where users may be GC'ing jobs frequently.

Limit the batch size for `JobBatchDeregisterRequest` to roughly the same size as the requests that are lists of UUIDs.

Ref: https://hashicorp.atlassian.net/browse/NMD-1041
1 parent 43f02c5 commit 12f6941

2 files changed: 7 additions & 2 deletions

.changelog/26974.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+```release-note:bug
+core: Fixed a bug where GC batch sizes for jobs resulted in excessively large Raft logs
+```

nomad/core_sched.go

Lines changed: 4 additions & 2 deletions
@@ -216,8 +216,10 @@ OUTER:
 
 // jobReap contacts the leader and issues a reap on the passed jobs
 func (c *CoreScheduler) jobReap(jobs []*structs.Job, leaderACL string) error {
-	// Call to the leader to issue the reap
-	for _, req := range c.partitionJobReap(jobs, leaderACL, structs.MaxUUIDsPerWriteRequest) {
+	// Call to the leader to issue the reap with a batch size intended to be
+	// similar to the GC by batches of UUIDs for evals, allocs, and nodes
+	// (limited by structs.MaxUUIDsPerWriteRequest)
+	for _, req := range c.partitionJobReap(jobs, leaderACL, 2048) {
 		var resp structs.JobBatchDeregisterResponse
 		if err := c.srv.RPC(structs.JobBatchDeregisterRPCMethod, req, &resp); err != nil {
 			c.logger.Error("batch job reap failed", "error", err)
