system deployments: remove old rolling eval creation #27007

tgross · 2025-10-28T20:46:37Z

When using deployments, the deployment watcher emits all the evals we expect to see to step through the deployment once each allocation becomes healthy for min_healthy_time. But we left the old rolling-update eval code in place and with the new job parsing of the update block that ends up emitting a eval immediately with a zero wait time and we don't end up respecting the min_healthy_time.

This is all driven from the deployment watcher now, so we can simply remove it from the scheduler.

Testing & Reproduction steps

Run two nodes. Run the following jobspec.

jobspec

job "httpd" {
  type = "system"

  group "web" {

    network {
      mode = "bridge"
      port "www" {
        to     = 8001
      }
    }

    update {
      max_parallel = 1
      min_healthy_time = "15s"
    }

    task "http" {

      driver = "docker"

      config {
        image   = "busybox:1"
        command = "httpd"
        args    = ["-v", "-f", "-p", "8001", "-h", "/local"]
        ports   = ["www"]
      }

      env {
        version = "8"
      }

      resources {
        cpu    = 128
        memory = 128
      }
    }
  }
}

Wait until the initial deployment is done and then destructively update (ex. incrementing task.env.version). Previously an eval triggered by rolling-update would run and you'd see both new allocs immediately. Now we wait 15s after the first alloc becomes healthy and then the new alloc is created via an eval triggered by the deploymentwatcher.

If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

When using deployments, the deployment watcher emits all the evals we expect to see to step through the deployment once each allocation becomes healthy for `min_healthy_time`. But we left the old rolling-update eval code in place and with the new job parsing of the `update` block that ends up emitting a eval immediately with a zero wait time and we don't end up respecting the `min_healthy_time`. This is all driven from the deployment watcher now, so we can simply remove it from the scheduler.

schmichael · 2025-10-29T00:20:53Z

scheduler/scheduler_system.go


-	// If the limit of placements was reached we need to create an evaluation
-	// to pickup from here after the stagger period.
-	if s.limitReached && s.nextEval == nil {


AFAICT nextEval is now always nil ... can we just remove it? I removed it on main and everything compiled, but I hit a test failure.

Likewise it appears limitReached is no longer used which may allow for some removal of code from evictAndPlace

I've removed nextEval from this scheduler but not from the setStatus function params where it's consumed, because the generic scheduler writes evals there still. I want to leave limitReached in place because I know @pkazmierczak is busy working on that area of code; we'll clean it up afterwards.

pkazmierczak

LGTM, and as mentioned, we can clean up further when #26953 lands.

tgross requested review from a team as code owners October 28, 2025 20:46

tgross requested a review from pkazmierczak October 28, 2025 20:46

tgross added theme/scheduling theme/system-scheduler type/bug labels Oct 28, 2025

schmichael reviewed Oct 29, 2025

View reviewed changes

remove nextEval field

7a53ae4

vercel bot deployed to Preview – nomad-ui October 29, 2025 13:32 View deployment

pkazmierczak approved these changes Oct 29, 2025

View reviewed changes

tgross merged commit c61bc83 into main Oct 29, 2025
39 checks passed

tgross deleted the systemdeployments-no-rolling-evals branch October 29, 2025 15:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

system deployments: remove old rolling eval creation #27007

system deployments: remove old rolling eval creation #27007

Uh oh!

tgross commented Oct 28, 2025

Uh oh!

schmichael Oct 29, 2025

Uh oh!

schmichael Oct 29, 2025

Uh oh!

tgross Oct 29, 2025

Uh oh!

pkazmierczak left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

system deployments: remove old rolling eval creation #27007

system deployments: remove old rolling eval creation #27007

Uh oh!

Conversation

tgross commented Oct 28, 2025

Testing & Reproduction steps

Changes to Security Controls

Uh oh!

schmichael Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

schmichael Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

tgross Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

pkazmierczak left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants