Conversation

@chrisroberts
Member

Description

See #26961 for full details.

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad website documentation to reflect this. Refer to
    the website README for docs guidelines. Please also consider whether the
    change requires notes within the upgrade guide.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

@chrisroberts chrisroberts changed the title from "[scheduler] fix scheduling behavior of batch job allocs" to "[scheduler] fix scheduling behavior of batch job allocs (1.10)" Oct 27, 2025
@chrisroberts chrisroberts force-pushed the f-drain-behavior-1.10 branch from cf09234 to a8875b5 Compare October 27, 2025 15:54
@chrisroberts chrisroberts marked this pull request as ready for review October 27, 2025 16:54
@chrisroberts chrisroberts requested review from a team as code owners October 27, 2025 16:54
@chrisroberts chrisroberts requested a review from tgross October 27, 2025 16:55
@chrisroberts chrisroberts marked this pull request as draft October 27, 2025 17:34
Member

@tgross tgross left a comment

LGTM overall. We should also include the changelog entry from #26961 (verbatim, including the filename that points to that PR) so that this backport shows up in the backport changelog.

tgross previously approved these changes Oct 27, 2025
@chrisroberts chrisroberts added the backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent label Oct 27, 2025
Batch job allocations have a few documented behaviors that do
not work as expected:

First, on node drain, the allocation is allowed to complete unless
the deadline is reached, at which point the allocation is killed. The
allocation is not replaced.

Second, when using the `alloc stop` command, the allocation is
stopped and then rescheduled according to its reschedule policy.

Third, on job restart, if the `-reschedule` flag is used, the
allocation is migrated and its reschedule policy is
ignored.

This update removes the change introduced in dfa07e1 (#26025)
that forced batch job allocations into a failed state when
migrating. The reported issue it was attempting to resolve was
itself incorrect behavior. The reconciler has been adjusted
to properly handle batch job allocations as documented.
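
The three documented behaviors above can be summarized as a small decision table. The following is only an illustrative sketch in Go, not the reconciler code from this PR: the `Trigger` and `Outcome` types and the `Decide` function are hypothetical names introduced here, and real scheduling involves considerably more state.

```go
package main

import "fmt"

// Trigger is a hypothetical label for what caused the scheduler to
// revisit a batch job allocation. It is not a Nomad type.
type Trigger int

const (
	NodeDrain            Trigger = iota // the node is being drained
	AllocStop                           // `alloc stop` was run
	JobRestartReschedule                // job restart with `-reschedule`
)

// Outcome is a hypothetical summary of the documented behavior.
type Outcome struct {
	Stop                bool // stop the running allocation now
	Replace             bool // place a new allocation
	UseReschedulePolicy bool // honor the reschedule policy when replacing
}

// Decide maps a trigger to the documented outcome for a batch allocation.
// drainDeadlineReached only matters for NodeDrain.
func Decide(t Trigger, drainDeadlineReached bool) Outcome {
	switch t {
	case NodeDrain:
		// Allowed to complete; killed only at the deadline; never replaced.
		return Outcome{Stop: drainDeadlineReached}
	case AllocStop:
		// Stopped, then rescheduled according to the reschedule policy.
		return Outcome{Stop: true, Replace: true, UseReschedulePolicy: true}
	case JobRestartReschedule:
		// Migrated; the reschedule policy is ignored.
		return Outcome{Stop: true, Replace: true, UseReschedulePolicy: false}
	}
	return Outcome{}
}

func main() {
	fmt.Printf("drain, before deadline: %+v\n", Decide(NodeDrain, false))
	fmt.Printf("drain, deadline hit:    %+v\n", Decide(NodeDrain, true))
	fmt.Printf("alloc stop:             %+v\n", Decide(AllocStop, false))
	fmt.Printf("restart -reschedule:    %+v\n", Decide(JobRestartReschedule, false))
}
```

The point of the table is that only `alloc stop` consults the reschedule policy, and a drain never produces a replacement.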
@chrisroberts chrisroberts merged commit 1df9b57 into release/1.10.x Oct 29, 2025
37 checks passed
@chrisroberts chrisroberts deleted the f-drain-behavior-1.10 branch October 29, 2025 16:49
Labels

backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent
