-
Notifications
You must be signed in to change notification settings - Fork 107
Add support for Feature
specification for ComputeResources
#3020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing with this change. I left a few minor comments if you could take a look.
self.disable_multithreading = compute_resource_config["DisableSimultaneousMultithreading"] | ||
self.custom_settings = compute_resource_config.get("CustomSlurmSettings", {}) | ||
# If Features are in CustomSlurmSettings, fetch them and remove from CustomSlurmSettings | ||
self.custom_features = self.custom_settings.pop("Feature", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of mutating the custom_settings object by removing features I would keep custom_settings aligned with what is injected from config and filter out features when building the custom += f" {param}={value}"
string as part of the nodename rendering.
# this is a simple workaround for empty Features: split will give [''] and we'll discard it | ||
# as well as all empty strings. | ||
features.discard('') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default to None when reading the Feature string and skip the splitting so that you don't need to discard ''?
# remove "system" features from configured features, so that they will never interfere | ||
for feat in ('static', 'dynamic', 'gpu', 'efa') : | ||
features.discard(feat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could consider leaving these unaltered - if for any reason the user passes them in we could just honor them as overrides
Description of changes
With this changes, users will be able to specify additional
Feature
values for ComputeResources. ThoseFeature
-s will be safely added to the Slurm nodes specification for the partition.Rationale: In many cases, especially with complex cluster setups, it's crucial to have additional
Feature
-s that can be used to filter nodes for the jobs.Tests
Added the following tests:
test/unit/slurm/test_slurm_config_generator.py::test_generate_slurm_config_with_custom_features
- tests config generation with custom features specified. Corresponding test data is intest/unit/slurm/test_generate_slurm_config_with_custom_features
.NOTE: test data for many tests were changed because the new features generation algorithm requires all features in
Feature=...
in config files to be sorted. So many test files were affected, but the only change that were made to them is features ordering.References
This PR has a complimentary PR in
aws-parallelcluster
repo: aws/aws-parallelcluster#6962By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.