This plugin unrestricts the NVIDIA application clock commands when an exclusive job runs on a node tagged with the nvgpufreq gres.
The nvgpufreq plugin calls nvmlDeviceSetAPIRestriction() to restrict or unrestrict changes to the GPU clock frequency at user level. Once the application clock commands have been unrestricted, a standard user can change the GPU frequency using the nvidia-smi tool or the NVML APIs.
The nvgpufreq plugin intercepts the prolog and epilog of each job submitted to the cluster (slurm_spank_job_prolog() and slurm_spank_job_epilog()).
In the prolog procedure, the plugin does the following checks:
- Retrieve the node info from slurmctld. If the plugin cannot contact slurmctld, it terminates its execution.
- Check if the node is tagged with the nvgpufreq gres. If the node is not tagged, the plugin terminates its execution.
- Retrieve the job info from slurmctld. If the plugin cannot contact slurmctld, it terminates its execution.
- Check if the job requests the nvgpufreq gres. If the job does not specify the nvgpufreq gres, the plugin terminates its execution.
- Check if the job runs exclusively on the node. If the node can be shared among multiple jobs, the plugin terminates its execution.
- Call nvmlDeviceSetAPIRestriction() to unrestrict the GPU clock frequency for regular users.
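The order of the prolog checks above can be sketched as a single decision function. This is only an illustration: the struct, its field names, and the enum are hypothetical and do not come from the plugin source, and the real plugin obtains these values from slurmctld rather than from a pre-filled struct.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what the prolog learns; not the plugin's real types. */
struct prolog_ctx {
    bool node_info_ok;      /* node info was retrieved from slurmctld */
    bool node_has_gres;     /* node is tagged with the nvgpufreq gres */
    bool job_info_ok;       /* job info was retrieved from slurmctld */
    bool job_requests_gres; /* job requested the nvgpufreq gres */
    bool job_exclusive;     /* job runs exclusively on the node */
};

enum prolog_action { PROLOG_SKIP, PROLOG_UNRESTRICT };

/* Apply the checks in the documented order; any failure ends the prolog early. */
enum prolog_action prolog_decide(const struct prolog_ctx *ctx)
{
    if (!ctx->node_info_ok)      return PROLOG_SKIP;
    if (!ctx->node_has_gres)     return PROLOG_SKIP;
    if (!ctx->job_info_ok)       return PROLOG_SKIP;
    if (!ctx->job_requests_gres) return PROLOG_SKIP;
    if (!ctx->job_exclusive)     return PROLOG_SKIP;
    /* Only at this point would nvmlDeviceSetAPIRestriction() be called. */
    return PROLOG_UNRESTRICT;
}
```

In this sketch a job that requests the gres but shares the node yields PROLOG_SKIP, which matches the exclusivity requirement described above.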
In the epilog procedure, the plugin does the following checks:
- Retrieve the node info from slurmctld. If the plugin cannot contact slurmctld, it terminates its execution.
- Check if the node is tagged with the nvgpufreq gres. If the node is not tagged, the plugin terminates its execution.
- Check if the node has been configured by an nvgpufreq job and, if so, restore it. After that, the plugin deletes /var/run/nvgpufreq.run and concludes the epilog procedure.
To verify whether the plugin completed the configuration of the node, users and administrators can check for the file /var/run/nvgpufreq.run, which records whether something went wrong or the plugin terminated correctly. This file should always be removed by the plugin in the epilog procedure after the node has been restored.
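A minimal sketch of how a monitoring tool might test for the run file from C. The helper name run_file_present is hypothetical, and the sketch only checks existence; it does not parse the file, since its content format is not described here.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Return true if `path` exists and is a regular file. */
bool run_file_present(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}
```

Calling run_file_present("/var/run/nvgpufreq.run") while a job is running would indicate that the prolog touched the node. A file that is still present after the job has ended suggests the epilog did not restore the node.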
The plugin implements three types of logs:
- [SLURM-NVGPUFREQ]: for general information.
- [SLURM-NVGPUFREQ][WARN]: for warning information. This includes misconfigurations that do not affect the execution of the plugin.
- [SLURM-NVGPUFREQ][ERR]: for error information. This includes problems that terminate the execution of the plugin.
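As an illustration of the three prefixes, the helper below builds a log line for a given level. The function, the enum, and the single space between prefix and message are assumptions made for this sketch; the plugin's actual logging code may differ.

```c
#include <stdio.h>

/* Hypothetical log levels matching the three documented prefixes. */
enum log_level { LOG_INFO, LOG_WARN, LOG_ERR };

/* Write "<prefix> <msg>" into buf and return snprintf's result. */
int format_log(char *buf, size_t size, enum log_level lvl, const char *msg)
{
    const char *prefix = "[SLURM-NVGPUFREQ]";
    if (lvl == LOG_WARN)
        prefix = "[SLURM-NVGPUFREQ][WARN]";
    else if (lvl == LOG_ERR)
        prefix = "[SLURM-NVGPUFREQ][ERR]";
    return snprintf(buf, size, "%s %s", prefix, msg);
}
```

Because every prefix starts with [SLURM-NVGPUFREQ], searching the logs for that string returns all plugin messages, while searching for the [ERR] variant isolates fatal problems.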
To compile the code:
- Clone this repo on a node where the SLURM daemon is deployed
git clone https://gitlab.hpc.cineca.it/dcesari1/slurm-nvgpufreq.git
- Create a build directory
mkdir build-nvgpufreq
- Enter the build directory
cd build-nvgpufreq
- Run CMake and specify an install directory
cmake -DCMAKE_INSTALL_PREFIX=../install-nvgpufreq ../slurm-nvgpufreq
- Run make to compile and install the plugin
make && make install
Before deploying the plugin, a gres called nvgpufreq must be defined. The gres lets system administrators restrict the plugin to the subset of nodes where users are allowed to use it. Define it in gres.conf:
NodeName=... Name=nvgpufreq Count=1
Add the gres configuration to slurm.conf:
GresTypes=nvgpufreq
PlugStackConfig=/run/slurm/conf/plugstack.conf
NodeName=... Gres=nvgpufreq:1 ...
Add the plugin configuration to plugstack.conf:
optional /path/to/nvgpufreq.so
To use the plugin, a user must submit a job that requests the nvgpufreq gres and exclusive access to the node.
sbatch $SLURM_CONF --gres=nvgpufreq --exclusive $BIN
For SLURM versions between 20.0 and 20.02.7, see the following bug report: https://bugs.schedmd.com/show_bug.cgi?id=9081