Releases · aws/aws-parallelcluster-cookbook

29 Jul 10:37

demartinofra

v2.4.1

f0b50ba

AWS ParallelCluster v2.4.1

We're excited to announce the release of AWS ParallelCluster Cookbook 2.4.1.

This is associated with AWS ParallelCluster v2.4.1.

Enhancements

Install Intel MPI on Amazon Linux, CentOS 7 and Ubuntu 16.04
Upgrade EFA to version 1.4.1
Run all node daemons and cookbook recipes in isolated Python virtualenvs. This allows our code to always run with the required Python dependencies and resolves the conflicts and runtime failures that were caused by user packages installed in the system Python
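
As a rough illustration of the virtualenv isolation described above, a script can check whether it is running inside a virtualenv rather than the system Python. This is a minimal sketch using only the standard library; the function name in_virtualenv is hypothetical and is not part of the cookbook.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    # venv and recent virtualenv releases point sys.base_prefix at the
    # system installation; legacy virtualenv sets sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    print("virtualenv" if in_virtualenv() else "system Python")
```

A daemon started from an isolated environment would report "virtualenv" here, regardless of what is installed in the system Python.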

Changes

Torque: upgrade to version 6.1.2
Run all node daemons with Python 3.6
Torque: changed the following parameters in the global configuration:
- server node_check_rate = 120 - Specifies the minimum duration (in seconds) that a node can fail to send a status update before being marked down by the pbs_server daemon. Previously 600. This reduces scaling reaction times in case of instance failure or unexpected termination (especially with spot instances).
- server node_ping_rate = 60 - Specifies the maximum interval (in seconds) between successive "pings" sent from the pbs_server daemon to the pbs_mom daemon to determine node/daemon health. Previously 300. Set to half of node_check_rate.
- server timeout_for_job_delete = 30 - The timeout (in seconds) used when deleting jobs because the node they are executing on is being deleted. Previously 120. This prevents job deletion from hanging for more than 30 seconds when the node the jobs are running on is being deleted.
- server timeout_for_job_requeue = 30 - The timeout (in seconds) used when requeuing jobs because the node they are executing on is being deleted. Previously 120. This prevents node deletion from hanging for more than 30 seconds when a job cannot be rescheduled.
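
For readers who want to apply the same Torque server values by hand, the parameters listed above can be expressed as qmgr commands. The sketch below only builds the command strings; it assumes the standard `qmgr -c "set server <attribute> = <value>"` form and does not claim to reproduce how the cookbook itself applies them. The names TORQUE_SERVER_SETTINGS and qmgr_commands are hypothetical.

```python
# Attribute values copied from the release notes above.
TORQUE_SERVER_SETTINGS = [
    ("node_check_rate", 120),
    ("node_ping_rate", 60),
    ("timeout_for_job_delete", 30),
    ("timeout_for_job_requeue", 30),
]


def qmgr_commands(settings):
    """Build one qmgr command line per (attribute, value) pair."""
    return [
        'qmgr -c "set server {0} = {1}"'.format(name, value)
        for name, value in settings
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe
    # to execute on a machine without Torque installed.
    for command in qmgr_commands(TORQUE_SERVER_SETTINGS):
        print(command)
```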

Bug Fixes

Restore correct value for filehandle_limit that was getting reset when setting memory_limit for EFA
Torque: fix configuration of server operators that was preventing compute nodes from disabling themselves before termination

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192


11 Jun 15:29

lukeseawalker

v2.4.0

e94e9c2

AWS ParallelCluster v2.4.0

We're excited to announce the release of AWS ParallelCluster Cookbook 2.4.0.

This is associated with AWS ParallelCluster v2.4.0.

Enhancements

Add support for EFA on CentOS 7, Amazon Linux and Ubuntu 16.04
Add support for Ubuntu in China region cn-northwest-1

Changes

SGE: changed the following parameters in the global configuration:
- max_unheard 00:03:00: allows a faster reaction in case of faulty nodes
- reschedule_unknown 00:00:30: enables rescheduling of jobs running on failing nodes
- qmaster_params ENABLE_FORCED_QDEL_IF_UNKNOWN: forces job deletion on unresponsive nodes
- qmaster_params ENABLE_RESCHEDULE_KILL: forces rescheduling or killing of jobs running on failing nodes
Slurm: decrease SlurmdTimeout to 120 seconds to speed up replacement of faulty nodes
Always use the full master FQDN when mounting NFS on compute nodes. This resolves issues seen with some networking setups and custom DNS configurations
Set soft and hard ulimit on open files to 10000 for all supported OSs
Pin python supervisor version to 3.4.0
Remove unused compute_instance_type from jobwatcher.cfg
Remove unused max_queue_size from sqswatcher.cfg
Remove double quoting of the post_install args
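
To confirm the open-files ulimit change listed above on a running node, the limits of the current process can be read with Python's standard resource module (Unix only). This is a minimal sketch; the helper names and the EXPECTED_NOFILE constant are hypothetical, and the value 10000 is the one stated in these notes.

```python
import resource

EXPECTED_NOFILE = 10000  # soft and hard limit stated in the release notes


def nofile_limits():
    """Return the (soft, hard) open-files limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def matches_expected(limits, expected=EXPECTED_NOFILE):
    """Check that both the soft and the hard limit equal the expected value."""
    soft, hard = limits
    return soft == expected and hard == expected


if __name__ == "__main__":
    limits = nofile_limits()
    print("soft=%s hard=%s ok=%s" % (limits[0], limits[1], matches_expected(limits)))
```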

Bug Fixes

Fix issue that was preventing Torque from being used on CentOS 7
Start node daemons at the end of instance initialization. The time spent on the post-install script and node initialization no longer counts toward node idletime.
Fix issue that was causing an additional, invalid EBS mount point to be added when multiple EBS volumes are configured
Install Slurm libpmi/libpmi2, which has been distributed in a separate package since Slurm 17

Support


03 Apr 08:56

enrico-usai

v2.3.1

7568a23

AWS ParallelCluster 2.3.1

We're excited to announce the release of AWS ParallelCluster Cookbook 2.3.1.

This is associated with AWS ParallelCluster v2.3.1.

Enhancements

FSx Lustre - add support in Amazon Linux

Changes

Slurm - upgrade to version 18.08.6.2
Slurm - declare nodes in separate config file and use FUTURE for dummy nodes
Slurm - set ReturnToService=1 in scheduler config in order to recover instances that were initially marked as down due to a transient issue.
NVIDIA - update drivers to version 418.56
CUDA - update toolkit to version 10.0
Increase default EBS volume size from 15GB to 17GB
Add LocalHostname to COMPUTE_READY events
Pin future, retrying and six packages in Ubuntu 14.04
Add stackname and max_queue_size to sqswatcher configuration

Support


28 Feb 13:48

demartinofra

v2.2.1

554f5b6

AWS ParallelCluster 2.2.1

We're excited to announce the release of AWS ParallelCluster Cookbook 2.2.1.

This is associated with AWS ParallelCluster v2.2.1.

Features

Support for FSx Lustre with Centos 7
Check AWS EC2 account limits before starting cluster creation
Allow users to force job deletion with SGE scheduler

Changes

Set default value to compute for placement_group option
pcluster ssh: use private IP when the public one is not available
pcluster ssh: now works also when stack is not completed as long as the master IP is available

Bugfixes

awsbsub: fix file upload with absolute path
pcluster ssh: fix issue that was preventing the command from working correctly when stack status is UPDATE_ROLLBACK_COMPLETE
Fix block device conversion to correctly attach EBS nvme volumes
Wait for Torque scheduler initialization before completing master node setup
pcluster version: now works also when no ParallelCluster config is present
Improve nodewatcher daemon logic to detect whether an SGE compute node has running jobs

Support


08 Jan 14:37

lukeseawalker

v2.1.1

e88b787

AWS ParallelCluster 2.1.1

We're excited to announce the release of AWS ParallelCluster Cookbook 2.1.1.

This is associated with AWS ParallelCluster v2.1.1.

Features

Support for AWS Beijing Region (cn-north-1) and Ningxia Region (cn-northwest-1)

Bugfixes

No longer schedule jobs on compute nodes that are terminating

Support


18 Dec 01:15

sean-smith

v2.1.0

e687a18

AWS ParallelCluster v2.1.0

We're excited to announce the release of AWS ParallelCluster Cookbook 2.1.0.

This is associated with AWS ParallelCluster v2.1.0.

Features

Support for Elastic File System (EFS)
AWS Batch Multinode Parallel support
Support for RAID 0 and 1 EBS Volumes
Support for AWS Stockholm Region (eu-north-1)

Bugfixes

No longer schedule jobs on compute nodes that are terminating

Support


20 Nov 00:33

sean-smith

v2.0.2

17da90d

AWS ParallelCluster v2.0.2

We're excited to announce the release of AWS ParallelCluster Cookbook 2.0.2.

This is associated with AWS ParallelCluster v2.0.2.

Features

Support for new GovCloud region us-gov-east-1

Bugfixes

Fix regression with shared_dir parameter in the cluster configuration section.
Fixed issue with jq that prevented customers from using extra_json
Fixed issue with awscli version on ubuntu1404

Support


20 Nov 00:31

sean-smith

v2.0.0

a637db1

AWS ParallelCluster v2.0.0

We're excited to announce the release of AWS ParallelCluster Cookbook 2.0.0!

This is associated with AWS ParallelCluster v2.0.0.

Features

AWS Batch integration
Multiple EBS Volumes
Support for custom AMIs

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster (note: cookbook issues have moved to the main package; please create new issues there)
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192


26 Oct 00:09

sean-smith

v1.6.0

2ed3203

CfnCluster v1.6.0

This is a release of the cfncluster-cookbook v1.6.0, associated with CfnCluster v1.6.0.

Features:

Refactor scaling up to take into account the number of pending/requested jobs/slots and instance slots.
Refactor scaling down to scale down faster and take advantage of per-second billing.
Add scaledown_idletime parameter as part of scale-down refactoring
Lock hosts before termination to ensure removal of dead compute nodes from host list
Fix HTTP proxy support
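
The scale-up item above says the calculation takes pending or requested slots and per-instance slots into account. The release notes do not give the formula, so the following is only an illustrative sketch under stated assumptions: demand is the pending slot count divided by slots per instance, rounded up, and capped by the remaining cluster capacity. The function and all parameter names are hypothetical and do not come from the CfnCluster code.

```python
import math


def instances_to_request(pending_slots, slots_per_instance, running, max_size):
    """Estimate how many instances to add (assumed logic, not CfnCluster's)."""
    if slots_per_instance <= 0:
        raise ValueError("slots_per_instance must be positive")
    # Round up so a partially filled instance is still requested.
    needed = math.ceil(pending_slots / slots_per_instance)
    # Never exceed the remaining capacity, and never return a negative count.
    return max(0, min(needed, max_size - running))


if __name__ == "__main__":
    print(instances_to_request(pending_slots=10, slots_per_instance=4,
                               running=0, max_size=10))
```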


30 Aug 15:06

sean-smith

1.5.4

72402cc

CfnCluster v1.5.4

This is a release of the cfncluster-cookbook v1.5.4, associated with CfnCluster v1.5.4.

Features:

Set the SGE accounting summary to true; this reports a single accounting record for an MPI job
Add option to disable Ganglia: extra_json = { "cfncluster" : { "ganglia_enabled" : "no" } }
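
The extra_json value shown above is a JSON object, so generating it programmatically avoids quoting mistakes when it is written by a script. A minimal sketch follows; the helper name ganglia_extra_json is hypothetical, and "yes" as the enabling value is an assumption, since the notes only show "no".

```python
import json


def ganglia_extra_json(enabled):
    """Return the extra_json string that toggles Ganglia."""
    # "no" comes from the release notes; "yes" is assumed as its counterpart.
    value = "yes" if enabled else "no"
    return json.dumps({"cfncluster": {"ganglia_enabled": value}})


if __name__ == "__main__":
    # Emit the line in the same form the release notes use.
    print("extra_json = " + ganglia_extra_json(enabled=False))
```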

