lukeseawalker
Contributor

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

lukeseawalker and others added 30 commits April 10, 2019 14:54
The custom node package can be passed as a URL pointing to the
node archive.

Signed-off-by: Luca Carrogu <[email protected]>
In this way the cfn_postinstall_args is aligned with the
cfn_preinstall_args variable.

Signed-off-by: Enrico Usai <[email protected]>
This mount point is wrong when the customer is using multiple EBS
volumes, because cfn_shared_dir then contains the comma-separated list
of mount points.

Furthermore, the same action is already performed a few lines below
in the same script, by splitting on commas.

Signed-off-by: Enrico Usai <[email protected]>
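The comma-separated handling described in this commit can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the cookbook's actual code: the helper name `split_shared_dirs` and the example paths are hypothetical.

```python
def split_shared_dirs(cfn_shared_dir):
    """Split a comma-separated cfn_shared_dir value into mount points.

    Hypothetical helper, for illustration only.
    """
    # Strip surrounding whitespace and ignore empty entries.
    return [part.strip() for part in cfn_shared_dir.split(",") if part.strip()]


# One volume yields one mount point; several volumes yield several.
single = split_shared_dirs("/shared")
multiple = split_shared_dirs("/vol1,/vol2,/vol3")
```

Treating the whole value as a single path only works in the one-volume case, which is why the commit removes that use.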
This is the latest version with Python 2.6 support

Signed-off-by: Luca Carrogu <[email protected]>
Signed-off-by: Luca Carrogu <[email protected]>
Signed-off-by: Luca Carrogu <[email protected]>
The jobwatcher now retrieves this value dynamically from the stack parameters

Signed-off-by: Francesco De Martino <[email protected]>
libpmi is now in a separate Slurm package (see https://bugs.schedmd.com/show_bug.cgi?id=4511),
so it needs to be installed explicitly.

This will solve aws/aws-parallelcluster#1008

Signed-off-by: Luca Carrogu <[email protected]>
The patch lets the script continue even when the following error is
returned by the parted command:
"Error: Partition(s) Y on /dev/XXX have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes."

Signed-off-by: Luca Carrogu <[email protected]>
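One way to tolerate only this specific parted message is to match on its text. The sketch below is a hedged Python illustration, not the patch itself: the function name, and the idea of inspecting a return code together with captured stderr, are assumptions. Only the matched substring comes from the error quoted above.

```python
# Substring taken from the parted message quoted in the commit description.
KERNEL_NOT_INFORMED = "unable to inform the kernel of the change"


def parted_result_is_acceptable(returncode, stderr):
    """Return True if the script may continue after running parted.

    Hypothetical helper: success is always acceptable, and a failure is
    acceptable only when it is the "kernel not informed" message.
    """
    if returncode == 0:
        return True
    return KERNEL_NOT_INFORMED in stderr
```

Any other non-zero exit still counts as a failure, so unrelated partitioning errors are not silently ignored.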
This patch avoids network service restart failures when a configuration
file for an old network interface (no longer present in the current
instance launch) is found.

Signed-off-by: Luca Carrogu <[email protected]>
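The commit message does not say how the stale configuration is handled. One possible approach is to detect configuration files whose interface no longer exists, as in this Python sketch. The `ifcfg-<interface>` naming convention, the function name, and the example interface names are assumptions made for illustration.

```python
def stale_interface_configs(config_names, present_interfaces):
    """List config file names whose interface is not currently present.

    Hypothetical helper, assuming files are named "ifcfg-<interface>".
    """
    prefix = "ifcfg-"
    present = set(present_interfaces)
    return [
        name
        for name in config_names
        if name.startswith(prefix) and name[len(prefix):] not in present
    ]
```

For example, with config files for `eth0` and `eth1` but only `eth0` attached, the helper reports `ifcfg-eth1` as stale.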
This fixes the issue with torque on centos 7

Signed-off-by: Francesco De Martino <[email protected]>
Skip the test if jq is not installed, because for a custom AMI it is
installed at bootstrap time (inside the CloudFormation userdata).

Signed-off-by: Luca Carrogu <[email protected]>
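The skip condition can be illustrated with a small Python sketch. It is not the test code from this commit: the function name and the injectable `find_executable` parameter are assumptions, added so the check can be exercised without jq being installed.

```python
import shutil


def should_skip_jq_test(find_executable=shutil.which):
    """Return True when jq cannot be found on PATH.

    Hypothetical helper; find_executable defaults to shutil.which and can
    be replaced with a stub.
    """
    return find_executable("jq") is None
```

On a custom AMI at build time the lookup returns nothing and the test is skipped; once bootstrap has installed jq, the same check lets the test run.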
Signed-off-by: Francesco De Martino <[email protected]>
The issue is reported in chef/bento#609.
By using a custom chef URL instead of the default one (https://www.chef.io/chef/install.sh),
we are able to avoid the error "dpkg: error: dpkg status database is locked by another process".

Signed-off-by: Luca Carrogu <[email protected]>
SGE installation folder is mounted from master node
The installation is done when "cfn_node_type" is "MasterServer" (at runtime)
or is "nil" (at packer time)

Signed-off-by: Luca Carrogu <[email protected]>
Slurm installation folder is mounted from master node
The installation is done when "cfn_node_type" is "MasterServer" (at runtime)
or is "nil" (at packer time)

Signed-off-by: Luca Carrogu <[email protected]>
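The install condition shared by the SGE and Slurm commits above can be written as a single predicate. This is a hedged Python sketch rather than the recipe code: the function name is hypothetical, and `None` stands in for the `nil` value seen at packer time.

```python
def should_install_scheduler(cfn_node_type):
    """Return True when the scheduler should be installed on this node.

    Hypothetical helper. None represents an unset cfn_node_type (packer
    time); "MasterServer" is the runtime value named in the commits.
    """
    return cfn_node_type is None or cfn_node_type == "MasterServer"
```

Any other node type skips the installation, because those nodes mount the installation folder from the master node instead.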
SlurmdTimeout: the interval, in seconds, that the Slurm
controller waits for slurmd to respond before configuring
that node's state to DOWN.

Reducing it in order to have a faster reaction to nodes that
are failing.

Signed-off-by: Francesco De Martino <[email protected]>
Move the supervisord start to the end of the user data, in a finalize
chef recipe. This solves the problem of the nodewatcher being
started before the chef recipes and the post_install script had finished,
which caused the idletime to be computed incorrectly.

Signed-off-by: Francesco De Martino <[email protected]>
PATH is normally set in the cfn userdata. To keep the chef recipes
independent from userdata, I'm explicitly setting the PATH for this
command.

Signed-off-by: Francesco De Martino <[email protected]>
This will add support for Ubuntu in China NorthWest region (cn-northwest-1)

Signed-off-by: Luca Carrogu <[email protected]>
Signed-off-by: Luca Carrogu <[email protected]>
Signed-off-by: Luca Carrogu <[email protected]>
Sean Smith and others added 20 commits May 28, 2019 23:23
Signed-off-by: Sean Smith <[email protected]>
Once the EFA package is installed, it is not possible to install the
openmpi-devel package. Make the installation conditional on the OS
and region.

Signed-off-by: Luca Carrogu <[email protected]>
Signed-off-by: Francesco De Martino <[email protected]>
* fetch installer during ami build (or at runtime with custom ami)
* install only on Compute Nodes

Signed-off-by: Sean Smith <[email protected]>
Signed-off-by: Sean Smith <[email protected]>
This builds the rpms into the ami, then sets the limits at runtime.

Signed-off-by: Sean Smith <[email protected]>
Update the base_config recipe to attempt the fsx filesystem mount
unconditionally, rather than restricting it to alinux/centos. This supports
cases with custom ubuntu amis that have fsx extensions installed. It is
a no-op change in the default parallelcluster configuration, as the
client also verifies os compatibility during configuration validation.

Tidy the common call of the efs mount from the master/compute recipes into
base_config, alongside the fsx mount.
Signed-off-by: Sean Smith <[email protected]>
Signed-off-by: Sean Smith <[email protected]>
Signed-off-by: Luca Carrogu <[email protected]>
Signed-off-by: Francesco De Martino <[email protected]>
* Sets the max_memory ulimit on the master when EFA is enabled

Signed-off-by: Sean Smith <[email protected]>
@lukeseawalker lukeseawalker merged commit e94e9c2 into master Jun 11, 2019
3 participants