Running install.sh with --cdk-cmd update in rapid succession can damage the cluster #221
Comments
Add a check to make sure that the cluster stack isn't already being updated or is in a bad state and abort the install.
cartalla added a commit that referenced this issue Jul 12, 2024
Add support for ParallelCluster 3.10.0. Add alinux2023 support. Add support for external slurmdbd instance. Update documentation. Change the UID of the slurm user to 401 to match what ParallelCluster uses. Otherwise munge flags security errors because the UID of the submitter doesn't match the head node. Change the UpdateHeadNode lambda to only do the update via ssm if the cluster isn't already being updated. Resolves #242 Change the installer so that it checks to make sure that the cluster stack isn't already being changed or in a bad state. Resolves #221
cartalla added a commit that referenced this issue Jul 12, 2024
Add support for ParallelCluster 3.10.0. Add alinux2023 support. Add support for external slurmdbd instance. Update documentation. Change the UID of the slurm user to 401 to match what ParallelCluster uses. Otherwise munge flags security errors because the UID of the submitter doesn't match the head node. Change the UpdateHeadNode lambda to only do the update via ssm if the cluster isn't already being updated. Resolves #242 Change the installer so that it checks to make sure that the cluster stack isn't already being changed or in a bad state. Resolves #221 Add support for ParallelCluster 3.10.1. Resolves #243
I ran install.sh with --cdk-cmd update to change my instance selections. Then I realized I wanted an additional change, so I modified my config file and ran the update again. Unfortunately, this corrupted my cluster because the second command started while the first update was still running. The second command tried to do a rollback and that failed.
Can we put in some sort of check to ensure the CloudFormation stack is not in an "IN PROGRESS" state before allowing install.sh to update?
To reproduce, change some instances in your config, run the update, and then immediately change the config and run the update again.
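For illustration, here is a minimal sketch of the kind of pre-update check being requested. It is not the check that was eventually committed. The function names (`is_safe_to_update`, `get_stack_status`, `abort_unless_updatable`) and the allow-list are assumptions made for this example; only the boto3 `describe_stacks` call and the CloudFormation status strings are real. The allow-list is deliberately conservative: any status not listed, including every `*_IN_PROGRESS` and `*_FAILED` state, causes an abort.

```python
# Hypothetical pre-update guard; names and the allow-list are illustrative.

# Conservative allow-list of stack statuses from which an update is attempted.
UPDATABLE_STATUSES = frozenset({
    "CREATE_COMPLETE",
    "UPDATE_COMPLETE",
    "UPDATE_ROLLBACK_COMPLETE",
})


def is_safe_to_update(stack_status: str) -> bool:
    """Return True only if the stack is in a settled, healthy state."""
    return stack_status in UPDATABLE_STATUSES


def get_stack_status(stack_name: str) -> str:
    """Look up the current CloudFormation status of the named stack."""
    import boto3  # imported lazily so the pure logic above has no AWS dependency

    cfn = boto3.client("cloudformation")
    return cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]


def abort_unless_updatable(stack_name: str, get_status=get_stack_status) -> str:
    """Exit with an error message unless the stack can safely be updated."""
    status = get_status(stack_name)
    if not is_safe_to_update(status):
        raise SystemExit(
            f"Stack {stack_name} is in state {status}; "
            "wait for it to settle or repair it before updating."
        )
    return status
```

An installer could call `abort_unless_updatable(stack_name)` just before launching the update, so that a second run started while the first is still `UPDATE_IN_PROGRESS` exits instead of triggering a rollback.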