[FEATURE] Add support for ParallelCluster 3.10.0 #242
Comments
cartalla
added a commit
that referenced
this issue
Jul 6, 2024
Add support for ParallelCluster 3.10.0. Add alinux2023 support. Add support for external slurmdbd instance. Resolves #242
cartalla
added a commit
that referenced
this issue
Jul 11, 2024
Add support for ParallelCluster 3.10.0. Add alinux2023 support. Add support for external slurmdbd instance. Update documentation. Change the UID of the slurm user to 401 to match what ParallelCluster uses. Otherwise munge flags security errors because the UID of the submitter doesn't match the head node. Resolves #242
cartalla
added a commit
that referenced
this issue
Jul 12, 2024
Add support for ParallelCluster 3.10.0. Add alinux2023 support. Add support for external slurmdbd instance. Update documentation. Change the UID of the slurm user to 401 to match what ParallelCluster uses. Otherwise munge flags security errors because the UID of the submitter doesn't match the head node. Change the UpdateHeadNode lambda to only do the update via ssm if the cluster isn't already being updated. Resolves #242 Change the installer so that it checks to make sure that the cluster stack isn't already being changed or in a bad state. Resolves #221
cartalla
added a commit
that referenced
this issue
Jul 12, 2024
Add support for ParallelCluster 3.10.0. Add alinux2023 support. Add support for external slurmdbd instance. Update documentation. Change the UID of the slurm user to 401 to match what ParallelCluster uses. Otherwise munge flags security errors because the UID of the submitter doesn't match the head node. Change the UpdateHeadNode lambda to only do the update via ssm if the cluster isn't already being updated. Resolves #242 Change the installer so that it checks to make sure that the cluster stack isn't already being changed or in a bad state. Resolves #221 Add support for ParallelCluster 3.10.1. Resolves #243
Is your feature request related to a problem? Please describe.
https://github.com/aws/aws-parallelcluster/releases/tag/v3.10.0
A key feature in this version is the support for an external slurmdbd daemon.
Previously, each cluster ran its own slurmdbd daemon on its controller, which is not the correct architecture.
A shared accounting database should be served by a single slurmdbd daemon that is used by all of the Slurm cluster controllers sharing that database.
ParallelCluster 3.10.0 adds a new config: ExternalSlurmdbd.
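As a rough illustration of what pointing a cluster at a shared slurmdbd might look like, the sketch below builds the relevant config fragment as a Python dict. It assumes the setting lives under `Scheduling/SlurmSettings` with `Host` and `Port` keys; check the ParallelCluster configuration reference before relying on that layout. The helper name and the example hostname are hypothetical, and 6819 is used only because it is the usual slurmdbd default port.

```python
def external_slurmdbd_settings(host: str, port: int = 6819) -> dict:
    """Build a cluster-config fragment that points a cluster at a shared slurmdbd.

    Assumption: the key layout (Scheduling/SlurmSettings/ExternalSlurmdbd with
    Host and Port) is not confirmed by this issue; verify it against the
    ParallelCluster documentation.
    """
    if not host:
        raise ValueError("host must be a non-empty string")
    if not 0 < port < 65536:
        raise ValueError("port must be in the range 1..65535")
    return {
        "Scheduling": {
            "SlurmSettings": {
                "ExternalSlurmdbd": {"Host": host, "Port": port},
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical hostname, for illustration only.
    print(external_slurmdbd_settings("slurmdbd.example.internal"))
```

Every cluster that shares the accounting database would then carry the same host and port, rather than each running its own daemon.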
A key change in this version is that munge was updated from 0.5.15 to 0.5.16.
This is critical because all of the clusters sharing an existing slurmdbd, and all of the clusters being accessed from
a login node such as a DCV desktop, need to be running the same version of munge. This means that clusters using
a ParallelCluster version earlier than 3.10.0 can't share a Slurm database with clusters using 3.10.0 or later.
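The compatibility rule above can be sketched as a small check. This is a minimal illustration, not part of the project: it encodes only the single 3.10.0 boundary described in this issue and knows nothing about munge changes in other ParallelCluster releases. The function names are hypothetical.

```python
# First ParallelCluster version that ships munge 0.5.16, per this issue.
MUNGE_BOUNDARY = (3, 10, 0)


def parse_version(version: str) -> tuple:
    """Turn '3.10.1' into (3, 10, 1), padding short versions such as '3.10' with zeros."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def uses_new_munge(pc_version: str) -> bool:
    """True if the version is on the 3.10.0-or-later side of the boundary."""
    return parse_version(pc_version) >= MUNGE_BOUNDARY


def can_share_slurmdbd(pc_versions) -> bool:
    """True if all versions fall on the same side of the 3.10.0 boundary."""
    sides = {uses_new_munge(v) for v in pc_versions}
    return len(sides) <= 1


if __name__ == "__main__":
    print(can_share_slurmdbd(["3.9.1", "3.9.3"]))    # both before the boundary
    print(can_share_slurmdbd(["3.9.1", "3.10.0"]))   # straddles the boundary
    print(can_share_slurmdbd(["3.10.0", "3.10.1"]))  # both at or after the boundary
```

The same check would apply to the set of clusters reachable from one login node, since the issue states that they also need matching munge versions.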