[FEATURE] Add support for ParallelCluster 3.10.0 #242

Closed
cartalla opened this issue Jul 5, 2024 · 0 comments · Fixed by #244

cartalla commented Jul 5, 2024

Is your feature request related to a problem? Please describe.

https://github.com/aws/aws-parallelcluster/releases/tag/v3.10.0

A key feature in this version is support for an external slurmdbd daemon.
Previously, each cluster ran its own slurmdbd daemon on the controller, which is not the correct architecture.
A shared accounting database should have a single slurmdbd daemon that is used by all of the Slurm cluster controllers sharing that database.

ParallelCluster adds a new config: ExternalSlurmdbd.
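
As a rough illustration of what pointing a cluster at an external slurmdbd might look like, here is a minimal sketch that builds the relevant config fragment as a Python dict. The key layout (`Scheduling` / `SlurmSettings` / `ExternalSlurmdbd` with `Host` and `Port`), the helper name, and the host name are assumptions for illustration and should be checked against the ParallelCluster 3.10.0 documentation. The default port 6819 is the stock slurmdbd port.

```python
# Sketch only: the nesting and the Host/Port keys are assumed, not taken from this issue.
def external_slurmdbd_fragment(host: str, port: int = 6819) -> dict:
    """Return a nested dict mirroring the assumed cluster-config structure."""
    if not host:
        raise ValueError("host must be a non-empty string")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {
        "Scheduling": {
            "Scheduler": "slurm",
            "SlurmSettings": {
                # One slurmdbd instance shared by every cluster that uses the database.
                "ExternalSlurmdbd": {"Host": host, "Port": port},
            },
        }
    }


# Hypothetical host name, used only to show the call.
fragment = external_slurmdbd_fragment("slurmdbd.example.internal")
```

Every cluster sharing the accounting database would carry the same `Host` and `Port` values, so that they all talk to the one daemon.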

Another key change in this version is that munge was updated from 0.5.15 to 0.5.16.
This is critical because all of the clusters sharing an existing slurmdbd, and all of the clusters accessed from
a login node such as a DCV desktop, need to run the same version of munge. This means that clusters using a
ParallelCluster version earlier than 3.10.0 can't share a Slurm database or a login node with clusters running 3.10.0 or later.
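
The constraint above can be sketched as a simple pre-flight check. This is an illustrative helper, not code from the repository. It assumes that the only boundary that matters is 3.10.0 (the release where munge changed), and it treats all earlier releases as one compatible group, which is a simplification. The function names and cluster names are hypothetical.

```python
MUNGE_BOUNDARY = (3, 10, 0)  # first ParallelCluster release with munge 0.5.16, per this issue


def parse_version(text: str) -> tuple:
    """Turn '3.10.1' into (3, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def munge_group(pc_version: str) -> str:
    """Classify a ParallelCluster version by which side of the munge change it is on."""
    return "3.10.0+" if parse_version(pc_version) >= MUNGE_BOUNDARY else "pre-3.10.0"


def can_share_slurmdbd(cluster_versions: dict) -> bool:
    """True if every cluster falls on the same side of the munge boundary."""
    groups = {munge_group(version) for version in cluster_versions.values()}
    return len(groups) <= 1


# Hypothetical cluster names, for illustration only.
mixed = {"cluster-a": "3.9.1", "cluster-b": "3.10.0"}
print(can_share_slurmdbd(mixed))
```

Comparing parsed tuples rather than raw strings matters here, because `"3.9.1" < "3.10.0"` is false as a string comparison.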

cartalla self-assigned this Jul 5, 2024
cartalla added a commit that referenced this issue Jul 6, 2024
Add support for ParallelCluster 3.10.0.

Add alinux2023 support.

Add support for external slurmdbd instance.

Resolves #242
cartalla added a commit that referenced this issue Jul 11, 2024
Add support for ParallelCluster 3.10.0.

Add alinux2023 support.

Add support for external slurmdbd instance.

Update documentation.

Change the UID of the slurm user to 401 to match what ParallelCluster uses.
Otherwise munge flags security errors because the UID of the submitter doesn't match the head node.

Resolves #242
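
The UID requirement in the commit message above could be verified on a submitter or login node with something like the following sketch. The value 401 comes from the commit message. The helper name and the injectable `lookup` parameter are assumptions added so that the check can be exercised without a real `slurm` account.

```python
import pwd

EXPECTED_SLURM_UID = 401  # UID the commit message says ParallelCluster uses for slurm


def slurm_uid_matches(lookup=pwd.getpwnam, user: str = "slurm",
                      expected: int = EXPECTED_SLURM_UID) -> bool:
    """Return True if the local account exists and has the expected UID."""
    try:
        return lookup(user).pw_uid == expected
    except KeyError:
        # pwd.getpwnam raises KeyError when the account does not exist.
        return False


if __name__ == "__main__":
    print("slurm UID matches head node:", slurm_uid_matches())
```

A mismatch here would be consistent with the munge security errors described in the commit message, since the submitter's UID would differ from the head node's.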
cartalla added a commit that referenced this issue Jul 12, 2024
Add support for ParallelCluster 3.10.0.

Add alinux2023 support.

Add support for external slurmdbd instance.

Update documentation.

Change the UID of the slurm user to 401 to match what ParallelCluster uses.
Otherwise munge flags security errors because the UID of the submitter doesn't match the head node.

Change the UpdateHeadNode lambda to only do the update via SSM if the cluster isn't already being updated.

Resolves #242

Change the installer so that it checks to make sure that the cluster stack
isn't already being changed or in a bad state.

Resolves #221
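
The guard described in the commit message above (skip the update when the stack is already changing or in a bad state) might look roughly like this. It is a sketch, not the repository's implementation: the function name is hypothetical, and the set of CloudFormation statuses treated as safe is an assumption about which states the installer and lambda accept.

```python
# CloudFormation statuses from which an update can normally be started.
# Which statuses the installer actually accepts is an assumption here.
UPDATABLE_STATUSES = frozenset({
    "CREATE_COMPLETE",
    "UPDATE_COMPLETE",
    "UPDATE_ROLLBACK_COMPLETE",
})


def stack_allows_update(status: str) -> bool:
    """True only for stable states; *_IN_PROGRESS and *_FAILED are rejected."""
    return status in UPDATABLE_STATUSES


for status in ("UPDATE_COMPLETE", "UPDATE_IN_PROGRESS", "ROLLBACK_COMPLETE"):
    action = "proceed" if stack_allows_update(status) else "skip"
    print(f"{status}: {action}")
```

In practice the status string would come from a stack description call. Using an allow-list rather than a deny-list means any unrecognized status is skipped by default.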
cartalla added a commit that referenced this issue Jul 12, 2024
Add support for ParallelCluster 3.10.0.

Add alinux2023 support.

Add support for external slurmdbd instance.

Update documentation.

Change the UID of the slurm user to 401 to match what ParallelCluster uses.
Otherwise munge flags security errors because the UID of the submitter doesn't match the head node.

Change the UpdateHeadNode lambda to only do the update via SSM if the cluster isn't already being updated.

Resolves #242

Change the installer so that it checks to make sure that the cluster stack
isn't already being changed or in a bad state.

Resolves #221

Add support for ParallelCluster 3.10.1.

Resolves #243
cartalla linked a pull request Jul 12, 2024 that will close this issue