203 bug slurm zfsyml doesnt work #214

Merged
merged 1 commit into from
Mar 22, 2024

cartalla
Contributor

@cartalla cartalla commented Mar 7, 2024

Update config files and fix errors found in testing new configs

Add --RESEnvironmentName to the installer

Ease initial integration with Research and Engineering Studio (RES).

Automatically add the correct submitter security groups and configure
the /home directory.

Automatically choose the subnets if not specified based on RES subnets.

Resolves #207

============================

Update template config files

Added more comments to clarify that these are examples that should be copied
and customized by users.

Added comments for typical configuration options.

Deleted obsolete configs that were from v1.

Resolves #203

=============================

Set default head node instance type based on architecture.

Resolves #206
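
A minimal sketch of picking a default by architecture. The instance types and the function name here are hypothetical placeholders, not the values chosen in this PR:

```python
# Hypothetical defaults for illustration only; the instance types that the
# installer actually uses are not stated in this description.
DEFAULT_HEAD_NODE_INSTANCE_TYPES = {
    "x86_64": "c6a.large",
    "arm64": "c6g.large",
}


def default_head_node_instance_type(architecture: str) -> str:
    """Return the default head node instance type for a CPU architecture."""
    try:
        return DEFAULT_HEAD_NODE_INSTANCE_TYPES[architecture]
    except KeyError:
        # Fail loudly instead of silently falling back to an x86_64 type.
        raise ValueError(f"Unsupported architecture: {architecture!r}") from None
```

Raising on an unknown architecture avoids launching an Arm cluster with an x86_64 head node type.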

==============================

Clean up ansible-lint errors and warnings.
The Arm architecture cluster was failing because of an incorrect condition in the Ansible playbook, which lint flags.

==============================

Use vdi controller instead of cluster manager for users and groups info

Cluster manager stopped being domain joined for some reason.

==============================

Paginate describe_instances when creating the head node A record.

Otherwise, the cluster head node instance may not be found.
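
A single describe_instances call returns only one page of results, so the head node can be missed in an account with many instances. A rough sketch of the paginated lookup using the boto3 paginator follows; the helper name is an assumption, not the code in this PR:

```python
def find_instances(ec2_client, filters):
    """Return every instance matching the filters, following all result pages.

    ec2_client is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    paginator = ec2_client.get_paginator("describe_instances")
    instances = []
    for page in paginator.paginate(Filters=filters):
        # Each page holds reservations, and each reservation holds instances.
        for reservation in page["Reservations"]:
            instances.extend(reservation["Instances"])
    return instances
```

It could be called with a filter such as `[{"Name": "tag:parallelcluster:node-type", "Values": ["HeadNode"]}]`, where the tag name is an assumption about how the head node is tagged.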

==============================

Add default MungeKeySecret.

This should be the default or you can't access multiple clusters from the same server.

==============================

Increase timeout for the SSM command that configures submitters

The extra time is needed to compile Slurm.

==============================

Force Slurm to be rebuilt for submitters of all OS distributions, even if they match the OS of the cluster.

Otherwise, errors occur because PluginDir can't be found in the same location as when Slurm was compiled.

==============================

Paginate describe_instances in UpdateHeadNode lambda

==============================

Add check for min memory of 4 GB for slurm controller
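
A sketch of what such a check might look like. The function name and the memory lookup table passed in are hypothetical; the real check presumably reads memory from the instance type info:

```python
MIN_CONTROLLER_MEMORY_GB = 4  # minimum stated in this PR


def check_controller_memory(instance_type, memory_gb_by_type):
    """Raise ValueError if the controller instance type has under 4 GB of memory.

    memory_gb_by_type is a hypothetical mapping of instance type to memory in GB.
    """
    memory_gb = memory_gb_by_type[instance_type]
    if memory_gb < MIN_CONTROLLER_MEMORY_GB:
        raise ValueError(
            f"{instance_type} has {memory_gb} GB of memory; the slurm controller "
            f"requires at least {MIN_CONTROLLER_MEMORY_GB} GB."
        )
    return memory_gb
```

Failing at config validation time is cheaper than finding out after the cluster has been deployed.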

==============================

Sync EC2InstanceTypeInfo.py with hpc-cost-simulator.

==============================

Update documentation.

Remove Regions from InstanceConfig. This was left over from legacy cluster and
ParallelCluster doesn't support multiple regions.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@cartalla cartalla linked an issue Mar 7, 2024 that may be closed by this pull request
@cartalla cartalla force-pushed the 203-bug-slurm_zfsyml-doesnt-work branch 4 times, most recently from 8822279 to 79fbd90 Compare March 10, 2024 00:07
@cartalla cartalla force-pushed the 203-bug-slurm_zfsyml-doesnt-work branch from 79fbd90 to b330437 Compare March 20, 2024 20:43
@cartalla cartalla force-pushed the 203-bug-slurm_zfsyml-doesnt-work branch from b330437 to a39d12f Compare March 22, 2024 23:25
@cartalla cartalla marked this pull request as ready for review March 22, 2024 23:27
@cartalla cartalla merged commit 58f70e7 into main Mar 22, 2024
@cartalla cartalla deleted the 203-bug-slurm_zfsyml-doesnt-work branch March 22, 2024 23:31