Clarification on users and nfs mounts #144
-
I'm trying to set up a cluster with Magic Castle. I was able to deploy the cluster on AWS using this configuration:
Terraform configuration:
terraform {
  required_version = ">= 0.14.5"
}
module "aws" {
  source = "git::https://github.com/ComputeCanada/magic_castle.git//aws"
  config_git_url = "https://github.com/ComputeCanada/puppet-magic_castle.git"
  config_version = "10.0"
  cluster_name = "cloudhpc"
  domain = "projectpythia.org"
  image = "ami-0155c31ea13d4abd2" # CentOS 8 - https://wiki.centos.org/Cloud/AWS
  instances = {
    mgmt = { type = "t3.micro", count = 1 },
    login = { type = "t3.micro", count = 1 },
    node = [{ type = "t3.micro", count = 2 }]
  }
  storage = {
    type = "nfs"
    home_size = 100
    project_size = 50
    scratch_size = 50
  }
  public_keys = [file("~/.ssh/id_rsa.pub")]
  nb_users = 2
  # Shared password, randomly chosen if blank
  guest_passwd = ""
  # AWS specifics
  region = "us-west-2"
}
output "sudoer_username" {
  value = module.aws.sudoer_username
}
output "guest_usernames" {
  value = module.aws.guest_usernames
}
output "guest_passwd" {
  value = module.aws.guest_passwd
}
output "public_ip" {
  value = module.aws.ip
}
However, a few things seem off.
List of users:
[centos@node2 ~]$ getent passwd | cut -d: -f1
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
operator
games
ftp
nobody
dbus
systemd-coredump
systemd-resolve
tss
polkitd
unbound
rpc
sssd
setroubleshoot
rpcuser
cockpit-ws
cockpit-wsinstance
chrony
sshd
rngd
centos
I am also not seeing the NFS mounts on the management node.
List of mounted filesystems on the management node:
[centos@mgmt1 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 439M 0 439M 0% /dev
tmpfs 471M 0 471M 0% /dev/shm
tmpfs 471M 6.3M 465M 2% /run
tmpfs 471M 0 471M 0% /sys/fs/cgroup
/dev/nvme0n1p1 10G 2.7G 7.4G 27% /
tmpfs 95M 0 95M 0% /run/user/1000
Am I missing something? I am new to Magic Castle, so please let me know if there's more documentation than what is in https://github.com/ComputeCanada/magic_castle/tree/master/docs. Thanks!
Replies: 3 comments
-
It takes about 20 minutes for the full configuration to complete, and user account creation is the last step. Once things are done, you should see directories on the login and compute nodes under
Again, this takes time to configure, so make sure to wait a bit before checking. The NFS directories are exported from the management node; you can find them under
I also haven't been using Magic Castle for long, but the docs (and all the help from @cmd-ntrf) have made it pretty painless. Most of the provisioning depends on the management node; to get there, log in to the
or, specifically for the puppet stuff,
(@cmd-ntrf |
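Since account creation is described above as the last configuration step, one way to tell whether it has finished is to look up a guest account by name instead of scanning the full `getent passwd` listing. Looking up a name directly also works when a directory service does not enumerate accounts in full listings, which may be the case here. The sketch below is only an illustration: `account_exists` is a hypothetical helper, and `user01` is an assumed guest name, so substitute one reported by `terraform output guest_usernames`.

```shell
#!/usr/bin/env bash
# Sketch: check whether one named account is resolvable on this node.
# account_exists is a hypothetical helper, not part of Magic Castle.

# Succeeds (exit status 0) when the name service can resolve the account.
account_exists() {
  getent passwd "$1" > /dev/null 2>&1
}

# Assumed guest name; replace it with a name from `terraform output guest_usernames`.
GUEST_USER="${1:-user01}"

if account_exists "$GUEST_USER"; then
  echo "$GUEST_USER is available on $(uname -n)"
else
  echo "$GUEST_USER not found on $(uname -n); configuration may still be running"
fi
```

Run it on a login or compute node and repeat after a few minutes if the account is not found yet.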
-
Hi! To complete @ocaisa's thorough answer: the t3.micro flavor specs are insufficient for mgmt1's requirements. It requires at least 6 GiB of RAM. Insufficient RAM for the management node typically leads to failure of the cluster configuration, regardless of how long you wait. I recommend you try again with the flavors that can be found in the AWS example.
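As a rough way to confirm the memory point, the following sketch compares the RAM reported by a node with the 6 GiB minimum cited in this thread for mgmt1. The function names are hypothetical, and the check assumes a Linux node where `/proc/meminfo` is readable. Note that `MemTotal` is usually slightly below the nominal instance size, because the kernel reserves some memory.

```shell
#!/usr/bin/env bash
# Sketch: compare a node's total RAM with a required minimum.
# Function names are hypothetical; the 6 GiB figure is taken from the thread.

# Succeeds when $1 (total RAM in KiB) is at least $2 (requirement in GiB).
meets_ram_requirement() {
  [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

# Print MemTotal in KiB from /proc/meminfo, or 0 if it cannot be read.
read_total_kib() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null || echo 0
}

REQUIRED_GIB=6
total_kib=$(read_total_kib)
total_kib=${total_kib:-0}

if meets_ram_requirement "$total_kib" "$REQUIRED_GIB"; then
  echo "OK: ${total_kib} KiB of RAM meets the ${REQUIRED_GIB} GiB minimum"
else
  echo "Too small: ${total_kib} KiB of RAM is below the ${REQUIRED_GIB} GiB minimum"
fi
```

On the mgmt1 node from the question, where `df -h` shows tmpfs sizes of 471M, this check would be expected to report that the node is too small.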
-
@cmd-ntrf, @ocaisa, thanks to your recommendations, and a bit of patience 😀, I was able to get a cluster up and running (cc @kmpaul)! Also, I wanted to say thank you 🙏🏽 for this amazing project! As someone who has been requesting/fighting for a development cluster from our sys admins (@NCAR) for a really long time, I'm thrilled to see that Magic Castle comes with all the needed bells and whistles to deploy an HPC cluster. After tinkering with Magic Castle, I'm certain that it provides a world of endless possibilities and endless experimentation 🚀, and I look forward to abusing my sys administrator privileges 😅