
CSI Node daemonset #14

Merged
merged 3 commits into from
Dec 20, 2021

Conversation

Contributor

@jmolmo jmolmo commented Dec 9, 2021

I still need to go through the unit tests, but I think it is good to have a first version of the node daemonset to discuss changes and improvements.

I have a lot of doubts about using a ConfigMap to store the lvmd configuration. It can work for SNO deployments, but we are going to have problems with clusters of more than one node, because the ConfigMap cannot be different for each node when using a DaemonSet.
The only way I see to do that is to use an init container to read a ConfigMap (one per node) and generate a file to be used when lvmd starts. Thoughts?
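As an illustration only (not part of this PR), a minimal Go sketch of that init-container idea, assuming the per-node lvmd config ends up mounted at /etc/topolvm; the helper name, volume name and mount path are hypothetical:

package controllers

import (
	corev1 "k8s.io/api/core/v1"
)

// fileCheckerInitContainer (hypothetical helper) returns an init container that
// blocks the node pod until the per-node lvmd config file exists at the assumed
// mount path /etc/topolvm.
func fileCheckerInitContainer() corev1.Container {
	return corev1.Container{
		Name:  "file-checker",
		Image: "registry.access.redhat.com/ubi8/ubi-minimal",
		Command: []string{
			"sh", "-c",
			"until [ -f /etc/topolvm/lvmd.yaml ]; do echo waiting for lvmd config file; sleep 5; done",
		},
		VolumeMounts: []corev1.VolumeMount{
			// "lvmd-config-dir" is an assumed volume name; it would point at the
			// per-node ConfigMap or hostPath that carries lvmd.yaml.
			{Name: "lvmd-config-dir", MountPath: "/etc/topolvm"},
		},
	}
}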

Update management:
I can only see two changes to manage:

  1. Change of nodes: right now we do not have the node selector in the CRD. We will need to address this part when we introduce the node selector.
  2. Change of lvmd configuration: a new SC or new VGs on the node. Probably to be addressed once we have a first version working for SNO with the happy path.

Signed-off-by: Juan Miguel Olmo Martínez [email protected]

Contributor

@nbalacha nbalacha left a comment


This also requires the SCC, unit tests, and the rbacs for the node daemonset reconciliation.

Review threads (resolved) on: controllers/lvmcluster_controller.go, controllers/topolvm_node.go, controllers/defaults.go
@jmolmo
Contributor Author

jmolmo commented Dec 13, 2021

This also requires the SCC...

Which SCCs?

@jmolmo jmolmo requested a review from nbalacha December 13, 2021 12:03
@nbalacha
Contributor

This also requires the SCC...

Which SCCs?

The node pods are privileged and hence require SCCs in order to access the host resources. We will need to create them.

// Local values referenced when the daemonset's hostPath and emptyDir volumes are
// built below; the hostPath volume type fields take pointers, so addressable
// variables are needed.
hostPathDirectory := corev1.HostPathDirectory
hostPathDirectoryOrCreateType := corev1.HostPathDirectoryOrCreate
storageMedium := corev1.StorageMediumMemory

Contributor

Looks like a lot of static code when generating the daemonset spec. I have a suggestion: use yaml files and read them directly using go embed. This will reduce the lines of code and also make it easier to understand the spec. An example of this approach can be seen here.
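As a rough, hedged sketch of that go embed approach (the embedded file name topolvm_node_daemonset.yaml and the helper are hypothetical, and per-cluster values such as namespace or images would still be patched onto the decoded object):

package controllers

import (
	_ "embed"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/yaml"
)

//go:embed topolvm_node_daemonset.yaml
var nodeDaemonSetYAML []byte

// getNodeDaemonSetFromYAML decodes the embedded manifest instead of building
// the DaemonSet spec field by field in Go.
func getNodeDaemonSetFromYAML() (*appsv1.DaemonSet, error) {
	ds := &appsv1.DaemonSet{}
	if err := yaml.Unmarshal(nodeDaemonSetYAML, ds); err != nil {
		return nil, err
	}
	return ds, nil
}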

Contributor

I would prefer to keep this in the code for now. We can revisit this later.

Comment on lines 20 to 23
subjects:
- kind: ServiceAccount
name: topolvm-node
roleRef:
Contributor

De-tab this; currently I think they are indented so that they end up as part of metadata.

@jmolmo jmolmo force-pushed the topolvm_node branch 4 times, most recently from 23e8c5b to 85d9d73 on December 16, 2021 19:37
@jmolmo jmolmo changed the title from "WIP: CSI Node daemonset" to "CSI Node daemonset" on Dec 16, 2021
@jmolmo
Contributor Author

jmolmo commented Dec 16, 2021

Unit tests are passing, and I have also tested in an OpenShift cluster. The daemonset is now created and deleted when an LVMCluster CRD is created/deleted, but I have an error in the privilege assignment... I think I am missing something in the SCC...

Warning FailedCreate 82s daemonset-controller Error creating: pods "topolvm-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted: .spec.securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.initContainers[0].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.containers[0].securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000640000, 1000649999], spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.containers[1].securityContext.runAsUser: Invalid value: 0: must be in the ranges: [1000640000, 1000649999], spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[1].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.containers[2].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, spec.containers[3].securityContext.hostPID: Invalid value: true: Host PID is not allowed to be used, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "node-exporter": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount]
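The violations listed there (hostPID, hostPath volumes, runAsUser 0, privileged containers) are exactly what a custom SCC would need to allow. A minimal sketch using the openshift/api security/v1 types; the SCC name and the service-account user entry are illustrative assumptions, not the final manifest:

package controllers

import (
	securityv1 "github.com/openshift/api/security/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// topolvmNodeSCC (hypothetical helper) sketches an SCC allowing what the node
// pods request: privileged containers, hostPID, hostPath volumes and runAsUser 0.
func topolvmNodeSCC() *securityv1.SecurityContextConstraints {
	scc := &securityv1.SecurityContextConstraints{
		ObjectMeta:               metav1.ObjectMeta{Name: "topolvm-node"},
		AllowPrivilegedContainer: true,
		AllowHostPID:             true,
		AllowHostDirVolumePlugin: true,
		Volumes: []securityv1.FSType{
			securityv1.FSTypeHostPath,
			securityv1.FSTypeEmptyDir,
		},
		// Assumed binding: the topolvm-node service account in the operator namespace.
		Users: []string{"system:serviceaccount:lvm-operator-system:topolvm-node"},
	}
	// RunAsAny strategies let the containers keep runAsUser: 0.
	scc.RunAsUser.Type = securityv1.RunAsUserStrategyRunAsAny
	scc.SELinuxContext.Type = securityv1.SELinuxStrategyRunAsAny
	scc.FSGroup.Type = securityv1.FSGroupStrategyRunAsAny
	scc.SupplementalGroups.Type = securityv1.SupplementalGroupsStrategyRunAsAny
	return scc
}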

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 17, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 17, 2021
@jmolmo
Contributor Author

jmolmo commented Dec 17, 2021

Almost there ....

Now I am using the OpenShift "privileged" SCC for "topolvm-manager", because it seems that the custom SCC I created ("topolvm_node_scc.yaml") does not grant all the rights needed. I "extracted" the SCC from the PSPs for the node plugin in the topolvm project, and after several tests I added more rights, but it kept "blocking" the deployment of the node daemonset because of missing rights.

With an image generated from the code in this branch (using the "privileged" SCC), I deployed in an OpenShift cluster with excellent results. The node daemonset is created, and the pods are deployed on all the nodes, waiting for the lvmd config file. After manually creating a VG on one of the nodes and creating the config file "lvmd.yaml", all the containers in the pod start perfectly.
The nodes where the config file is not present remain waiting for the config file until it is created.

$ oc get po -n lvm-operator-system
NAME                                 READY   STATUS             RESTARTS      AGE
controller-manager-c9c6d7f57-kxj7x   2/2     Running            0             3m13s
topolvm-node-5bcvf                   0/4     Init:0/1           0             2m56s
topolvm-node-8wzcf                   0/4     Init:0/1           0             2m56s
topolvm-node-jrxb7                   3/4     CrashLoopBackOff   4 (49s ago)   2m56s
topolvm-node-nwq5m                   0/4     Init:0/1           0             2m56s
topolvm-node-pg296                   0/4     Init:0/1           0             2m56s
topolvm-node-rzvd8                   0/4     Init:0/1           0             2m56s


$ oc get po -n lvm-operator-system
NAME                                  READY   STATUS     RESTARTS   AGE
controller-manager-c9c6d7f57-t5pnp    2/2     Running    0          2m45s
topolvm-controller-58c8f86445-wt2tq   4/4     Running    0          111s
topolvm-node-bzmc2                    4/4     Running    0          111s
topolvm-node-cwvzq                    0/4     Init:0/1   0          111s
topolvm-node-dhk9p                    0/4     Init:0/1   0          111s
topolvm-node-hp79l                    0/4     Init:0/1   0          111s
topolvm-node-qc5bp                    0/4     Init:0/1   0          111s
topolvm-node-qxvsz                    0/4     Init:0/1   0          111s

$ oc describe po topolvm-node-bzmc2 -n lvm-operator-system <-------------- node with VG and lvmd.yaml file
Normal  Scheduled       61s   default-scheduler  Successfully assigned lvm-operator-system/topolvm-node-bzmc2 to ci-ln-3h9s5k2-72292-xd6rf-worker-b-m59z9
Normal  AddedInterface  60s   multus             Add eth0 [10.129.2.34/23] from openshift-sdn
Normal  Pulling         60s   kubelet            Pulling image "registry.access.redhat.com/ubi8/ubi-minimal"
Normal  Pulled          60s   kubelet            Successfully pulled image "registry.access.redhat.com/ubi8/ubi-minimal" in 483.024573ms
Normal  Created         59s   kubelet            Created container file-checker
Normal  Started         59s   kubelet            Started container file-checker
Normal  Pulled          58s   kubelet            Container image "quay.io/topolvm/topolvm:0.10.3" already present on machine
Normal  Created         58s   kubelet            Created container lvmd
Normal  Started         58s   kubelet            Started container lvmd
Normal  Pulled          58s   kubelet            Container image "quay.io/topolvm/topolvm:0.10.3" already present on machine
Normal  Created         58s   kubelet            Created container topolvm-node
Normal  Started         58s   kubelet            Started container topolvm-node
Normal  Pulled          58s   kubelet            Container image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0" already present on machine
Normal  Created         58s   kubelet            Created container csi-registrar
Normal  Started         58s   kubelet            Started container csi-registrar
Normal  Pulled          58s   kubelet            Container image "k8s.gcr.io/sig-storage/livenessprobe:v2.5.0" already present on machine
Normal  Created         57s   kubelet            Created container liveness-probe
Normal  Started         57s   kubelet            Started container liveness-probe


$ oc describe po topolvm-node-cwvzq -n lvm-operator-system  <-------------- node without VG and without lvmd.yaml file
Normal  Scheduled       2m35s  default-scheduler  Successfully assigned lvm-operator-system/topolvm-node-cwvzq to ci-ln-3h9s5k2-72292-xd6rf-worker-a-zpnfl
Normal  AddedInterface  2m33s  multus             Add eth0 [10.131.0.89/23] from openshift-sdn
Normal  Pulling         2m33s  kubelet            Pulling image "registry.access.redhat.com/ubi8/ubi-minimal"
Normal  Pulled          2m32s  kubelet            Successfully pulled image "registry.access.redhat.com/ubi8/ubi-minimal" in 1.190537845s
Normal  Created         2m32s  kubelet            Created container file-checker
Normal  Started         2m32s  kubelet            Started container file-checker

$ oc logs topolvm-node-cwvzq file-checker -n lvm-operator-system
waiting for lvmd config file
waiting for lvmd config file

$ oc logs topolvm-node-cwvzq lvmd -n lvm-operator-system
Error from server (BadRequest): container "lvmd" in pod "topolvm-node-cwvzq" is waiting to start: PodInitializing

Contributor

@nbalacha nbalacha left a comment

Please rework the commits as well - there is one for a WIP and a rebase.

Review thread (outdated, resolved) on controllers/topolvm_node.go

func (n topolvmNode) ensureCreated(r *LVMClusterReconciler, ctx context.Context, lvmCluster *lvmv1alpha1.LVMCluster) error {
	nodeDaemonSet := getNodeDaemonSet(lvmCluster)
	result, err := cutil.CreateOrUpdate(ctx, r.Client, nodeDaemonSet, func() error { return nil })
Contributor

This will need a mutateFn.
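For reference, a mutate function with controller-runtime's CreateOrUpdate could look roughly like the sketch below. It is a sketch only, assuming cutil aliases controller-runtime's controllerutil package, that r.Scheme is the reconciler's runtime scheme, and that getNodeDaemonSet is the existing helper; the surrounding file's imports and repo types are reused, and the field copies shown are illustrative, not exhaustive.

func (n topolvmNode) ensureCreated(r *LVMClusterReconciler, ctx context.Context, lvmCluster *lvmv1alpha1.LVMCluster) error {
	desired := getNodeDaemonSet(lvmCluster)
	existing := &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: desired.Name, Namespace: desired.Namespace},
	}
	_, err := cutil.CreateOrUpdate(ctx, r.Client, existing, func() error {
		// Copy the fields the operator owns onto whatever CreateOrUpdate fetched
		// (or is about to create), so updates are reconciled as well as creates.
		existing.Labels = desired.Labels
		existing.Spec = desired.Spec
		// Owner reference so the daemonset is garbage-collected with the LVMCluster CR.
		return cutil.SetControllerReference(lvmCluster, existing, r.Scheme)
	})
	return err
}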

Contributor Author

True. If you don't mind, I will address this once we have merged the first version.

Contributor

ok

Review threads (resolved) on: controllers/topolvm_node.go, config/rbac/topolvm_node_role.yaml, config/rbac/sccs.yaml, config/rbac/topolvm_node_service_account.yaml
@openshift-ci
Contributor

openshift-ci bot commented Dec 20, 2021

@leelavg: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Contributor

@sp98 sp98 left a comment

nit

Review thread (resolved) on controllers/topolvm_node.go
Topolvm CSI node is a daemonset deploying lvmd, topolvm node CSI, csi-registrar
and a liveness probe container.
Uses an init container to hold deployment until the lvmd config file is available.

Signed-off-by: Juan Miguel Olmo Martínez <[email protected]>
Signed-off-by: Juan Miguel Olmo Martínez <[email protected]>
Verify node daemonset creation and deletion using the controller unit test

Signed-off-by: Juan Miguel Olmo Martínez <[email protected]>
@jmolmo jmolmo requested a review from nbalacha December 20, 2021 11:59
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Dec 20, 2021
Contributor

@nbalacha nbalacha left a comment

Approving on the understanding, as discussed, that the mutateFn and SCC will be implemented shortly.

@openshift-ci
Contributor

openshift-ci bot commented Dec 20, 2021

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jmolmo, nbalacha, sp98

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nbalacha nbalacha merged commit 24656b7 into openshift:main Dec 20, 2021