Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing Dependency on IMDS, allowing hostNetwork: true to be removed #681

Merged

Conversation

jonathanrainer
Copy link
Contributor

@jonathanrainer jonathanrainer commented Apr 24, 2022

Is this a bug fix or adding new feature?
This PR is a fix that allows the driver to keep functioning even in situations where IMDS is not available. It does this by falling back to the Kubernetes API to get the required information if it cannot get it from the EC2Metadata Service.

What is this PR about? / Why do we need it?
This PR is in response to #313, and will allow the driver to function in environments that do not have access to IMDS. The code was based heavily on kubernetes-sigs/aws-ebs-csi-driver#907, but refactored slightly to use a provider pattern for the metadata and factor out each part into their own file to allow more ease of testing.

What testing is done?
Have written additional unit tests to cover the change but need some help running the E2E tests as running them on my local setup has been fraught with problems. Any guidance on this (can we run them on the usual E2E infrastructure as a one off perhaps?) would be greatly appreciated.

fixes #313

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 24, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @jonathanrainer. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 24, 2022
@k8s-ci-robot k8s-ci-robot requested review from jqmichael and kbasv April 24, 2022 09:56
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 24, 2022
@jonathanrainer jonathanrainer changed the title Removing Dependency on IMDS, allowing hostNetwork: true to be remove Removing Dependency on IMDS, allowing hostNetwork: true to be removed Apr 24, 2022
@wongma7
Copy link
Contributor

wongma7 commented Apr 27, 2022

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 27, 2022
@jonathanrainer jonathanrainer force-pushed the fix-imds-dependency branch from 96c0b7a to d5db2a9 Compare May 6, 2022 19:52
@Ashley-wenyizha
Copy link
Contributor

e2e auto-run tests looking good

@Ashley-wenyizha
Copy link
Contributor

/lgtm

@k8s-ci-robot
Copy link
Contributor

@Ashley-wenyizha: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

pkg/cloud/cloud.go Outdated Show resolved Hide resolved
Makefile Show resolved Hide resolved
pkg/cloud/cloud.go Outdated Show resolved Hide resolved
@jonathanrainer
Copy link
Contributor Author

/retest

@jonathanrainer
Copy link
Contributor Author

Have tested this in my own EKS cluster. Did a standard install of the Helm Chart plus my changes and the controller booted fine and could serve both static and dynamically provisioned PVs. The IPs also were pod IPs rather than Node IPs as can be evidenced in the below by comparing the pods IPs to the Node Names.
Screenshot 2022-06-12 at 11 19 13

@jonathanrainer
Copy link
Contributor Author

/retest

1 similar comment
@jonathanrainer
Copy link
Contributor Author

/retest

Makefile Show resolved Hide resolved
node:
logLevel: 5
serviceAccount:
controller:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with latest helm chart, should be controller.serviceAccount not serviceAccount.controller similar to the EBS one.
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/hack/values_eksctl.yaml

Also the enalbeVolumeSnapshot line can be removed

@wongma7
Copy link
Contributor

wongma7 commented Jun 14, 2022

Once kubernetes/test-infra#26573 merges we can see if the CI passes /test pull-aws-efs-csi-driver-external-test-eks as well. I really want CI to create a cluster with eksctl --disable-pod-imds before we release this. (Because in EBS we had a big incident where we claimed to have fixed this IMDS dependency but actually nobody tested it on EKS and the CI only exercised kops....)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 15, 2022
@jonathanrainer
Copy link
Contributor Author

@wongma7 Had a look through this and I think the majority of the errors are because of service accounts that don't have permissions, I'll try and have a look into it but if you could too that would be great

pkg/cloud/k8s_metadata.go Outdated Show resolved Hide resolved
@wongma7
Copy link
Contributor

wongma7 commented Jun 15, 2022

@jonathanrainer need to add https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/templates/controller.yaml#L93-L96 to the template (and kustomize , by running make generate-kustomize)

            - name: CSI_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName

###
## Printing pod efs-csi-controller-695dc45655-tr8rc efs-plugin container logs
#
I0615 08:52:33.911293       1 config_dir.go:62] Mounted directories do not exist, creating directory at '/etc/amazon/efs'
I0615 08:52:33.912260       1 metadata.go:63] getting MetadataService...
I0615 08:52:37.089439       1 metadata.go:71] retrieving metadata from Kubernetes API
F0615 08:52:37.089492       1 driver.go:56] could not get metadata from AWS: CST_NODE_NAME env var not set 

this is good in sense that the CI test is working as intended...metadata is not available so it's looking for nodeName

@jonathanrainer
Copy link
Contributor Author

Thanks that's really useful to know I can pull the pod logs out of prow now, pretty sure I'll be much faster/self-sufficient in future!

@jonathanrainer
Copy link
Contributor Author

@wongma7 Yay it all passed, thanks a lot for your help

@jonathanrainer jonathanrainer requested a review from wongma7 June 15, 2022 19:09
@wongma7
Copy link
Contributor

wongma7 commented Jun 15, 2022

THX lgtm, could you squash the commits and i'll drop the "real" lgtm. Ideally one commit with vendor changes and one commit with the rest, but putting it all together in 1 commit is fine as well.

… back to k8s API

To do this we put the IMDS requirements behind a MetadataProvider object that can also call to k8s if required. This has also beefed up the E2E tests to cover EKS properly and added unit tests around the new changes.
@jonathanrainer
Copy link
Contributor Author

@wongma7 All squashed and ready to go

@wongma7
Copy link
Contributor

wongma7 commented Jun 15, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 15, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jonathanrainer, wongma7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

efs-csi-controller won't start if IMDS access is blocked
4 participants