Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Current EKS instructions break kubelet memory metrics (otel, etc.) #618

Open
dfsdevops opened this issue Oct 24, 2024 · 3 comments
Open
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@dfsdevops
Copy link

Spegel version

v0.0.25

Kubernetes distribution

EKS

Kubernetes version

v1.29.8-eks-a737599

CNI

Amazon VPC CNI

Describe the bug

I have been using Karpenter (as well as terraform,) to update the user data field when launching new nodes with the following spec:

---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
  - alias: al2@v20240910
  userData: |
    set -ex
    mkdir -p /etc/containerd/config.d
    cat > /etc/containerd/config.d/spegel.toml << EOL
    [plugins."io.containerd.grpc.v1.cri".registry]
      config_path = "/etc/containerd/certs.d"
    [plugins."io.containerd.grpc.v1.cri".containerd]
      discard_unpacked_layers = false
    EOL
    /etc/eks/bootstrap.sh

I'm basing this off these instructions https://github.com/spegel-org/spegel/blob/main/docs/COMPATIBILITY.md#eks

there seems to be an issue in that all of our Otel metrics for memory usage from the kubeletstats receiver are coming back as 0 rather than the number of bytes. I suspect that the patch is clobbering a containerd config dictionary that has an effect on kubelet metrics, as per this comment, a potential reason: awslabs/amazon-eks-ami#1628 (comment)

might be related to the default_runtime_name key thats being used the table here:

[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true

My suggestion is that the documentation switch to using sed to override the template, as this seems to avoid the issue. I confirmed that after switching to this in my userData: field in Karpenter, I begin to see memory metrics in Otel again.

sed -i 's/discard_unpacked_layers = true/discard_unpacked_layers = false/g'  /etc/eks/containerd/containerd-config.toml
@dfsdevops dfsdevops added the bug Something isn't working label Oct 24, 2024
@phillebaba
Copy link
Member

This sounds like a problem in how Containerd merges configuration. Alternatively I have missed documentation about how overrides work. Using sed is not a recommended method as the EKS AMI can change their Containerd configuration at any time. We worked with them to get the imports working for this reason.

First step is to confirm that Containerd is omitting the original configuration when imports are used. After that we can move forward with figuring out how to solve this upstream if that is the case.

@phillebaba phillebaba added the help wanted Extra attention is needed label Oct 29, 2024
@dfsdevops
Copy link
Author

I certainly empathize with the fact that this was a carefully considered solution and trying to avoid issues with upstream changes. I agree it would be ideal to ensure containerd properly merges the TOML rather than hack around it with sed. It looks like they have an open issue for this containerd/containerd#5837

Maybe I can use the crictl command to do a before/after to see if the actual running config difference is identifiable. I didn't know about that originally.

@phillebaba
Copy link
Member

Thanks for linking the issue. I understand the problem a lot better now. So the problem is that the merging is not done at a deeper level than per section. Which explains why some of the original configuration is disappearing.

Could one option be to include the original configuration in the import file? This way we include all configuration again making sure that nothing disappears. Hopefully this is something that we can fix in Containerd v2.

We could update the docs if this temporary solution works for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants