[Feature] Add more supported instances (P5) to EFA Device Plugin DaemonSet - refer to eks-charts #8089

Open
ytsssun opened this issue Dec 13, 2024 · 1 comment
Labels
kind/feature New feature or request

Comments


ytsssun commented Dec 13, 2024

What feature/behavior/change do you want?

eksctl installs the EFA device plugin when efaEnabled is set to true. It deploys the DaemonSet from pkg/addons/assets/efa-device-plugin.yaml, which is maintained in this repo. However, that manifest is quite outdated:

  • It lacks newly supported instance types such as the p5 instances.
  • It still points to a fairly old image tag, /eks/aws-efa-k8s-device-plugin:v0.3.3. The official EFA device plugin published in eks-charts already points to image tag v0.5.4.

Instead of maintaining the YAML file by hand, can we just generate it from the latest eks-charts chart? One can do this:

git clone https://github.com/aws/eks-charts.git
cd eks-charts/stable/aws-efa-k8s-device-plugin/
helm template . > efa-device-plugin.yaml

The generated efa-device-plugin.yaml looks like this:

---
# Source: aws-efa-k8s-device-plugin/templates/daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: release-name-aws-efa-k8s-device-plugin
  labels:
    helm.sh/chart: aws-efa-k8s-device-plugin-v0.5.7
    app.kubernetes.io/name: aws-efa-k8s-device-plugin
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/version: "v0.5.4"
    app.kubernetes.io/managed-by: Helm
spec:
  selector:
    matchLabels:
      name:  release-name-aws-efa-k8s-device-plugin
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: release-name-aws-efa-k8s-device-plugin
    spec:
      tolerations:
        - key: CriticalAddonsOnly
          operator: Exists
      # Mark this pod as a critical add-on; when enabled, the critical add-on
      # scheduler reserves resources for critical add-on pods so that they can
      # be rescheduled after a failure.
      # See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
      priorityClassName: "system-node-critical"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: node.kubernetes.io/instance-type
                  operator: In
                  values:
                    - m5dn.24xlarge
                    - m5dn.metal
                    - m5n.24xlarge
                    - m5n.metal
                    - m5zn.12xlarge
                    - m5zn.metal
                    - m6a.48xlarge
                    - m6a.metal
                    - m6i.32xlarge
                    - m6i.metal
                    - m6id.32xlarge
                    - m6id.metal
                    - m6idn.32xlarge
                    - m6idn.metal
                    - m6in.32xlarge
                    - m6in.metal
                    - m7a.48xlarge
                    - m7a.metal-48xl
                    - m7g.16xlarge
                    - m7g.metal
                    - m7gd.16xlarge
                    - m7i.48xlarge
                    - m7i.metal-48xl
                    - c5n.9xlarge
                    - c5n.18xlarge
                    - c5n.metal
                    - c6a.48xlarge
                    - c6a.metal
                    - c6gn.16xlarge
                    - c6i.32xlarge
                    - c6i.metal
                    - c6id.32xlarge
                    - c6id.metal
                    - c6in.32xlarge
                    - c6in.metal
                    - c7a.48xlarge
                    - c7a.metal-48xl
                    - c7g.16xlarge
                    - c7g.metal
                    - c7gd.16xlarge
                    - c7gn.16xlarge
                    - c7i.48xlarge
                    - c7i.metal-48xl
                    - r5dn.24xlarge
                    - r5dn.metal
                    - r5n.24xlarge
                    - r5n.metal
                    - r6a.48xlarge
                    - r6a.metal
                    - r6i.32xlarge
                    - r6i.metal
                    - r6idn.32xlarge
                    - r6idn.metal
                    - r6in.32xlarge
                    - r6in.metal
                    - r6id.32xlarge
                    - r6id.metal
                    - r7a.48xlarge
                    - r7a.metal-48xl
                    - r7g.16xlarge
                    - r7g.metal
                    - r7gd.16xlarge
                    - r7i.48xlarge
                    - r7i.metal-48xl
                    - r7iz.32xlarge
                    - r7iz.metal-32xl
                    - x2idn.32xlarge
                    - x2idn.metal
                    - x2iedn.32xlarge
                    - x2iedn.metal
                    - x2iezn.12xlarge
                    - x2iezn.metal
                    - i3en.12xlarge
                    - i3en.24xlarge
                    - i3en.metal
                    - i4g.16xlarge
                    - i4i.32xlarge
                    - i4i.metal
                    - im4gn.16xlarge
                    - dl1.24xlarge
                    - dl2q.24xlarge
                    - g4dn.8xlarge
                    - g4dn.12xlarge
                    - g4dn.16xlarge
                    - g4dn.metal
                    - g5.8xlarge
                    - g5.12xlarge
                    - g5.16xlarge
                    - g5.24xlarge
                    - g5.48xlarge
                    - g6.8xlarge
                    - g6.12xlarge
                    - g6.16xlarge
                    - g6.24xlarge
                    - g6.48xlarge
                    - g6e.8xlarge
                    - g6e.12xlarge
                    - g6e.16xlarge
                    - g6e.24xlarge
                    - g6e.48xlarge
                    - gr6.8xlarge
                    - inf1.24xlarge
                    - p3dn.24xlarge
                    - p4d.24xlarge
                    - p4de.24xlarge
                    - p5.48xlarge
                    - p5e.48xlarge
                    - p5en.48xlarge
                    - trn1.32xlarge
                    - trn1n.32xlarge
                    - trn2.48xlarge
                    - vt1.24xlarge
                    - hpc6a.48xlarge
                    - hpc6id.32xlarge
                    - hpc7a.12xlarge
                    - hpc7a.24xlarge
                    - hpc7a.48xlarge
                    - hpc7a.96xlarge
                    - hpc7g.4xlarge
                    - hpc7g.8xlarge
                    - hpc7g.16xlarge
                - key: eks.amazonaws.com/compute-type
                  operator: NotIn
                  values:
                  - auto
      hostNetwork: true
      containers:
        - image: 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/aws-efa-k8s-device-plugin:v0.5.4
          name: aws-efa-k8s-device-plugin
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
              - ALL
            runAsNonRoot: false
          resources:
            requests:
              cpu: 10m
              memory: 20Mi
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
            - name: infiniband-volume
              mountPath: /dev/infiniband/
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
        - name: infiniband-volume
          hostPath:
            path: /dev/infiniband/
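
The resource names in the rendered output start with release-name because helm template falls back to that placeholder when no release name is given. A minimal sketch of a render step that passes an explicit name could look like the following. The function name render_efa_manifest and the default release name are assumptions made for illustration, not something eksctl or eks-charts prescribes, and the final resource names still depend on the chart's own naming helper.

```shell
# render_efa_manifest CHART_DIR OUT_FILE [RELEASE_NAME]
# Hypothetical helper: renders the chart at CHART_DIR into OUT_FILE.
render_efa_manifest() {
  chart_dir=$1
  out_file=$2
  # Assumed default; pick whatever name the deployed DaemonSet should carry.
  release=${3:-aws-efa-k8s-device-plugin}
  # `helm template NAME CHART` renders locally without contacting a cluster.
  helm template "$release" "$chart_dir" > "$out_file"
}
```

With the clone from the commands above, this might be called as render_efa_manifest eks-charts/stable/aws-efa-k8s-device-plugin efa-device-plugin.yaml.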

Why do you want this feature?

The device plugin installed by eksctl is outdated and cannot be deployed to certain instance types that already support EFA (e.g. p5.48xlarge, trn2.48xlarge).
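
To see which instance types the bundled manifest is missing, the two instance-type lists can be compared directly. This is a rough sketch rather than a tested tool: the function names are hypothetical, and it assumes each instance type sits on its own YAML list line in family.size form, as in the rendered output above.

```shell
# list_instance_types FILE
# Prints every YAML list item of the form "- family.size" found in FILE.
list_instance_types() {
  sed -n 's/^[[:space:]]*-[[:space:]]*\([a-z][a-z0-9-]*\.[a-z0-9][a-z0-9-]*\)[[:space:]]*$/\1/p' "$1"
}

# missing_instance_types OLD_FILE NEW_FILE
# Prints the instance types listed in NEW_FILE but absent from OLD_FILE.
missing_instance_types() {
  list_instance_types "$2" | while IFS= read -r itype; do
    # Fixed-string, whole-line match so the dot is not treated as a wildcard.
    list_instance_types "$1" | grep -Fxq -- "$itype" || printf '%s\n' "$itype"
  done
}
```

For example, missing_instance_types pkg/addons/assets/efa-device-plugin.yaml efa-device-plugin.yaml would print one missing instance type per line, with the freshly rendered file as the second argument.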

There have been past PRs/issues asking to update the YAML with more supported instances:

I do see that there were past changes that got reverted - 2f12605. I am fine with just updating the p5 series instances to begin with.

@ytsssun ytsssun added the kind/feature New feature or request label Dec 13, 2024

Hello ytsssun 👋 Thank you for opening an issue in eksctl project. The team will review the issue and aim to respond within 1-5 business days. Meanwhile, please read about the Contribution and Code of Conduct guidelines here. You can find out more information about eksctl on our website

@ytsssun ytsssun changed the title [Feature] Add more supported instances to EFA Device Plugin DaemonSet - refer to eks-charts [Feature] Add more supported instances (P5) to EFA Device Plugin DaemonSet - refer to eks-charts Dec 13, 2024