
Tainted nodes are bidding #253

Open
88plug opened this issue Sep 18, 2024 · 1 comment
Labels
P1 repo/provider Akash provider-services repo issues

Comments

@88plug

88plug commented Sep 18, 2024

Describe the bug
Some providers have nodes in their cluster that are tainted, and the Akash provider still counts those nodes' capacity when bidding, even when every schedulable node in the cluster is at capacity.

[Warning] [FailedScheduling] [Pod] 0/6 nodes are available: 1 Insufficient memory, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 Insufficient cpu. preemption: 0/6 nodes are available: 1 Preemption is not helpful for scheduling, 5 No preemption victims found for incoming pod..
[FailedScheduling] [Pod] 0/4 nodes are available: 1 node(s) had untolerated taint {CriticalAddonsOnly: true}, 3 Insufficient cpu. preemption: 0/4 nodes are available: 1 Preemption is not helpful for scheduling, 3 No preemption victims found for incoming pod.

I have found at least two styles of taint that are not being respected:

  1. {node-role.kubernetes.io/control-plane: }
  2. {CriticalAddonsOnly: true}
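The scheduler rejects both nodes because a pod may only land on a tainted node when it carries a matching toleration. A minimal sketch of that matching rule follows. It uses simplified, hypothetical `Taint`/`Toleration` structs that mirror the Kubernetes core/v1 fields instead of importing `k8s.io/api`, and it assumes both taints carry effect `NoSchedule`, since the event messages do not show the effect. It illustrates the scheduler's behaviour; it is not code from the provider.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the Kubernetes core/v1 Taint and
// Toleration types (real code would import k8s.io/api/core/v1).
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// toleratesTaint follows the upstream matching rule: an empty toleration
// effect matches every effect, an empty key matches every key, "Exists"
// ignores the value, and "Equal" (or an empty operator) compares it.
func toleratesTaint(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == taint.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// schedulable reports whether a pod with the given tolerations may be placed
// on a node, considering only the hard NoSchedule and NoExecute taints.
func schedulable(tols []Toleration, taints []Taint) bool {
	for _, taint := range taints {
		if taint.Effect != "NoSchedule" && taint.Effect != "NoExecute" {
			continue // PreferNoSchedule is only a soft preference
		}
		tolerated := false
		for _, tol := range tols {
			if toleratesTaint(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	// The two taints from the FailedScheduling events (effect assumed).
	controlPlane := Taint{Key: "node-role.kubernetes.io/control-plane", Effect: "NoSchedule"}
	criticalAddons := Taint{Key: "CriticalAddonsOnly", Value: "true", Effect: "NoSchedule"}

	// A tenant pod without tolerations fits on neither node.
	fmt.Println(schedulable(nil, []Taint{controlPlane}))   // false
	fmt.Println(schedulable(nil, []Taint{criticalAddons})) // false
}
```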

This bug causes the deployment never to deploy, and within 5 minutes it is closed automatically by the provider.

To Reproduce
Deploy workloads until every untainted node on a provider is full; the provider will still bid on the next deployment, based on the free capacity of the tainted node.

Observed in the wild on validatornode.com and various other providers.

Expected behavior
Providers should not bid on workloads that could only be placed on nodes tainted for no-deploy/NoSchedule.
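One way to meet this expectation is for the bid check to skip nodes that tenant pods cannot be scheduled on. The sketch below is hypothetical and not taken from provider-services: it assumes a simplified `Node` record with free CPU (millicores), free memory (bytes) and taints, assumes tenant pods carry no tolerations (so any `NoSchedule` or `NoExecute` taint excludes a node), and bids only if a single eligible node can fit one replica, matching the per-node checks in the FailedScheduling events.

```go
package main

import "fmt"

// Hypothetical, simplified inventory types for illustration only.
type Taint struct {
	Key, Value, Effect string
}

type Node struct {
	Name       string
	FreeCPU    int64 // millicores still unallocated
	FreeMemory int64 // bytes still unallocated
	Taints     []Taint
}

type Request struct {
	CPU    int64 // millicores per replica
	Memory int64 // bytes per replica
}

// eligible reports whether tenant pods (assumed to carry no tolerations)
// could be scheduled on the node at all.
func eligible(n Node) bool {
	for _, t := range n.Taints {
		if t.Effect == "NoSchedule" || t.Effect == "NoExecute" {
			return false
		}
	}
	return true
}

// canBid returns true only if at least one eligible node can host a replica
// of the requested size; capacity on tainted nodes is never counted.
func canBid(nodes []Node, req Request) bool {
	for _, n := range nodes {
		if !eligible(n) {
			continue
		}
		if n.FreeCPU >= req.CPU && n.FreeMemory >= req.Memory {
			return true
		}
	}
	return false
}

func main() {
	nodes := []Node{
		// Worker nodes that are already nearly full.
		{Name: "worker-1", FreeCPU: 100, FreeMemory: 1 << 30},
		{Name: "worker-2", FreeCPU: 200, FreeMemory: 512 << 20},
		// Control-plane node with spare capacity but a NoSchedule taint.
		{Name: "cp-1", FreeCPU: 8000, FreeMemory: 16 << 30,
			Taints: []Taint{{Key: "node-role.kubernetes.io/control-plane", Effect: "NoSchedule"}}},
	}
	req := Request{CPU: 2000, Memory: 2 << 30}
	fmt.Println(canBid(nodes, req)) // false: only the tainted node has room
}
```

Checking per node rather than summing free capacity across the cluster matters here: a pod has to fit on one node, so a cluster-wide total can overstate what is actually deployable.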

Additional context
Discussed in more detail on the Sep 18 support call.

@88plug
Author

88plug commented Sep 21, 2024


Users are reporting this on Discord as well.

@chainzero chainzero added repo/provider Akash provider-services repo issues and removed awaiting-triage labels Sep 30, 2024
@chainzero chainzero added the P1 label Oct 22, 2024
Projects
None yet
Development

No branches or pull requests

3 participants