
[ws-scheduler] Take pod resource into account #4700

Closed
csweichel opened this issue Jul 5, 2021 · 8 comments · Fixed by #4744
Labels: component: ws-scheduler · priority: high (dev loop impact) · type: bug

Comments

@csweichel (Contributor)

In core-dev we often see pods rejected with OutOfpods: "Node didn't have enough resource: pods, requested: 1, used: 110, capacity: 110". ws-scheduler should take the pods resource into account.
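A minimal sketch (not the actual ws-scheduler implementation) of the kind of check that is missing, using the standard Kubernetes Go types: a node only remains a candidate if the number of pods already bound to it is below the node's allocatable pods count. Package and function names are made up for illustration.

// Minimal sketch of the missing check: a node is only a candidate if its
// allocatable "pods" count still has headroom for one more pod.
package scheduling

import (
	corev1 "k8s.io/api/core/v1"
)

// hasFreePodSlot reports whether node can still accept one more pod, given
// the pods already bound to it (e.g. listed via a field selector on
// spec.nodeName).
func hasFreePodSlot(node *corev1.Node, podsOnNode []corev1.Pod) bool {
	capacity, ok := node.Status.Allocatable[corev1.ResourcePods]
	if !ok {
		// No "pods" resource reported; be conservative and refuse.
		return false
	}

	used := int64(0)
	for _, p := range podsOnNode {
		// Terminated pods no longer occupy a pod slot on the kubelet.
		if p.Status.Phase == corev1.PodSucceeded || p.Status.Phase == corev1.PodFailed {
			continue
		}
		used++
	}
	return used < capacity.Value()
}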

csweichel added the component: ws-scheduler, priority: high (dev loop impact), and type: bug labels on Jul 5, 2021
@geropl (Member) commented Jul 6, 2021

ws-scheduler should take this into account indeed.

Note, however, that the issue in core-dev stems from our not-so-well-managed dev cluster; we run into this issue with static deployments, too.

@csweichel (Contributor, Author)

Good point. For core-dev we could change the CIDR range as you suggested.

@meysholdt (Member)

I looked into increasing the maximum number of pods per node by using larger CIDR ranges, but no luck:

gcloud container clusters create cluster220-1 \
  --project me-cidr \
  --enable-ip-alias \
  --cluster-ipv4-cidr 10.0.0.0/21 \
  --services-ipv4-cidr 10.4.0.0/19 \
  --create-subnetwork name='my-subnet1',range=10.4.32.0/23 \
  --default-max-pods-per-node 220 \
  --zone europe-west1

returns

ERROR: (gcloud.container.clusters.create) ResponseError: code=400, message=Maximum pods per node must be at least 8 and at most 110, current number is 220.

And the docs say:

"Although having 110 Pods per node is a hard limit, you can reduce the number of Pods on a node."

@meysholdt (Member)

Let's do the math on how many preview envs we can have once #4744 has been merged:

On the core-dev clusters, there are

  • 8 DaemonSets in kube-system
  • 1 DaemonSet in cloud-monitoring: node-exporter
  • 1 DaemonSet in docker: docker-engine
  • 3 DaemonSets per preview-env: agent-smith, registry-facade, ws-daemon.

110 pods max per node minus 10 (kube-system, docker, monitoring) leaves 100 for preview-envs.

While 100 slots (at 3 DaemonSet pods per preview env) should in theory be enough room for 33 preview envs, in practice there will probably be trouble earlier, because already-running pods won't relocate to other nodes to make room for new DaemonSet pods.
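Spelled out as a back-of-the-envelope calculation (the constants come from the numbers in this comment, not from a live cluster):

// Back-of-the-envelope calculation from the numbers above.
package main

import "fmt"

func main() {
	const (
		maxPodsPerNode       = 110 // GKE hard limit per node
		staticDaemonSetPods  = 10  // 8 kube-system + node-exporter + docker-engine
		daemonSetsPerPreview = 3   // agent-smith, registry-facade, ws-daemon
	)
	free := maxPodsPerNode - staticDaemonSetPods // 100 slots left per node
	fmt.Println("free pod slots per node:", free)
	fmt.Println("max preview envs (DaemonSets only):", free/daemonSetsPerPreview) // 33
}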

@csweichel (Contributor, Author)

That is a pity - thank you for checking this one though :)

Looks like sooner or later we'll have to move to more isolated preview environments.
Let's hope that the ws-scheduler fix brings some help.

@meysholdt (Member)

Maybe pod (anti-)affinities can be used to ensure that every preview env (kinda) gets its own node (see the sketch after the list):

  1. no two ws-managers may run on the same node (anti-affinity)
  2. agent-smith, registry-facade, and ws-daemon may only run on the node on which their ws-manager is running (affinity)
  3. ws-scheduler would need to be limited to launching workspaces only on those nodes (ws-scheduler enhancement).
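A rough sketch of what points 1 and 2 could look like with the Kubernetes affinity API in Go; the label keys and values (component, preview-env) are assumptions for illustration, not the labels Gitpod actually uses.

// Rough sketch of points 1 and 2 using the Kubernetes affinity API.
package scheduling

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// wsManagerAffinity: no two ws-manager pods on the same node (point 1).
func wsManagerAffinity() *corev1.Affinity {
	return &corev1.Affinity{
		PodAntiAffinity: &corev1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"component": "ws-manager"},
				},
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
}

// coLocatedAffinity: agent-smith, registry-facade and ws-daemon may only run
// on the node where the ws-manager of the same preview env runs (point 2).
func coLocatedAffinity(previewEnv string) *corev1.Affinity {
	return &corev1.Affinity{
		PodAffinity: &corev1.PodAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{
						"component":   "ws-manager",
						"preview-env": previewEnv,
					},
				},
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
}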

@geropl (Member) commented Jul 8, 2021

> in practice there will probably be trouble earlier because already-running pods won't relocate to other nodes to make room for DaemonSets

True. We can minimize the chance of this by:

  • increasing requests for static deployments (except DaemonSets)
  • reducing or removing requests for DaemonSets (see the sketch below)
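As a sketch of the second bullet: what minimal requests on a DaemonSet container could look like when built with the Kubernetes Go types. The container name, image, and values are placeholders, not Gitpod's actual settings.

// Illustrative only: "reduce/remove requests for DaemonSets" expressed as a
// container spec built with the Kubernetes API types.
package scheduling

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func daemonSetContainer() corev1.Container {
	return corev1.Container{
		Name:  "ws-daemon",
		Image: "example/ws-daemon:latest",
		Resources: corev1.ResourceRequirements{
			// Tiny (or omitted) requests keep DaemonSet pods from reserving
			// capacity that workspace pods need on densely packed nodes.
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("10m"),
				corev1.ResourceMemory: resource.MustParse("32Mi"),
			},
		},
	}
}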

@jankeromnes (Contributor) commented Aug 24, 2021

Side note: DaemonSets can have node selectors to restrict them to some nodes.

This could be used to restrict DaemonSets to a subset of nodes (for example, only 50% of the nodes, or one node per deployment), thus avoiding the quadratic growth trajectory; a sketch follows below.
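A small sketch of that idea, restricting a DaemonSet to nodes carrying a specific label via spec.template.spec.nodeSelector; the label key and value are hypothetical.

// Sketch of the node-selector idea: make a DaemonSet's pods run only on nodes
// labelled gitpod.io/preview-env-node=true (hypothetical label).
package scheduling

import (
	appsv1 "k8s.io/api/apps/v1"
)

func restrictToLabeledNodes(ds *appsv1.DaemonSet) {
	if ds.Spec.Template.Spec.NodeSelector == nil {
		ds.Spec.Template.Spec.NodeSelector = map[string]string{}
	}
	ds.Spec.Template.Spec.NodeSelector["gitpod.io/preview-env-node"] = "true"
}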
