
CPU requests and limits for build pod #1348

Draft · wants to merge 2 commits into main

Conversation

@noroutine commented Aug 24, 2021

This PR makes the CPU request and limit for the build pod configurable, just like the memory settings.
The trait was implemented to be flexible in the values it accepts: integers, floats, string representations of those, as well as the millicore spec as per https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-cpu
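For context, here is a minimal sketch of what such a trait could look like. It assumes a traitlets TraitType that normalizes everything to a float number of cores; the name CPUSpecification and the exact behavior are illustrative, not necessarily the PR's implementation:

import re
from traitlets import TraitType

class CPUSpecification(TraitType):
    """Accept an int (1), a float (0.5), their string forms, or a
    millicore string ("100m"), and normalize to a float number of cores."""

    info_text = "a CPU quantity: int, float, or millicore string like '100m'"

    def validate(self, obj, value):
        if isinstance(value, (int, float)):
            return float(value)
        if isinstance(value, str):
            match = re.fullmatch(r"(\d+)m", value)
            if match:
                # millicore form, e.g. "100m" -> 0.1 cores
                return int(match.group(1)) / 1000
            try:
                return float(value)
            except ValueError:
                pass
        self.error(obj, value)

With this, cpu_limit = 0.5, "0.5", and "500m" would all normalize to the same value.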

welcome bot commented Aug 24, 2021

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@betatim (Member) commented Aug 25, 2021

Thanks for this PR! I was surprised to see that we don't already have this, which means this is for sure useful :D

Looking and thinking about the code now.

Some immediate thoughts:

  1. Will this work when BinderHub is not running on kubernetes? Recently @manics and/or @yuvipanda did some work on making that possible, and it would be good to know how this interacts. If it would require extra work it doesn't need to be addressed here (it can be a new PR). xref Allow running dev setup without kubernetes #1323
  2. Do we need a new traitlet type? The reason I'm asking is that it adds a lot of code for which we'd need tests, etc. We did the same for the memory spec, so we probably do need it. But who knows, maybe we've gotten smarter since then and no longer need it?
  3. There are no tests :( Can we copy from / be inspired by the memory spec test suite?

@manics (Member) commented Aug 25, 2021

I think it'll be fine for non-k8s systems; we can ignore the limits 😄.

However, if we want to make it easier to implement limits in non-k8s backends, it might be nice to tighten the specification.
E.g. CPU is currently "Can be an integer (1), fraction (0.5) or millicore specification (100m)", whereas we could choose to allow only one of those (either 0-1 or millicore).

@noroutine (Author) commented Aug 25, 2021

@betatim, @manics, thanks for the feedback.

Tbh I was confused about the units in JH/BH; it seems a different approach from k8s is used (M, G, K instead of Mi, Gi, Ki), so I was not sure what is "right" here. Due to time pressure I just went with the k8s convention, but I'm open to adjusting. On one hand it currently feels unnatural to use a convention different from k8s; on the other hand, it is probably a bad idea to use a convention that differs from the rest of the JH ecosystem.

My honest feeling is that the whole idea of having these traits do extra validation/translation into a different convention breaks the principle of least surprise, and is probably already a source of some confusion.

If I'm an operator deploying to k8s, my expectation is that requests and limits follow k8s conventions and are passed verbatim from the config to the pod spec, rather than being transformed via traits into some magic, unexpected common denominator. The same applies to JH.

As for tests, I'd be more than glad to write some, as this PR is driven by internal requirements and we're interested in merging it upstream to avoid the cost of maintaining a fork. I will raise this internally tomorrow.

@manics (Member) commented Aug 25, 2021

@noroutine I understand your point! There's no single right answer: some people expect things to follow a standard format that is independent of the underlying implementation, some expect it to follow a low-level standard (the K8s spec), and others expect it to follow a different low-level standard (e.g. systemd uses percentages).

I think the decision should come down to whether we'd like to support limits in future alternative backends. If we decide only K8s matters then using the K8s format is fine; otherwise we should consider how other developers could implement it.

Regardless of the format, I think it'd be nice to include a description of it in the help text as well as a URL. That guarantees the format is always known, even if the URL changes in the future.

@betatim (Member) commented Aug 26, 2021

The ByteSpecification trait comes from JupyterHub, which runs on many platforms, not just kubernetes. I think that helps explain why it isn't "just like k8s". BinderHub probably re-used it because it was easy to reuse (even though back in the day there was no plan for BH to ever work without k8s) and because it is consistent within the Jupyter(Hub) ecosystem.

For CPU limits: the format implemented in this PR follows what kubernetes uses as well as what the docker CLI "natively" understands. It is also the format repo2docker uses.

So unless someone has strong opinions on an alternative format, I'd say we are consistent within our ecosystem and with kubernetes. So let's stick with this for now?

It would be great to have some tests, even just simple ones that check that the basics aren't broken. Without them, something will sooner or later break this feature and no one will notice.
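As a starting point, basic tests could look something like this pytest sketch. CPUSpecification refers to the hypothetical trait sketched in the PR description above, and the binderhub.utils import path is an assumption (based on the review thread on that file):

import pytest
from traitlets import HasTraits, TraitError

# Assumption: the trait lives in binderhub.utils, as suggested by the
# review thread on that file; adjust the import to the real location.
from binderhub.utils import CPUSpecification

class FakeBuilder(HasTraits):
    cpu_limit = CPUSpecification()

@pytest.mark.parametrize(
    "value, expected",
    [(1, 1.0), (0.5, 0.5), ("0.5", 0.5), ("100m", 0.1)],
)
def test_cpu_spec_accepts_valid_values(value, expected):
    # valid ints, floats, numeric strings and millicores all normalize
    assert FakeBuilder(cpu_limit=value).cpu_limit == expected

def test_cpu_spec_rejects_invalid_values():
    # anything unparseable should raise at assignment time
    with pytest.raises(TraitError):
        FakeBuilder(cpu_limit="not-a-cpu")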

@noroutine (Author) commented

Hey guys

I've just pushed some tests to cover that trait. Let me know if I can do anything more from my side to get this through!

Kind regards

@consideRatio (Member) left a review comment

LGTM, nice!

@consideRatio (Member) commented

@manics @betatim tests have been added and the code looks good to me.

I also think the config is reasonable in a non-k8s context; such requests/limits could be relevant there as well, and if a backend ignores them, that can be documented for such environments later.

Go for merge?

Review comment on binderhub/utils.py (outdated, resolved)
@betatim (Member) commented Sep 24, 2021

Not sure why the tests are failing :(

@manics (Member) commented Sep 24, 2021

If you go to the logs for the failing check (https://github.com/jupyterhub/binderhub/pull/1348/checks?check_run_id=3697325231)
and expand "Show hub logs", you'll see the Hub errors:

[E 210924 07:41:15 web:1789] Uncaught exception GET /services/binder/build/gh/binderhub-ci-repos/cached-minimal-dockerfile/HEAD (::1)
    HTTPServerRequest(protocol='http', host='localhost:8000', method='GET', uri='/services/binder/build/gh/binderhub-ci-repos/cached-minimal-dockerfile/HEAD', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/tornado/web.py", line 1704, in _execute
        result = await result
      File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/binderhub/builder.py", line 402, in get
        self.build = build = BuildClass(
    TypeError: __init__() got an unexpected keyword argument 'cpu_limit'

Work on #1318 went a lot faster than expected! The new parameter needs to be added to:

class LocalRepo2dockerBuild(Build):
    """Represents a build of a git repository into a docker image.

    This runs a build using the repo2docker command line tool.

    WARNING: This is still under development. Breaking changes may be made at any time.
    """

    def __init__(
        self,
        q,
        api,
        name,
        *,
        namespace,
        repo_url,
        ref,
        build_image,
        docker_host,
        image_name,
        git_credentials=None,
        push_secret=None,
        memory_limit=0,
        memory_request=0,
        node_selector=None,
        appendix="",
        log_tail_lines=100,
        sticky_builds=False,
    ):

The alternative is to add this as a configurable traitlet on the Build class when it's refactored as discussed in #1318 (comment). That way you can avoid adding a parameter to BinderHub; the configurable would be passed down to the class, and since it can be a configurable on e.g. KubernetesBuilder it can use Kubernetes-specific config. The downside is that this would delay the addition of the new feature. A rough sketch of that alternative follows below.
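For illustration, that alternative could look roughly like this. Treating Build as a LoggingConfigurable is an assumption tied to the proposed #1318 refactor; the trait type and default are placeholders:

from traitlets import Unicode
from traitlets.config import LoggingConfigurable

class Build(LoggingConfigurable):
    # Placeholder: a k8s-backend subclass (e.g. KubernetesBuilder) could
    # override or extend this with Kubernetes-specific config.
    cpu_limit = Unicode(
        "",
        help=(
            "CPU limit for the build pod, e.g. '0.5' or '100m'. See "
            "https://kubernetes.io/docs/concepts/configuration/"
            "manage-resources-containers/#meaning-of-cpu"
        ),
        config=True,
    )

A help string like this would also cover the earlier suggestion to document the format and link to it.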

However there's another issue. The build pod runs repo2docker, which does the build in another Docker container (by communicating directly with the Docker daemon through a socket). Since the Docker build takes place outside K8s, the K8s pod limits won't apply; they somehow need to be passed to repo2docker.

Based on this comment:

memory_request
    Memory request of the build pod. The actual building happens in the
    docker daemon, but setting request in the build pod makes sure that
    memory is reserved for the docker build in the node by the kubernetes
    scheduler.

a memory request is useful because it guarantees a certain amount of memory for the build pod. If I understand correctly, the build pod itself won't use that memory, so most of it will be free on the K8s node, which means the Docker container where the build occurs can use it. memory_limit has an effect because it's passed down to repo2docker, which then sets the memory limit using the Docker build API:
if self.memory_limit:
    r2d_options.append('--build-memory-limit')
    r2d_options.append(str(self.memory_limit))

https://github.com/jupyterhub/repo2docker/blob/3eef69f7454ddb9086af38b2ece5f632af05920f/repo2docker/app.py#L161-L169

This means cpu_request will effectively reserve some CPU for the build, but cpu_limit will have no effect unless it's also added to repo2docker.
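To make the limit effective, the forwarding in BinderHub would presumably mirror the memory pattern above, along these lines. Note that --build-cpu-limit is a hypothetical flag: it does not exist in repo2docker yet and would need to be added there first:

# Hypothetical: mirrors how memory_limit is forwarded today; repo2docker
# has no --build-cpu-limit flag yet (see the discussion above).
if self.cpu_limit:
    r2d_options.append('--build-cpu-limit')
    r2d_options.append(str(self.cpu_limit))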

@noroutine (Author) commented

Hi gentlemen

Unfortunately it failed again.

I was out on leave for a bit, and I'm now going through the discussion above, trying to understand the best course of action to make this work again.
I will dive into this again this week, but I need to understand what I should do regarding the repo2docker code.

Thank you

@manics (Member) commented Nov 3, 2021

@noroutine You'll need to add cpu limit support to repo2docker, then you can come back here and pass the appropriate flags when repo2docker is run.

@noroutine (Author) commented

Thank you @manics

@manics marked this pull request as draft on February 12, 2022