
Add container based scaling to HPA #90691

Merged
1 commit merged on Oct 23, 2020


arjunrn
Contributor

@arjunrn arjunrn commented May 2, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR introduces a new metric source type, ContainerResourceMetricSource, and a corresponding status type, ContainerResourceMetricStatus. They are used to specify metric targets for individual containers in the target pods and to report the status of those metrics.
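As a rough illustration of what the new source expresses, the sketch below uses simplified local stand-ins for the API types. The field names Name, Container and Target (with a Utilization target) follow the autoscaling API, but these structs, the `describe` helper, and the container name "application" are hypothetical and are not the real Kubernetes package.

```go
package main

import "fmt"

// Hypothetical, simplified mirrors of the HPA API types. Field names follow
// the autoscaling API, but these are local stand-ins, not the real package.
type ResourceName string

type MetricTarget struct {
	Type               string // e.g. "Utilization"
	AverageUtilization *int32 // percent of the container's resource request
}

// ContainerResourceMetricSource targets a resource of one named container in
// each pod, instead of the sum over all containers in the pod.
type ContainerResourceMetricSource struct {
	Name      ResourceName
	Container string
	Target    MetricTarget
}

type MetricSpec struct {
	Type              string // "ContainerResource" for the new source
	ContainerResource *ContainerResourceMetricSource
}

// describe summarizes a container resource metric spec in one line.
func describe(m MetricSpec) string {
	cr := m.ContainerResource
	if m.Type != "ContainerResource" || cr == nil || cr.Target.AverageUtilization == nil {
		return "not a container resource metric"
	}
	return fmt.Sprintf("scale on %s of container %q at %d%% average utilization",
		cr.Name, cr.Container, *cr.Target.AverageUtilization)
}

func main() {
	target := int32(60)
	spec := MetricSpec{
		Type: "ContainerResource",
		ContainerResource: &ContainerResourceMetricSource{
			Name:      "cpu",
			Container: "application",
			Target:    MetricTarget{Type: "Utilization", AverageUtilization: &target},
		},
	}
	fmt.Println(describe(spec))
}
```

Keeping the container name next to the resource name lets one pod-level metric list mix whole-pod Resource targets with per-container targets.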

Which issue(s) this PR fixes:

Fixes #86349

Special notes for your reviewer:

/label api-review

Does this PR introduce a user-facing change?:

Introduces a metric source for HPAs which allows scaling based on container resource usage.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-autoscaling/0001-container-resource-autoscaling.md
- [Usage]: https://github.com/kubernetes/website/pull/23523

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels May 2, 2020
@k8s-ci-robot
Contributor

Hi @arjunrn. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/kubectl kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 2, 2020
@arjunrn
Contributor Author

arjunrn commented May 2, 2020

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label May 2, 2020
@arjunrn arjunrn force-pushed the container-resource-hpa branch from bd6fa22 to e6a2ed7 Compare May 6, 2020 10:43
@arjunrn
Contributor Author

arjunrn commented May 20, 2020

/assign @josephburnett

@liggitt liggitt removed the api-review Categorizes an issue or PR as actively needing an API review. label May 21, 2020
@liggitt
Member

liggitt commented May 21, 2020

Thanks for the PR. It looks like the linked proposal is not yet marked as implementable and is missing test plan info. Unflagging for API review.

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@arjunrn
Contributor Author

arjunrn commented Jun 3, 2020

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label Jun 3, 2020
@arjunrn arjunrn force-pushed the container-resource-hpa branch 2 times, most recently from edc135a to 1051fbe Compare September 13, 2020 14:33
@pwittrock
Member

@arjunrn please split out the cli changes into a separate PR, or at least a separate commit.

@arjunrn arjunrn force-pushed the container-resource-hpa branch from 1051fbe to 36c04e9 Compare September 16, 2020 08:05
Contributor

@soltysh soltysh left a comment

sig-cli related changes lgtm

Member

@liggitt liggitt left a comment

one comment on API docs, one question about the containerResource name validation, rebase, then lgtm

```go
if len(src.Name) == 0 {
	allErrs = append(allErrs, field.Required(fldPath.Child("name"), "must specify a resource name"))
} else {
	if !helper.IsStandardContainerResourceName(string(src.Name)) {
		// ... (rest of the diff hunk not shown)
	}
}
```
Member

is this intending to allow all resources a container could request/limit? if so, this doesn't allow as much as validateContainerResourceName. is that ok?

is this intending to allow only cpu/memory? if so, this also allows ephemeral-storage and hugepage resources. is that ok?
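The gap being pointed out can be sketched with two simplified predicates. Both functions are hypothetical approximations written for illustration, not the Kubernetes helpers themselves. The narrow one accepts only un-prefixed standard container resources. The broad one also accepts domain-prefixed extended resource names such as `example.com/gpu`, which is roughly the extra latitude validateContainerResourceName gives. The real helper additionally checks qualified-name syntax and the extended-resource naming rules.

```go
package main

import (
	"fmt"
	"strings"
)

// isStandardContainerResourceName approximates the narrower check in the
// diff: only cpu, memory, ephemeral-storage and hugepages-* are accepted.
func isStandardContainerResourceName(name string) bool {
	switch name {
	case "cpu", "memory", "ephemeral-storage":
		return true
	}
	return strings.HasPrefix(name, "hugepages-")
}

// isContainerResourceNameBroad is a hypothetical simplification of
// validateContainerResourceName: un-prefixed names must be standard, and any
// "domain/name" form is treated as an extended resource and accepted.
func isContainerResourceNameBroad(name string) bool {
	parts := strings.SplitN(name, "/", 2)
	if len(parts) == 1 {
		return isStandardContainerResourceName(name)
	}
	return parts[0] != "" && parts[1] != ""
}

func main() {
	for _, name := range []string{"cpu", "hugepages-2Mi", "example.com/gpu", "gpu"} {
		fmt.Printf("%-16s narrow=%-5t broad=%t\n",
			name, isStandardContainerResourceName(name), isContainerResourceNameBroad(name))
	}
}
```

Only the extended resource differs between the two checks, which is the case a container could request but the narrower validation would reject.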

Contributor Author

Right now the resource field of the regular Resource metric type has no validation, so users can already specify ephemeral-storage and hugepages-* resources for scaling. However, metrics-server, which the HPA controller uses, does not currently support those resources. If it did, the HPA controller would support them out of the box, because the resource name is not validated. So should I now restrict which resources can be specified (limit to cpu and memory), or should I be a bit more permissive and allow all container resources?

Member

@josephburnett josephburnett Oct 21, 2020

I think replica_calculator.go is pretty generic all the way through. So we should allow any valid resource.
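Because the calculation only compares usage with requests in the same unit, it does not depend on which resource is being scaled. Below is a minimal sketch, assuming hypothetical names (`containerSample`, `utilization`, `desiredReplicas`) and ignoring the controller's handling of missing metrics and unready pods, of how a per-container utilization target could feed the usual HPA ratio. Usage and requests are summed only for the named container, so other containers in the pod (for example sidecars) do not dilute the signal.

```go
package main

import (
	"fmt"
	"math"
)

// containerSample is a hypothetical per-pod measurement for the one container
// named in the metric source. Usage and Request share a unit (millicores for
// cpu, bytes for memory, ...), so the math is resource-agnostic.
type containerSample struct {
	Usage   int64
	Request int64
}

// utilization returns the container's summed usage as a percentage of its
// summed requests across pods.
func utilization(samples []containerSample) (int64, error) {
	var usage, request int64
	for _, s := range samples {
		usage += s.Usage
		request += s.Request
	}
	if request == 0 {
		return 0, fmt.Errorf("no requests set for the target container")
	}
	return usage * 100 / request, nil
}

// desiredReplicas applies the standard HPA ratio,
// ceil(currentReplicas * currentUtilization / targetUtilization),
// and keeps the current count when the ratio is within the tolerance band
// (the controller's default tolerance is 0.1).
func desiredReplicas(current int32, currentUtil, targetUtil int64, tolerance float64) int32 {
	ratio := float64(currentUtil) / float64(targetUtil)
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	return int32(math.Ceil(ratio * float64(current)))
}

func main() {
	// Three pods; only the target container's usage/request (millicores) is sampled.
	samples := []containerSample{{Usage: 450, Request: 500}, {Usage: 500, Request: 500}, {Usage: 400, Request: 500}}
	util, err := utilization(samples)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("utilization=%d%% desired=%d\n", util, desiredReplicas(3, util, 60, 0.1))
}
```

Here 1350m used out of 1500m requested gives 90% against a 60% target, so three replicas scale to ceil(1.5 × 3) = 5.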

Contributor Author

@josephburnett @liggitt In that case the current validation is correct. So I will leave this as is.

Member

> In that case the current validation is correct

are you sure? this validation is narrower than validateContainerResourceName, so it would disallow resources a container could request, right?

Contributor Author

I misunderstood your first comment. You're right, this is more restrictive. So now I have 3 options:

  1. Copy the validateContainerResourceName function.
  2. Turn validateContainerResourceName into a public function and add tests.
  3. Remove the validation to keep it consistent with the (unvalidated) resource name in ResourceMetricSource.

I prefer option 2. WDYT?

Member

2 sounds good to me

Contributor Author

@liggitt I've made changes. Can you take another look?

@arjunrn arjunrn force-pushed the container-resource-hpa branch 2 times, most recently from e1e437c to 1b730d9 Compare October 21, 2020 09:29
@liggitt
Member

liggitt commented Oct 21, 2020

API doc updates lgtm. I'll defer to @josephburnett on the desired allowed values for the container resource. Once that is determined, add validation unit tests demonstrating the allowed/disallowed values.

@arjunrn arjunrn force-pushed the container-resource-hpa branch from 1b730d9 to 1b3d3a3 Compare October 21, 2020 17:50
@arjunrn arjunrn force-pushed the container-resource-hpa branch from 1b3d3a3 to 0fec7b0 Compare October 21, 2020 19:17
@liggitt
Member

liggitt commented Oct 22, 2020

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 22, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: arjunrn, josephburnett, liggitt, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2020
@liggitt
Member

liggitt commented Oct 22, 2020

thanks, changes look good

@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@arjunrn
Contributor Author

arjunrn commented Oct 23, 2020

/retest

Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubectl cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cli Categorizes an issue or PR as relevant to SIG CLI. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: API review completed, 1.20
Development

Successfully merging this pull request may close these issues.

HPA should consider individual containers when scaling based on resource metrics
8 participants