-
Notifications
You must be signed in to change notification settings - Fork 39.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inconsistent Prometheus cAdvisor metrics #51473
Fix inconsistent Prometheus cAdvisor metrics #51473
Conversation
Prometheus requires that all metrics in the same family have the same labels, so we arrange to supply blank strings for missing labels See google/cadvisor#1704
Hi @bboreham. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/ok-to-test |
/test pull-kubernetes-federation-e2e-gce |
This change looks good, although it doesnt appear to fix all issues. At least for me, it looks like diskio metrics are still not consistent. I'll have to look into that... |
/assign @Random-Liu |
/test pull-kubernetes-federation-e2e-gce |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Random-Liu, bboreham, dashpole Associated issue: 50151 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest Review the full test history for this PR. |
1 similar comment
/retest Review the full test history for this PR. |
Automatic merge from submit-queue (batch tested with PRs 51471, 50561, 50435, 51473, 51436) |
This needs to be cherry picked into release-1.7 before the next release. |
@Random-Liu or anyone could launch the release-1.7 cherry-pick process ? |
@sylr if you want to cherrypick this; just propose doing so.
(also see the doc: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md) No need to wait for others to do this. |
Removing label |
@luxas Thanks Note to self:
|
yeah, if that are the git remote names you're using |
@luxas I think you need to add a release milestone to get this cherry-pick go on |
@AndreaGiardini AFAIK, that's not totally mandatory, but makes it much more clear and the normal flow, so will do. Thanks! |
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
What this PR does / why we need it:
We need this because otherwise kubelet is exposing different sets of Prometheus metrics that randomly include or do not include containers.
See also google/cadvisor#1704; quoting here:
Prometheus requires that all metrics in the same family have the same labels, so we arrange to supply blank strings for missing labels
The function
containerPrometheusLabels()
conditionally adds various metric labels from container labels - pod name, image, etc. However, when it receives the metrics, Prometheus checks that all metrics in the same family have the same label set, and rejects those that do not.Since containers are collected in (somewhat) random order, depending on which kind is seen first you get one set of metrics or the other.
Changing the container labels function to always add the same set of labels, adding
""
when it doesn't have a real value, eliminates the issue in my testing.Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #Fixes #50151
Special notes for your reviewer:
I have made the same fix in two places. I am 98% sure the one in
cadvisor_linux.go
isn't used and indeed cannot be used, but have not gone fully down that rabbit-hole.Release note: