
Fix error spam on AKS #33697

Merged
merged 3 commits into elastic:main from fix-container-id-error-spam on Nov 16, 2022
Conversation

@rdner (Member) commented Nov 16, 2022

This was happening because the log path matcher logged at the error level whenever it detected a log.file.path that does not start with the standard Docker container log folder /var/lib/docker/containers. On AKS this is now always the case, because AKS dropped support for Docker in September 2022 and switched to containerd.

It looks like this message was never supposed to be at the error level in the first place: it only means that the matcher did not match, which is not an error. It was most likely promoted from the debug level by mistake in #16866 because the message started with Error and looked confusing.

This is a partial fix to unblock our customers; we still need to come up with full AKS/containerd support in a follow-up change.
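The matcher logic described above can be sketched as a simple prefix check. This is an illustrative Go sketch, not the actual Beats code: the function and constant names (containerIDFromPath, dockerLogsPath) are hypothetical, and the real matcher lives elsewhere in the codebase. The point is that a non-matching path should return "no match" quietly (debug at most), rather than logging an error:

```go
package main

import (
	"fmt"
	"strings"
)

// dockerLogsPath is the standard Docker container log folder. On AKS with
// containerd, log.file.path no longer starts with this prefix, so the
// matcher simply does not match -- which is expected, not an error.
const dockerLogsPath = "/var/lib/docker/containers/"

// containerIDFromPath is a hypothetical stand-in for the matcher: it returns
// the container ID when the path is under the Docker log folder, and an
// empty string (no match) otherwise.
func containerIDFromPath(logFilePath string) string {
	if !strings.HasPrefix(logFilePath, dockerLogsPath) {
		// Before this fix, a message was logged here at the error level.
		// Not matching is a normal outcome, so debug level is appropriate.
		return ""
	}
	// The first path element after the prefix is the container ID.
	rest := strings.TrimPrefix(logFilePath, dockerLogsPath)
	parts := strings.SplitN(rest, "/", 2)
	return parts[0]
}

func main() {
	// A Docker-style path matches and yields the container ID.
	fmt.Println(containerIDFromPath("/var/lib/docker/containers/abc123/abc123-json.log"))
	// A containerd/CRI-style pod log path simply does not match.
	fmt.Println(containerIDFromPath("/var/log/pods/ns_pod_uid/app/0.log") == "")
}
```

Under containerd, Kubernetes writes pod logs under /var/log/pods instead, so every log path on an AKS node fails this prefix check, which is what turned one harmless non-match into a flood of error-level messages.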

Why is it important?

There is a lot of feedback from customers that Elastic Agent becomes unhealthy because of the overwhelming number of error messages from Filebeat caused by this issue.

Checklist

- [x] My code follows the style guidelines of this project
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding changes to the default configuration files
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@rdner rdner added bug Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team backport-v8.5.0 Automated backport with mergify labels Nov 16, 2022
@rdner rdner requested a review from a team as a code owner November 16, 2022 13:51
@rdner rdner self-assigned this Nov 16, 2022
@rdner rdner requested review from faec and leehinman and removed request for a team November 16, 2022 13:51
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Nov 16, 2022
@mergify (bot) commented Nov 16, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-container-id-error-spam upstream/fix-container-id-error-spam
git merge upstream/main
git push upstream fix-container-id-error-spam

@rdner rdner requested a review from cmacknz November 16, 2022 13:53
@rdner rdner force-pushed the fix-container-id-error-spam branch from b80faa5 to 18ed701 Compare November 16, 2022 14:06
@elasticmachine (Collaborator) commented Nov 16, 2022

💚 Build Succeeded


Build stats

  • Start Time: 2022-11-16T16:22:10.127+0000

  • Duration: 71 min 22 sec

Test stats 🧪

  • Failed: 0
  • Passed: 7083
  • Skipped: 738
  • Total: 7821

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@faec (Contributor) left a review comment
looks good (subject to the lint errors)

@rdner rdner merged commit 29f0b4c into elastic:main Nov 16, 2022
@rdner rdner deleted the fix-container-id-error-spam branch November 16, 2022 20:13
mergify bot pushed a commit that referenced this pull request Nov 16, 2022
(cherry picked from commit 29f0b4c)
@rdner rdner added the backport-v8.6.0 Automated backport with mergify label Nov 16, 2022
mergify bot pushed a commit that referenced this pull request Nov 16, 2022
(cherry picked from commit 29f0b4c)
rdner added a commit that referenced this pull request Nov 17, 2022
(cherry picked from commit 29f0b4c)

Co-authored-by: Denis <[email protected]>
rdner added a commit that referenced this pull request Nov 17, 2022
(cherry picked from commit 29f0b4c)

Co-authored-by: Denis <[email protected]>
@rodrigc commented Jan 30, 2023

@rdner I think you can repro this in any Kubernetes setup (not just AKS) that uses CRI-O instead of Docker, for example Red Hat's OpenShift.

If you run:

kubectl get nodes -o wide

and see something like:

NAME          STATUS   ROLES   AGE   VERSION   OS-IMAGE                  KERNEL-VERSION                      CONTAINER-RUNTIME
host1   Ready    node    19d   v1.24.1      Oracle Linux Server 8.6   5.4.17-2136.314.6.2.el8uek.x86_64   cri-o://1.24.1-76.el8
host2   Ready    node    19d   v1.24.1      Oracle Linux Server 8.6   5.4.17-2136.314.6.2.el8uek.x86_64   cri-o://1.24.1-76.el8
host3   Ready    node    19d   v1.24.1      Oracle Linux Server 8.6   5.4.17-2136.314.6.2.el8uek.x86_64   cri-o://1.24.1-76.el8

where the last column says cri-o instead of docker, then you will hit this problem.
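The CONTAINER-RUNTIME column above reports strings like cri-o://1.24.1-76.el8, so the runtime name is whatever precedes the :// separator. A minimal, illustrative Go helper (not part of Beats or the Kubernetes client libraries) to extract it could look like:

```go
package main

import (
	"fmt"
	"strings"
)

// runtimeName extracts the runtime name from a CONTAINER-RUNTIME string as
// printed by `kubectl get nodes -o wide`, e.g. "cri-o://1.24.1-76.el8".
// Illustrative helper only; the name is a hypothetical choice.
func runtimeName(containerRuntimeVersion string) string {
	name, _, found := strings.Cut(containerRuntimeVersion, "://")
	if !found {
		// No separator: return the input unchanged.
		return containerRuntimeVersion
	}
	return name
}

func main() {
	fmt.Println(runtimeName("cri-o://1.24.1-76.el8"))
	fmt.Println(runtimeName("containerd://1.6.8"))
}
```

Anything other than docker here (cri-o, containerd) means the node's log paths are not under /var/lib/docker/containers, which is exactly the condition that triggered the error spam.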

chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
Labels
backport-v8.5.0 (Automated backport with mergify), backport-v8.6.0 (Automated backport with mergify), bug, Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using Elastic Agent with the k8s integration on AKS causes error spam
4 participants