Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ocp] Fix backtrace in ocp collector #3515

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jcastill
Copy link
Member

@jcastill jcastill commented Feb 9, 2024

While runnng sos collector in an ocp cluster
we got the following trace:

Traceback (most recent call last):
File "/usr/sbin/sos", line 22, in
sos.execute()
File "/usr/lib/python3.6/site-packages/sos/init.py",
line 193, in execute
self._component.execute()
File "/usr/lib/python3.6/site-packages/sos/collector/init.py",
line 1175, in execute
self.prep()
File "/usr/lib/python3.6/site-packages/sos/collector/init.py",
line 896, in prep
self.cluster.setup()
File "/usr/lib/python3.6/site-packages/sos/collector/clusters/ocp.py",
line 132, in setup
out = self.exec_primary_cmd(self.fmt_oc_cmd("auth can-i '' ''"))
File "/usr/lib/python3.6/site-packages/sos/collector/clusters/ocp.py",
line 104, in fmt_oc_cmd
return "%s %s" % (self.oc_cmd, cmd)
File "/usr/lib/python3.6/site-packages/sos/collector/clusters/ocp.py",
line 81, in oc_cmd
'which oc', chroot=self.primary.host.sysroot
TypeError: run_command() got an unexpected keyword argument 'chroot'

This patch changes the unused option 'chroot' that was undefined in run_command()
to the option need_root that was in use in other commands in ocp.py.

Related: RH: RHEL-24351


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?

Copy link

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3515
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@arif-ali arif-ali mentioned this pull request Feb 9, 2024
5 tasks
@jcastill jcastill force-pushed the jcastillo-fix-ocp-trace branch from 9b5969c to a11cba6 Compare February 9, 2024 11:19
@alosadagrande
Copy link

verified the fix in my lab environment, lgtm

'which oc', chroot=self.primary.host.sysroot
'which oc', need_root=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are very much not the same thing, and this is not the correct fix.

chroot means "run this command in a chroot, so that we aren't in '/'". The conditional before this makes it obvious that we're trying to make sure that we run in the host's mounted filesystem, not inside the container.

need_root simply makes sure we're executing as the root user, or otherwise we use `sudo.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not the same thing, but run_command() doesn't provide a 'chroot' argument. It does provide a 'need_root' argument, and it's used in other places in ocp.py, like in line #349. Should we instead rewrite run_command() to accept a chroot argument?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what should be the right way of detecting oc binary inside a container?

Should we search for the binary on the host (that sounds meaningful to me, as oc is the tool to manage containers, so it should run on host and not necessarily in container, am I right?)?

Or should we search for the binary under root (that does work in practise)?

When I am in a container, what is the correct way of detecting path to (the right?) oc binary?

If we dont know 100% answer to the latest question, I lean towards the current fix - which fixes missing parameter of a method (that never works) by something that works (at least in the tested use cases).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct way is to look in the mounted root filesystem inside the container, which would be passed by the HOST or similar env var and be known to sos. This is what the chroot is intended for.

The current change does not actually fix that binary discovery. It simply runs which oc. The needs_root here is meaningless as we'd already be root in the container.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I'll remove that part and resend the PR with only the fix that solves the backtrace.

While runnig sos collector in an ocp cluster
we got the following trace:

Traceback (most recent call last):
  File "/usr/sbin/sos", line 22, in <module>
    sos.execute()
  File "/usr/lib/python3.6/site-packages/sos/__init__.py",
        line 193, in execute
    self._component.execute()
  File "/usr/lib/python3.6/site-packages/sos/collector/__init__.py",
        line 1175, in execute
    self.prep()
  File "/usr/lib/python3.6/site-packages/sos/collector/__init__.py",
        line 896, in prep
    self.cluster.setup()
  File "/usr/lib/python3.6/site-packages/sos/collector/clusters/ocp.py",
        line 132, in setup
    out = self.exec_primary_cmd(self.fmt_oc_cmd("auth can-i '*' '*'"))
  File "/usr/lib/python3.6/site-packages/sos/collector/clusters/ocp.py",
        line 104, in fmt_oc_cmd
    return "%s %s" % (self.oc_cmd, cmd)
  File "/usr/lib/python3.6/site-packages/sos/collector/clusters/ocp.py",
        line 81, in oc_cmd
    'which oc', chroot=self.primary.host.sysroot
TypeError: run_command() got an unexpected keyword argument 'chroot'

This patch changes the unused option 'chroot' that was undefined in
run_command() to the option need_root that was in use in other commands
in ocp.py.

Related: RH: RHEL-24351

Co-authored-by: Alberto Losada Grande <[email protected]>

Signed-off-by: Jose Castillo <[email protected]>
@jcastill jcastill force-pushed the jcastillo-fix-ocp-trace branch from a11cba6 to fc8da6a Compare March 11, 2024 10:03
@jcastill
Copy link
Member Author

I removed need_root=True from the commit, and the fix is now just removing the option that triggered the call trace.

@pmoravec
Copy link
Contributor

Is propr oc detected, then? Could @alosadagrande test it, please? The chroot "scope" of the oc binary search is really not needed?

@jcastill
Copy link
Member Author

Is propr oc detected, then? Could @alosadagrande test it, please? The chroot "scope" of the oc binary search is really not needed?

The default of 'chroot' is False iIrc, so Alberto's test should tell us more for sure

@pmoravec
Copy link
Contributor

pmoravec commented Apr 8, 2024

Is propr oc detected, then? Could @alosadagrande test it, please? The chroot "scope" of the oc binary search is really not needed?

The default of 'chroot' is False iIrc, so Alberto's test should tell us more for sure

Kindly requesting @alosadagrande for a test..

@alosadagrande
Copy link

I do not have the environment anymore. I'll try to ask for a new one, but it can take a couple of weeks

@arif-ali
Copy link
Member

@alosadagrande @jcastill any update on this one

@jcastill
Copy link
Member Author

The only blocker in this one is to test the latest fix. I'll try to set up an environment in the lab and check it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants