docker: fix nw_service not available error #1556

Open
wants to merge 1 commit into
base: master
from

Conversation

atline
Contributor

@atline atline commented Dec 9, 2024

There is a race condition between container start and the ssh-server service becoming ready inside the container, which can leave nw_service unavailable.

Making nw_service for docker non-static allows the target to wait for nw_service to become available, rather than raising an exception.

Description

When running the command pytest -s --lg-env env.yaml test_shell.py, the following error occurs with low probability:

test_shell.py E

======================================== ERRORS =========================================
_____________________________ ERROR at setup of test_shell ______________________________

target = Target(name='main', env=Environment(config_file='env.yaml'))

    @pytest.fixture(scope="session")
    def command(target):
        strategy = target.get_driver("DockerStrategy")
        strategy.transition("accessible")
>       shell = target.get_driver("CommandProtocol")

conftest.py:8:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../labgrid-venv/lib/python3.8/site-packages/labgrid/target.py:234: in get_driver
    return self._get_driver(cls, name=name, resource=resource, activate=activate)
../../../labgrid-venv/lib/python3.8/site-packages/labgrid/target.py:208: in _get_driver
    self.activate(found[0])
../../../labgrid-venv/lib/python3.8/site-packages/labgrid/target.py:462: in activate
    self.await_resources(resources)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = Target(name='main', env=Environment(config_file='env.yaml'))
resources = [NetworkService(target=Target(name='main', env=Environment(config_file='env.yaml')), name='NetworkService', state=<BindingState.bound: 1>, avail=False, address='172.17.0.2', username='root', password='root', port=22)]
timeout = None, avail = True

    def await_resources(self, resources, timeout=None, avail=True):
        """
        Poll the given resources and wait until they are (un-)available.

        Args:
            resources (List): the resources to poll
            timeout (float): optional timeout
            avail (bool): optionally wait until the resources are unavailable with avail=False
        """
        self.update_resources()

        waiting = set(r for r in resources if r.avail != avail)
        static = set(r for r in waiting if r.get_managed_parent() is None)
        if static:
>           raise NoResourceFoundError(
                f"Static resources are not {'available' if avail else 'unavailable'}: {static}"
            )
E           labgrid.exceptions.NoResourceFoundError: Static resources are not available: {NetworkService(target=Target(name='main', env=Environment(config_file='env.yaml')), name='NetworkService', state=<BindingState.bound: 1>, avail=False, address='172.17.0.2', username='root', password='root', port=22)}

../../../labgrid-venv/lib/python3.8/site-packages/labgrid/target.py:79: NoResourceFoundError
================================ short test summary info ================================
ERROR test_shell.py::test_shell - labgrid.exceptions.NoResourceFoundError: Static resources are not available: {Networ...
=================================== 1 error in 6.98s ====================================

This happens because the ssh-server may not be ready immediately after the container starts, so _socket_connect fails and nw_service.avail is set to False.
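The availability check described above can be illustrated with a minimal TCP probe. This is only a sketch and not the labgrid implementation: `port_is_open` is a hypothetical helper name, and the one-second timeout is an assumption. It shows how a refused connection (errno 111 in the logs) translates into "not available".

```python
import socket


def port_is_open(address, port, timeout=1.0):
    """Return True if a TCP connection to (address, port) succeeds.

    Hypothetical helper for illustration; labgrid's own check lives in
    labgrid/resource/docker.py and may differ in detail.
    """
    try:
        # create_connection raises an OSError subclass on failure,
        # e.g. ConnectionRefusedError while sshd is not yet listening.
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Called right after the container starts, such a probe may return False for port 22 and only return True a moment later, which is the race this PR addresses.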

This PR assigns a parent to nw_service so that it is no longer static; the target then waits until it becomes available.
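The difference between a static and a managed resource can be sketched with a simplified model of the check shown in the traceback. Everything here is an assumption made for illustration: `FakeResource`, its `ready_after` counter, and the polling loop are stand-ins, and the real `Target.await_resources` in labgrid is more involved.

```python
import time


class NoResourceFoundError(Exception):
    """Stand-in for labgrid.exceptions.NoResourceFoundError."""


class FakeResource:
    """Hypothetical resource that becomes available after `ready_after` polls."""

    def __init__(self, name, ready_after, parent=None):
        self.name = name
        self.parent = parent  # None means "static" in this model
        self._polls_left = ready_after
        self.avail = ready_after == 0

    def get_managed_parent(self):
        return self.parent

    def poll(self):
        if self._polls_left > 0:
            self._polls_left -= 1
        self.avail = self._polls_left == 0


def await_resources(resources, timeout=2.0, interval=0.01):
    """Simplified model: static resources fail at once, managed ones are polled."""
    for r in resources:
        r.poll()
    waiting = {r for r in resources if not r.avail}
    static = {r for r in waiting if r.get_managed_parent() is None}
    if static:
        names = sorted(r.name for r in static)
        raise NoResourceFoundError(f"Static resources are not available: {names}")

    deadline = time.monotonic() + timeout
    while waiting:
        if time.monotonic() > deadline:
            raise NoResourceFoundError("Timed out waiting for resources")
        time.sleep(interval)
        for r in waiting:
            r.poll()
        waiting = {r for r in waiting if not r.avail}
```

In this model, a resource without a parent that is not yet available raises immediately, while the same resource with a parent is polled until it becomes available or the timeout expires.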

  • [√] PR has been tested

After this patch, if the exception is printed explicitly, the output looks like this:

(labgrid-venv) pie@pie:~/labgrid/examples/docker$ pytest -s --lg-env env.yaml test_shell.py
========================================================================= test session starts ==========================================================================
platform linux -- Python 3.8.10, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/pie/labgrid/examples
configfile: pytest.ini
plugins: labgrid-24.1.dev130
collected 1 item

test_shell.py Traceback (most recent call last):
  File "/home/pie/labgrid-venv/lib/python3.8/site-packages/labgrid/resource/docker.py", line 154, in _socket_connect
    s = socket.create_connection((address, port))
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/home/pie/labgrid-venv/lib/python3.8/site-packages/labgrid/resource/docker.py", line 154, in _socket_connect
    s = socket.create_connection((address, port))
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
Traceback (most recent call last):
  File "/home/pie/labgrid-venv/lib/python3.8/site-packages/labgrid/resource/docker.py", line 154, in _socket_connect
    s = socket.create_connection((address, port))
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
.

========================================================================== 1 passed in 3.42s ===========================================================================
(labgrid-venv) pie@pie:~/labgrid/examples/docker$

As you can see, even though the first connection attempts are refused, the test passes once the ssh-server is ready.

There is a race condition between container start and the ssh-server service becoming
ready inside the container, which can leave `nw_service` unavailable.

Making `nw_service` for docker non-static allows `target` to wait for `nw_service`
to become available, rather than raising an exception.

Signed-off-by: Larry Shen <[email protected]>

codecov bot commented Dec 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.1%. Comparing base (d98677c) to head (95a8a91).

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1556   +/-   ##
======================================
  Coverage    56.1%   56.1%           
======================================
  Files         170     170           
  Lines       13225   13226    +1     
======================================
+ Hits         7432    7433    +1     
  Misses       5793    5793           
Flag Coverage Δ
3.10 56.1% <100.0%> (+<0.1%) ⬆️
3.11 56.1% <100.0%> (+<0.1%) ⬆️
3.12 56.1% <100.0%> (+<0.1%) ⬆️
3.13 56.1% <100.0%> (+<0.1%) ⬆️
3.9 56.2% <100.0%> (+<0.1%) ⬆️

