[systemd][teamsyncd] missing logs during restarting system logging service #17792

Open
ayurkiv-nvda opened this issue Jan 16, 2024 · 1 comment · May be fixed by #18113
Assignees
Labels
Issue for 202311 MSFT Triaged this issue has been triaged

Comments

@ayurkiv-nvda
Contributor

ayurkiv-nvda commented Jan 16, 2024

Description

Some log messages are occasionally lost while the system logging service (rsyslogd) restarts.

Steps to reproduce the issue:

  1. Run "fast-reboot". (The problem is most probably generic; we only managed to catch it during the advanced-reboot test (fast-reboot) because that test looks for specific logs in syslog.)
  2. Check the logs.

NOTE:
From a functional point of view everything is OK, but the fast-reboot test looks for specific messages in syslog, for example:
try_add_lag: The LAG 'PortChannel104' has been added.
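
The kind of check the test performs can be sketched roughly as follows. This is a minimal illustration, not the actual advanced-reboot test code: the function name `missing_lags` and the regular expression are assumptions made for this example, based only on the message text quoted above.

```python
import re

# Pattern for the message quoted in this issue. The trailing period is
# deliberately not required. (Hypothetical pattern, not taken from the test.)
LAG_ADDED = re.compile(r"try_add_lag: The LAG '(?P<name>[^']+)' has been added")


def missing_lags(syslog_lines, expected_lags):
    """Return the expected LAG names (sorted) that have no
    'has been added' message anywhere in syslog_lines."""
    seen = set()
    for line in syslog_lines:
        match = LAG_ADDED.search(line)
        if match:
            seen.add(match.group("name"))
    return sorted(set(expected_lags) - seen)
```

A non-empty result means that at least one expected message never reached syslog, which is the symptom reported here.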

Describe the results you received:

Jan 12 07:02:12.808502 arc-switch1004 NOTICE teamd#teamsyncd: :- applyState: Applying state
Jan 12 07:02:12.808502 arc-switch1004 NOTICE teamd#teamsyncd: :- dump: getting took 0.001219 sec
Jan 12 07:02:12.798364 arc-switch1004 INFO systemd[1]: Stopping System Logging Service...
Jan 12 07:02:12.809352 arc-switch1004 INFO rsyslogd: [origin software="rsyslogd" swVersion="8.2302.0" x-pid="385" x-info="https://www.rsyslog.com"] exiting on signal 15.
Jan 12 07:02:13.799523 arc-switch1004 DEBUG container: container_wait: BEGIN

According to the logs, tlm_teamd started and everything is up and running. The message "INFO systemd[1]: Stopping System Logging Service..." shows that syslog was restarting in parallel, and at that moment some messages can be lost.
A syslog restart is an expected part of the boot flow because its config is changed at runtime and the service must be restarted to apply the changes.

Describe the results you expected:

Jan 16 00:25:01.751031 r-tigon-04 NOTICE teamd#teamsyncd: :- applyState: Applying state
Jan 16 00:25:01.751943 r-tigon-04 NOTICE teamd#teamsyncd: :- dump: getting took 0.000604 sec
Jan 16 00:25:01.752638 r-tigon-04 NOTICE teamd#teamsyncd: :- dump: getting took 0.000301 sec
Jan 16 00:25:01.753146 r-tigon-04 NOTICE teamd#teamsyncd: :- setWarmStartState: teamsyncd warm start state changed to reconciled
Jan 16 00:25:01.753475 r-tigon-04 NOTICE teamd#tlm_teamd: :- try_add_lag: The LAG 'PortChannel104' has been added.
Jan 16 00:25:01.754515 r-tigon-04 NOTICE teamd#tlm_teamd: :- try_add_lag: The LAG 'PortChannel103' has been added.
Jan 16 00:25:01.754962 r-tigon-04 NOTICE teamd#tlm_teamd: :- try_add_lag: The LAG 'PortChannel102' has been added.
Jan 16 00:25:01.755390 r-tigon-04 NOTICE teamd#tlm_teamd: :- try_add_lag: The LAG 'PortChannel101' has been added

Output of show version:

build_version: '202305_RC.70-d572f3d55_Internal'
debian_version: '11.8'
kernel_version: '5.10.0-23-2-amd64'
asic_type: mellanox
asic_subtype: 'mellanox'
commit_id: 'd572f3d55'
branch: '202305_RC'
release: '202305'
build_date: Thu Jan 11 21:57:31 UTC 2024
build_number: 70
built_by: sw-r2d2-bot@r-build-sonic-ci02-241
libswsscommon: 1.0.0
sonic_utilities: 1.2
sonic_os_version: 11

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@ayurkiv-nvda ayurkiv-nvda changed the title [systemd][teamsyncd] missing logs after stopping system logging service [systemd][teamsyncd] missing logs during restarting system logging service Jan 16, 2024
@saiarcot895
Contributor

saiarcot895 commented Jan 20, 2024

Because the syslogs from containers are sent over UDP to the host, this means that if nothing is listening on UDP port 514 on the host (because rsyslogd on the host is restarting), then messages from containers will get lost. This can happen with any container and any set of messages; it's just that a syslog that we were explicitly looking for as part of warm/fast reboot got dropped here.
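
The loss mechanism can be reproduced outside of SONiC with plain sockets. The sketch below is only an illustration: it uses a temporary loopback port instead of port 514, and the function name `send_udp_without_listener` is made up for this example. It sends one datagram while nothing is listening, then starts a listener on the same port and checks whether the datagram ever arrives.

```python
import socket


def send_udp_without_listener(message: bytes, timeout: float = 0.2):
    """Send one UDP datagram to a loopback port that has no listener,
    then start a listener and report whether the datagram shows up.

    Returns (bytes_sent, delivered).
    """
    # Reserve a free loopback port, then close it so nothing is listening.
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()

    # The send succeeds even though nobody will receive the datagram.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sent = sender.sendto(message, ("127.0.0.1", port))
    finally:
        sender.close()

    # A listener started afterwards (like a restarted rsyslogd) gets nothing.
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.settimeout(timeout)
    try:
        listener.bind(("127.0.0.1", port))
        try:
            listener.recvfrom(4096)
            delivered = True
        except socket.timeout:
            delivered = False
    finally:
        listener.close()
    return sent, delivered
```

The sender gets no error and has no way to know that the message was dropped, which matches the behaviour described above.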

The simplest way to fix this would be to have rsyslogd inside the containers use TCP instead of UDP, but I don't know what the downsides would be here.
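
For comparison, here is a rough sketch of why TCP would at least make the failure visible to the sender. It does not show rsyslog's behaviour or configuration; `try_tcp_send` is a hypothetical helper for illustration only. With TCP the connection attempt fails when nothing is listening, so a sender can keep the message and retry later instead of silently losing it.

```python
import socket


def try_tcp_send(message: bytes, host: str, port: int, timeout: float = 1.0) -> bool:
    """Try to deliver message over TCP.

    Returns True if the message was handed to an established connection,
    False if the connection could not be made (for example, no listener).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(message)
        return True
    except OSError:
        # Includes ConnectionRefusedError and timeouts: the sender knows
        # the message was not delivered and can queue it for a retry.
        return False
```

Whether this trade-off is acceptable for the containers (for example, how senders behave while the host rsyslogd is down) is exactly the open question raised above.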
