[rsyslog]: Use RELP instead of UDP for forwarding from container to host #18113

Status: Open, 19 commits to be merged into base branch master

@saiarcot895 (Contributor) commented Feb 16, 2024

Why I did it

When the host's rsyslog is restarted (for example, to regenerate the config after some changes, or as part of some automated script), there is a chance that some syslog messages from the containers are lost. Most of the time, this isn't an issue. However, if there are test cases that expect all syslogs to be present (such as the advanced-reboot test case), then this can cause a problem. Additionally, this could affect debuggability of issues where a rsyslog restart happens in the middle.

There are two options for reliable message transport in rsyslog: TCP and RELP. With TCP, the protocol itself knows whether a syslog message has been delivered, but the application does not, because there is no application-level feedback from the remote side saying the message was received. Messages can therefore still be lost when the connection is broken (if, for example, the host rsyslog gets restarted): once a message has been handed to the connection, the sender rsyslog (in the container) doesn't know whether the receiver actually processed it.

RELP builds on top of TCP, and adds a feedback mechanism where the remote side notifies the sender whether the message has actually been received or not. This makes it much less likely to lose a message. There is one known possible case where a message (or messages) could be lost: the network is down, and rsyslog gets restarted. This at least requires both the network and rsyslog to have an issue, rather than just one. There is also a slim possibility where a message could get duplicated; this should be mostly fine (hopefully).

RELP does require that both sides are using a recent version of rsyslogd (at least 7.3.16, which looks like it was released more than 10 years ago), but since we use Debian on both the container and the host, it should be fine.

Therefore, switch to using RELP when sending syslog messages from the container to the host.

Fixes #17792.

Work item tracking
  • Microsoft ADO (number only): 28314311

How I did it

Modify the rsyslog.conf file on the host and the container to use RELP instead of UDP.
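
For illustration, the host side of such a setup could look roughly like the sketch below. This is a sketch only: the host config from this PR is not quoted in this conversation, so the exact lines are an assumption. The port is assumed to be 2514 to match the container-side omrelp action shown in the review diff, and any further imrelp parameters are left at their defaults. On Debian, the imrelp and omrelp modules are typically shipped in a separate rsyslog-relp package, so that package is assumed to be installed.

```
# Host rsyslog.conf (sketch, not the exact lines from this PR)
# Load the RELP input module so the host can acknowledge received messages.
module(load="imrelp")

# Listen for RELP connections from the containers.
# Port 2514 is assumed here to match the container-side action.
input(type="imrelp" port="2514")
```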

In addition, update the syntax used for the config files to the (newer) RainierScript format, which, among other things, makes it easier to set settings for specific outputs.
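
As a rough illustration of the syntax change, the sketch below compares the legacy directive form with the RainierScript form for the imuxsock rate limit settings. The directive names come from the commit message in this PR and the values from the snippet quoted in the review; the legacy lines are shown commented out purely for comparison and may not match the old file exactly.

```
# Legacy directive syntax (commented out, for comparison only):
#$ModLoad imuxsock
#$SystemLogRateLimitInterval 300
#$SystemLogRateLimitBurst 20000

# RainierScript form: the rate limit settings become parameters
# of the module that owns them.
module(load="imuxsock" SysSock.RateLimit.Interval="300" SysSock.RateLimit.Burst="20000")
```

Keeping the settings on the module (or on an individual action) makes it clearer which component each setting applies to.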

How to verify it

Stop rsyslogd on the host, make sure that the containers generate some syslogs, restart rsyslogd on the host, and verify no logs were lost.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305


…urst not being defined

$SystemLogRateLimitInterval and $SystemLogRateLimitBurst both come from
the imuxsock module. Specify them as module parameters (and also remove
the legacy syntax).

Signed-off-by: Saikrishna Arcot <[email protected]>
By default, just using omrelp doesn't hold log messages if the server
happens to be unavailable. This needs to be configured manually.

Configure an in-memory storage (of a linked list) that by default will
store up to 1000 messages (this appears to be a default value that can
be bumped up) if the server is unavailable. I'm assuming this will be
sufficient for most cases.

Assuming each message is 512 bytes (many of our messages will be smaller
than this), this will take up an additional 512kB of memory if 1000
messages are queued. If there are no messages queued, then no additional
space is taken up.

Signed-off-by: Saikrishna Arcot <[email protected]>
saiarcot895 added a commit to saiarcot895/sonic-mgmt that referenced this pull request Feb 26, 2024
If rsyslogd on the host goes down, and rsyslogd on the containers is
configured to use librelp to forward messages to the host rsyslogd
(instead of UDP), then there will be error messages from the container
rsyslogd about not being able to forward messages.

Ignore these error messages as they are expected when running tests
which may restart rsyslogd.

This is in preparation for sonic-net/sonic-buildimage#18113

Signed-off-by: Saikrishna Arcot <[email protected]>
@saiarcot895 saiarcot895 marked this pull request as draft June 7, 2024 18:29
@saiarcot895 (Contributor Author): /azpw run Azure.sonic-buildimage

@mssonicbld (Collaborator): /AzurePipelines run Azure.sonic-buildimage

Azure Pipelines successfully started running 1 pipeline(s).

yxieca pushed a commit to sonic-net/sonic-mgmt that referenced this pull request Aug 1, 2024
* Ignore errors about rsyslogd w/ librelp not being able to send syslogs

Signed-off-by: Saikrishna Arcot <[email protected]>

@saiarcot895 saiarcot895 marked this pull request as ready for review August 7, 2024 16:42
@saiarcot895 saiarcot895 requested a review from prgeor August 7, 2024 16:42
@saiarcot895
Contributor Author

/azpw ms_checker


Comment on lines +15 to +17
module(load="imuxsock" SysSock.RateLimit.Interval="300" SysSock.RateLimit.Burst="20000") # provides support for local system logging
#module(load="imklog") # provides kernel logging support
#module(load="immark") # provides --MARK-- message capability
Contributor

@saiarcot895 can you mention this syntax change in PR description

Contributor Author

Done.

@@ -37,7 +33,8 @@ set $.CONTAINER_NAME=getenv("CONTAINER_NAME");

 # Set remote syslog server
 template (name="ForwardFormatInContainer" type="string" string="<%PRI%>%TIMESTAMP:::date-rfc3339% %HOSTNAME% %$.CONTAINER_NAME%#%syslogtag%%msg:::sp-if-no-1st-sp%%msg%")
-*.* action(type="omfwd" target=`echo $SYSLOG_TARGET_IP` port="514" protocol="udp" Template="ForwardFormatInContainer")
+module(load="omrelp")
+*.* action(type="omrelp" target=`echo $SYSLOG_TARGET_IP` port="2514" action.resumeRetryCount="-1" queue.type="LinkedList" Template="ForwardFormatInContainer")
Contributor

@prgeor prgeor Aug 31, 2024

@saiarcot895 2514 is the port used by relp?

Contributor Author

There's no standard port used by RELP. The default port that rsyslog uses is 514, but that can conflict with regular syslog forwarding over TCP. A couple of the examples in the documentation for this feature use 2514.

Contributor

@saiarcot895 are we still using bullseye?

Contributor Author

Yes, bullseye is still being used for a couple of containers.

Contributor

@prgeor prgeor left a comment

what about platform/vs/docker-sonic-vs/etc/rsyslog.conf don't need this change?

#$ModLoad immark # provides --MARK-- message capability
module(load="imuxsock" {% if rate_limit_interval is not none %}SysSock.RateLimit.Interval="{{ rate_limit_interval }}"{% endif %} {% if rate_limit_burst is not none %}SysSock.RateLimit.Burst="{{ rate_limit_burst }}"{% endif %}) # provides support for local system logging
module(load="imklog") # provides kernel logging support
#module(load="immark") # provides --MARK-- message capability

# provides UDP syslog reception
Contributor

@saiarcot895 This UDP syslog is for remote server?

Contributor Author

Yes, in the case of a remote syslog server sending over UDP.

@saiarcot895
Contributor Author

> what about platform/vs/docker-sonic-vs/etc/rsyslog.conf don't need this change?

Strictly speaking, it doesn't need this change, because the logs aren't actually being forwarded anywhere. It'll forward them to localhost port 514, but there likely won't be anything listening on that port. That container doesn't end up on the device.

It would be nice to update the syntax there to have it use the new syntax, but I'll keep that separate for now.

arista-hpandya pushed a commit to arista-hpandya/sonic-mgmt that referenced this pull request Oct 2, 2024
* Ignore errors about rsyslogd w/ librelp not being able to send syslogs

Signed-off-by: Saikrishna Arcot <[email protected]>

@saiarcot895 saiarcot895 requested a review from prgeor October 14, 2024 21:19
prgeor previously approved these changes Oct 23, 2024
vikshaw-Nokia pushed a commit to vikshaw-Nokia/sonic-mgmt that referenced this pull request Oct 23, 2024
* Ignore errors about rsyslogd w/ librelp not being able to send syslogs

Signed-off-by: Saikrishna Arcot <[email protected]>
In case rsyslog can't forward messages to the host's rsyslog server,
messages will be queued so that they can be sent out later. For this
queue, set a limit of 20000 messages so that rsyslog doesn't take too
much memory. Assuming each message is 512 bytes, the approximate maximum
additional memory usage is 10MB.

Signed-off-by: Saikrishna Arcot <[email protected]>
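
To make the queue limit concrete, a container-side action with a bounded in-memory queue might be written roughly as follows. This is a sketch under assumptions: it reuses the omrelp action from the review diff and adds the standard rsyslog `queue.size` parameter with the 20000 limit described above; the exact parameters used in the PR may differ.

```
module(load="omrelp")
*.* action(
    type="omrelp"
    target=`echo $SYSLOG_TARGET_IP`
    port="2514"
    # Keep retrying indefinitely while the host rsyslog is unavailable.
    action.resumeRetryCount="-1"
    # In-memory linked-list queue; memory is only used for queued messages.
    queue.type="LinkedList"
    # Upper bound on queued messages.
    # Worst case: 20000 messages * 512 bytes = 10,240,000 bytes, about 10 MB.
    queue.size="20000"
    Template="ForwardFormatInContainer"
)
```

With the earlier default of 1000 messages, the same arithmetic gives 1000 * 512 bytes = 512,000 bytes, which matches the 512kB estimate in the earlier commit message.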
@qiluo-msft
Collaborator

This is kind of a new feature. Is it possible to make it configurable, so that either the new or the old behavior can be selected?

Development

Successfully merging this pull request may close these issues.

[systemd][teamsyncd] missing logs during restarting system logging service
4 participants