Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Further reduce INFO logging #8021

Open
erikjohnston opened this issue Aug 3, 2020 · 8 comments
Open

Further reduce INFO logging #8021

erikjohnston opened this issue Aug 3, 2020 · 8 comments
Labels
A-Logging Synapse's logs (structured or otherwise). Not metrics. O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@erikjohnston
Copy link
Member

Some candidates that could be removed from default logging:

  1. OPTIONS request processing (~7% of log lines on jki.re)
  2. /sync request processing (~7% of log lines on jki.re, ignoring INFO)
  3. Sending request (~12% of log lines on jki.re)
  4. Received txn from (~11% of log lines on jki.re), though we probably(?) want to somehow log how many PDUs the transaction contained in the processing request log line (or maybe have a generic "how many new events were persitsed" like we do for dbevts)
@erikjohnston erikjohnston added the A-Logging Synapse's logs (structured or otherwise). Not metrics. label Aug 3, 2020
@erikjohnston
Copy link
Member Author

plot

Is roughly what is happening over a day for jki.re

erikjohnston added a commit that referenced this issue Aug 11, 2020
c.f. #8021 

A lot of the code here is to change the `Completed 200 OK` logging to include the request URI so that we can drop the `Sending request...` log line.

Some notes:

1. We won't log retries, which may be confusing considering the time taken log line includes retries and sleeps.
2. The `_send_request_with_optional_trailing_slash` will always be logged *without* the forward slash, even if it succeeded only with the forward slash.
@erikjohnston
Copy link
Member Author

Note that debian have changed the default log level for some particular handlers: https://salsa.debian.org/matrix-team/matrix-synapse/-/blob/debian/unstable/debian/log.yaml

@clokep
Copy link
Member

clokep commented Aug 18, 2020

Sending device list update notif for ... tends to end up with very long log lines (at least on matrix.org).

@erikjohnston erikjohnston removed their assignment Mar 25, 2022
@anoadragon453 anoadragon453 added T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. X-Needs-Discussion labels Mar 25, 2022
@callahad callahad added P3 (OBSOLETE: use S- labels.) Approved backlog: not yet scheduled, will accept patches and removed X-Needs-Discussion labels Mar 31, 2022
@erikjohnston
Copy link
Member Author

We now do a lot of log lines around:

2022-04-27 09:34:08,508 - synapse.replication.tcp.handler - 593 - INFO - process-replication-data-23576884 - Caught up with stream 'events' to 2893728403
2022-04-27 09:34:08,509 - synapse.replication.tcp.handler - 540 - INFO - replication_command_handler-140374159398168 - Handling 'POSITION events event_persister-2 2893728399 2893728400'

which are probably not needed at INFO

@erikjohnston
Copy link
Member Author

We should take another crack at this, otherwise we may need to reduce log retention to uncomfortable levels on matrix.org.

@H-Shay
Copy link
Contributor

H-Shay commented May 2, 2022

Totally dumb question but for my own edification:
I've noticed that whenever anything goes wrong/behaves mysteriously in Synapse usually the first order of business is to add log lines. I am wondering then why we are removing them here (presumably whoever added them to begin with thought they might be of some use) rather than doing something like only storing a minimal number of days of logs (like 5?) on the servers they were generated on and then moving the older logs to some sort of backup storage where they can be maintained for 14-21 days or whatever people feel comfortable with. I am imaging this is something that could be automated-just wondering why we wouldn't do this.

@erikjohnston
Copy link
Member Author

Good question Shay! Some notes:

  • Yes we do want to have an off-box log archive, but that is something that is non-trivial and is on the Ops backlog
  • We automatically delete old logs, but without a log archive we don't really want to go too low.
  • There are other deployments out there, which would also benefit from this work.

I've noticed that whenever anything goes wrong/behaves mysteriously in Synapse usually the first order of business is to add log lines.

This is true, but the flip side is that we often fail to remove the logging when we've tracked down the issues. It's quite easy to slowly accrue extra log lines over time, so it is worth going through and periodically seeing if there is anything that can be pruned.

@erikjohnston
Copy link
Member Author

#12610 seems to have helped reduce our log sizes by 25-50%

@H-Shay H-Shay added T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience S-Tolerable Minor significance, cosmetic issues, low or no impact to users. and removed P3 (OBSOLETE: use S- labels.) Approved backlog: not yet scheduled, will accept patches labels Mar 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Logging Synapse's logs (structured or otherwise). Not metrics. O-Frequent Affects or can be seen by most users regularly or impacts most users' first experience S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

5 participants