After updating the Synapse server to version 1.109.0~rc1, new messages do not leave / arrive. #17274
Comments
Hi @ELForcer, thanks for your report. Do you have any workers configured? If you don't know, it's likely you don't. For developers, the relevant bit of the code is in synapse/handlers/device.py, lines 863 to 874 (at commit b71d277).
#17211 may be a possible contender for the regression, as it was only introduced in v1.109.0rc1 and touches nearby code (outbound device pokes).
@wrjlewis mentioned that a possible workaround is to run:
SELECT setval('device_lists_sequence', (
    SELECT GREATEST(MAX(stream_id), 0) FROM device_lists_stream
));
and then start the server on v1.109.0rc1. However, the code should still be fixed. Edit: see the below comment for warnings on running this query.
@anoadragon453 That SQL should only be run if Synapse prompts you to do so, and while Synapse is offline. Moving streams can be quite dangerous if you get it wrong. I think this is an edge case with #17229, where we didn't include …
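Before running the workaround, it can help to confirm the sequence really has fallen behind. A minimal sketch of the check, assuming direct psql access to Synapse's PostgreSQL database (not an official procedure):

-- Compare the sequence's current position with the stream table's high-water mark.
SELECT last_value FROM device_lists_sequence;
SELECT COALESCE(MAX(stream_id), 0) FROM device_lists_stream;

If last_value is already at or ahead of the table's maximum stream_id, the setval workaround above shouldn't be necessary.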
Actually, I've changed tack; the PR is up at #17292.
@ELForcer This should have been fixed in v1.109.0-rc2. Could you try upgrading and report back?
Hello. Things have gotten worse: the matrix-synapse service no longer starts.
Log:
My understanding is that it cannot automatically perform the operation:
and that this SQL query needs to be executed manually?
At the moment I have rolled back to package version 1.108.0.
+1, I am also hitting this
I was able to reproduce locally. Steps to reproduce:
Database looks like this:
Even simpler to trigger:
I believe we fixed this; debs for v1.109.0rc3 with the fix should be available in a few hours. If it indeed fixes your issue, we should be able to release v1.109.0 tomorrow.
rc3 works fine for me. Sync seems to work fine, too. Not sure what the weirdness I saw with rc1/rc2 was about.
I confirm. Upgrading to 1.109.0rc3 no longer crashes the homeserver. Messages reach the recipient, and there are no more errors in the log.
The issue can be considered closed.
Just upgraded from 1.104.0 to 1.109.0 via the docker image (or rather a derived image with a simple change which installs the …).
It seems there are no other related errors in the logs. Afterwards, Element-X (on Android) does not sync any changes anymore, and the same goes for the nheko desktop client. Element Web does seem to work correctly, though. For now, I downgraded to 1.108.0, which seems to fix all these problems, and Synapse no longer reports the error.
Ran into a very similar issue when updating to 1.109.0 via the docker image as well, with message delivery freezing entirely. This is with only a federation-sender worker and one worker that handles the summary MSC endpoint. Log contained:
The "Waiting for current token to reach StreamToken(...)" message was being spammed rapidly. Digging through it and the database, I found that the un_partial_stated_room_stream_sequence had fallen behind and needed to be advanced manually. E.g. SELECT setval('un_partial_stated_room_stream_sequence', 620);
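If you'd rather derive the target value than hardcode it, here is a sketch following the same pattern as the device_lists workaround earlier in this thread (assuming the matching stream table is un_partial_stated_room_stream with a stream_id column, and that Synapse is stopped while you run it):

-- Advance the sequence to at least the stream table's current maximum.
SELECT setval('un_partial_stated_room_stream_sequence', (
    SELECT GREATEST(MAX(stream_id), 1) FROM un_partial_stated_room_stream
));

Using GREATEST with 1 rather than 0 avoids setval rejecting a value below the sequence's minimum when the table is empty.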
I have the same issue. I only upgrade between full releases (no rc) and have been doing so for a while.
The server is not federated and …
@baconsalad Your log seems to list two keys out of phase; …
I might have a reason & solution for my problem. After a recent reconfiguration, synapse was unable to reach my unified push server, leaving messages like this in the log:
With versions < 1.109.0, this did not block messages from being forwarded to the client (apart from the push not working reliably). With 1.109.0, this probably does not work anymore, maybe because of #17240? (Update: No, that's probably not the right PR.) Just a theory for now. I fixed my configuration and will try the update again soon.
I think #17386 should fix this 🤞
Fixes #17274, hopefully. Basically, old versions of Synapse could advance streams without persisting anything in the DB (fixed in #17229). On restart those updates would get lost, and so the position of the stream would revert to an older position. If this happened across an upgrade to a later Synapse version which included #17215, then sync could get blocked indefinitely (until the stream advanced to the position in the token). We fix this by bounding the stream positions we'll wait for to the maximum position of the underlying stream ID generator.
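As a rough illustration of that bounding idea (a plain-Python sketch, not the actual Synapse implementation; the class and field names here are made up):

import asyncio

class StreamView:
    """Toy stand-in for a worker's view of a replicated stream."""
    def __init__(self) -> None:
        self.local_position = 0   # how far this worker has caught up
        self.max_allocated = 0    # highest ID the generator has actually handed out

async def wait_for_stream_position(stream: StreamView, token: int) -> None:
    # Clamp the token we wait for to the highest position that has actually
    # been allocated, so a stale token pointing past anything that will ever
    # be written cannot make us wait forever.
    target = min(token, stream.max_allocated)
    while stream.local_position < target:
        await asyncio.sleep(0.1)  # crude polling, purely for illustration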
I originally encountered this when upgrading from …
Description
I updated the package using apt, and after a while I noticed high CPU consumption by the Synapse processes, and then that messages were not actually being delivered to recipients. Restarting the service didn't help.
At the moment, rolling back to package version 1.108.0 helped.
Steps to reproduce
sudo apt update
sudo apt upgrade
Homeserver
homeserver
Synapse Version
1.109.0rc1
Installation Method
Debian packages from packages.matrix.org
Database
PostgreSQL 16
Workers
I don't know
Platform
Configuration
No response
Relevant log output
Anything else that would be useful to know?
No response