MSC2775: Lazy loading over federation #2775
# Lazy loading room membership over federation

## Problem

Joining remote rooms for the first time from your homeserver can be very slow.
This is particularly painful for the first-time user experience of a new
homeserver owner.

Causes include:
* Room state can be big. For instance, a `/send_join` response for Matrix HQ is
currently 24MB of JSON covering 28,188 events, and could easily take tens of
seconds to calculate and send (especially on lower-end hardware).
* All these events have to be verified by the receiving server.
* Your server may have to fetch the signing keys for all the servers that have
sent state into the room.

Review comment: I think there is some duplication we can get rid of. Some state events are also mentioned in the auth chain. Maybe this can be fixed by only sending it as one list of event JSONs and one list of only the event IDs that are in the state.

Review comment: Ah, this was mentioned already in #2775 (comment)

Review comment: We can also think about how we can improve the auth chain size. Member events don't have to mention the previous member event in most cases and can instead mention an old member event or none at all in public rooms.

Review comment, suggested change: Our testing shows the main problem is writing the events to the database.

Review comment: Is it possible to keep them in RAM and persist them in the background while users can already use the room? Probably not, because the server might crash...

Review comment: This can be improved by including the public key in the event (instead of the server name?)

This also impacts peeking over federation
([MSC2444](https://github.com/matrix-org/matrix-doc/pull/2444)), which is even
more undesirable given that users expect peeking to have a very snappy UX,
letting them quickly check out rooms from links, etc.

For instance, Gitter shows a usable peeked page for a room with 20K members in
under 2 seconds (https://gitter.im/webpack/webpack), including launching the
whole webapp. Similarly, Discord loads usable state for a server with 90K users,
such as https://chat.vuejs.org, in around 2 seconds.

## Proposal

The vast majority of state events in Matrix today are `m.room.member` events.
For instance, 99.4% (30,661 out of 30,856) of Matrix HQ's state events are
`m.room.member` events (see the Stats section below).

Review comment: It would be interesting to know how many of these are actually … Certainly another optimisation would be to not bother telling homeservers about …

Review comment: Looks like it's approximately 2:1 join:leave for #matrix:matrix.org, but approximately even for #fdroid:f-droid.org. It'd be a trickier query for "not users that have left".

Review comment: current figures for …

Therefore, in the response to `/send_join` (or an MSC2444 `/peek`), we propose
sending only the following `m.room.member` events (if the initiating server
includes `lazy_load_members: true` in its JSON request body):

* the "hero" room members which are needed for clients to display
a summary of the room (based on the
[requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))
* any members which are in the auth chain for the state events in the response
* any members for user_ids which are referred to by the content of state events
in the response (e.g. `m.room.power_levels`). TBD: these could be irrelevant,
plus we don't know where to look for user_ids in arbitrary state events.

Review comment: the request body for a …

Review comment: also, needs an unstable prefix, I guess. Suggest …

Review comment (on the "hero" members): do we really need this as well as a …

Review comment (on auth chain members): the auth chain ends up in a separate section, so I think this is a no-op.

Review comment (on user_ids in state event content): This does seem irrelevant, as the power levels are still enforced even for users that we don't know about yet. Anything that's important for auth will already be in the auth chain.

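As a rough illustration of the selection rules above, the following Python sketch filters a room's current state for a lazy-loading `/send_join` response. It assumes events are plain dicts with `type`, `state_key` and `event_id` keys, and that the hero user IDs and the auth chain event IDs have already been computed; the function names are hypothetical, and the TBD rule about user IDs referenced from state event content is left out.

```python
from typing import Any, Dict, List, Set

Event = Dict[str, Any]


def select_lazy_state(
    state: List[Event], auth_chain_ids: Set[str], hero_user_ids: Set[str]
) -> List[Event]:
    """Keep all non-membership state, but only the member events the proposal lists."""
    selected = []
    for ev in state:
        if ev["type"] != "m.room.member":
            selected.append(ev)  # non-membership state is always sent in full
        elif ev["state_key"] in hero_user_ids:
            selected.append(ev)  # "hero" members needed for the room summary
        elif ev["event_id"] in auth_chain_ids:
            selected.append(ev)  # members that appear in the auth chain
    return selected


def send_join_state(
    request_body: Dict[str, Any],
    state: List[Event],
    auth_chain_ids: Set[str],
    hero_user_ids: Set[str],
) -> List[Event]:
    """Hypothetical handler: only lazy-load if the joining server opted in."""
    if not request_body.get("lazy_load_members", False):
        return state  # existing behaviour: send the full room state
    return select_lazy_state(state, auth_chain_ids, hero_user_ids)
```

In this sketch, a joining server that omits `lazy_load_members` receives the full state, exactly as today.
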
In addition, we extend the response to `/send_join` and `/peek` to include a
`summary` block, matching that of the CS `/sync` API, giving the local server
the data it needs to support MSC1227 lazy loading in the CS API.

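A minimal sketch of computing such a `summary` block, assuming the field names and hero rules of the CS `/sync` API as we read them (`m.heroes`, `m.joined_member_count`, `m.invited_member_count`; up to 5 joined or invited members other than the user themselves, falling back to left or banned members). The function name is hypothetical, and the input is assumed to be `m.room.member` events already sorted in stream order.

```python
from typing import Any, Dict, List, Tuple


def room_summary(
    members: List[Dict[str, Any]], joining_user: str, max_heroes: int = 5
) -> Dict[str, Any]:
    """Build a CS-API-style room summary from m.room.member events in stream order."""

    def membership(ev: Dict[str, Any]) -> str:
        return ev["content"].get("membership", "")

    def others_with(states: Tuple[str, ...]) -> List[str]:
        # Hero candidates never include the user who is joining.
        return [
            ev["state_key"]
            for ev in members
            if membership(ev) in states and ev["state_key"] != joining_user
        ]

    heroes = others_with(("join", "invite")) or others_with(("leave", "ban"))
    return {
        "m.heroes": heroes[:max_heroes],
        "m.joined_member_count": sum(membership(ev) == "join" for ev in members),
        "m.invited_member_count": sum(membership(ev) == "invite" for ev in members),
    }
```

Because the summary carries the member counts, the joining server can report them to clients before it holds every `m.room.member` event.
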
The joining server can then sync in the remaining membership events by calling
`/state` as of the user's join event. To avoid retrieving duplicate data, we
propose adding a `lazy_load_members_only: true` parameter to the JSON request
body, which would then return only the missing `m.room.member` events.

Review comment: This implies that the homeserver needs to track which membership events have been sent to which users, which feels like it might create a lot of additional complexity for homeserver implementors. It might just be better (certainly a lot simpler) to send the entire room state and deal with the duplicates.

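The sketch below shows one way a resident server could answer such a `/state` request, assuming it remembers the event IDs it already sent in the `/send_join` response (the extra bookkeeping a reviewer points out). It also shows the simpler alternative that reviewer suggests: returning everything and letting the joining server drop duplicates by event ID. All names are hypothetical.

```python
from typing import Any, Dict, List, Set

Event = Dict[str, Any]


def state_response(
    request_body: Dict[str, Any], full_state: List[Event], already_sent_ids: Set[str]
) -> List[Event]:
    """Return the state at the join event, optionally only the missing members."""
    if not request_body.get("lazy_load_members_only", False):
        return full_state
    return [
        ev
        for ev in full_state
        if ev["type"] == "m.room.member" and ev["event_id"] not in already_sent_ids
    ]


def merge_by_event_id(local: List[Event], incoming: List[Event]) -> List[Event]:
    """Reviewer's alternative: accept the full state and discard duplicates."""
    seen = {ev["event_id"] for ev in local}
    merged = list(local)
    for ev in incoming:
        if ev["event_id"] not in seen:
            seen.add(ev["event_id"])
            merged.append(ev)
    return merged
```

The trade-off is visible here: the first function needs per-joiner memory on the resident server, while the second only costs bandwidth.
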
For clients which are not lazy loading members (per MSC1227), the server must
block returning the CS API `/join` or `/peek` until this `/state` request has
completed and been processed.

Review comment: what does it mean for a client to "block returning" an API?

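A minimal asyncio sketch of this rule, assuming the background `/state` sync signals completion through an `asyncio.Event`. `RoomJoin`, `background_state_sync` and `handle_cs_join` are hypothetical names, and `fetch` stands in for calling `/state` and persisting the result.

```python
import asyncio
from typing import Awaitable, Callable


class RoomJoin:
    """Hypothetical per-room record of a join whose membership is still syncing."""

    def __init__(self) -> None:
        self.state_synced = asyncio.Event()


async def background_state_sync(
    join: RoomJoin, fetch: Callable[[], Awaitable[None]]
) -> None:
    await fetch()  # stands in for calling /state and persisting the results
    join.state_synced.set()


async def handle_cs_join(join: RoomJoin, client_lazy_loads_members: bool) -> str:
    if not client_lazy_loads_members:
        # Clients without MSC1227 lazy loading expect complete membership,
        # so wait for the background /state sync to finish first.
        await join.state_synced.wait()
    return "joined"  # placeholder for the real CS /join response body
```

Lazy-loading clients get their response immediately instead, and need some other signal to learn when membership is complete.
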
For clients which are lazy loading members, however, the server may return the
initial `/join` or `/peek` before `/state` has completed. We then need a way to
tell clients once the server has finished synchronising its local state. For
instance, clients must not let the user send E2EE messages until their server
has acquired the full set of room members, otherwise some of the members will
not have the keys to decrypt the message. We do this by adding a
`syncing: true` field to the room's `state` block in the `/sync` response.
Once this field is missing or false, the client knows it is safe to call
`/members` and get a full list of the room members in order to encrypt
successfully. The field also advises the client not to call `/members`
prematurely and show an incomplete membership list in its UI (it can show a
spinner or similar instead).

Review comment: need to spec how the client tells the server that it knows how to handle SS LL

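A sketch of both sides of the `syncing` flag, assuming the room's `state` block keeps its existing `events` list and simply gains the proposed `syncing` key. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def sync_room_state_block(
    events: List[Dict[str, Any]], members_syncing: bool
) -> Dict[str, Any]:
    """Server side: build a room's `state` block for a /sync response."""
    block: Dict[str, Any] = {"events": events}
    if members_syncing:
        block["syncing"] = True  # proposed field; omitted once membership is complete
    return block


def safe_to_encrypt(room: Dict[str, Any]) -> bool:
    """Client side: only call /members and encrypt once `syncing` is absent or false."""
    return not room.get("state", {}).get("syncing", False)
```

A client would show a spinner in place of the member list, and hold back encrypted sends, while `safe_to_encrypt` returns `False`.
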
While the joining server is busy syncing the remaining room members via
`/state`, it also needs to sync new inbound events to the user (and old ones if
the user calls `/messages`). If these events refer to members we're not yet
aware of (e.g. they're sent by a user whose membership our server hasn't
lazy-loaded yet), we should separately retrieve their membership events so the
server can include them in the `/sync` response to the client. To do this, we
add fields to `/state` that let our server request a specific `type` and
`state_key` from the target server.

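As a sketch, the joining server could scan each batch of events for senders whose membership it has not yet fetched and build one targeted lookup per unknown sender. The lookup shape below (a dict carrying `type` and `state_key`) is an assumption, since the proposal does not yet specify how these fields are passed to `/state`, and the function name is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set


def member_lookups_needed(
    events: Iterable[Dict[str, Any]], known_members: Set[str]
) -> List[Dict[str, str]]:
    """Return one hypothetical /state lookup per sender we have no member event for."""
    pending: Set[str] = set()
    lookups = []
    for ev in events:
        sender = ev["sender"]
        if sender in known_members or sender in pending:
            continue  # already known, or already queued in this batch
        pending.add(sender)
        lookups.append({"type": "m.room.member", "state_key": sender})
    return lookups
```

Deduplicating within a batch keeps a burst of messages from one unknown user down to a single round trip.
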
## Alternatives

Rather than making this specific to membership events, we could lazy load all
state by default. However, it's challenging to know which events the server
(and clients) need up front in order to correctly handle the room, and this
list may well change over time. For instance, do we need to know about the
`uk.half-shot.bridge` event listed in the Stats section up front?

Rather than reactively pulling in missing membership events as needed while
the room is syncing in the background, we could require the server we're
joining via to proactively push us member events it knows we don't know about
yet, saving a round trip. This feels more fiddly, though; we can optimise this
edge case if it's actually needed.

## Security considerations

We currently trust the server we join via to provide us with accurate room state.
This proposal doesn't make this any better or worse.

Review comment: per the above, I think it does, since we now trust lots of servers to give us accurate room state - at least while we're lazy-loading the state.

## Related

MSC1228 (and future variants) will also help speed up joining rooms
significantly, as servers no longer have to query for server keys given that
the room ID becomes a server's public key.

## Stats

```
matrix=> select type, count(*) from matrix.state_events where room_id='!OGEhHVWSdvArJzumhm:matrix.org' group by type order by count(*) desc;
           type            | count
---------------------------+-------
 m.room.member             | 30661
 m.room.aliases            |   141
 m.room.server_acl         |    23
 m.room.join_rules         |     9
 m.room.guest_access       |     6
 m.room.power_levels       |     5
 m.room.history_visibility |     3
 m.room.name               |     1
 m.room.related_groups     |     1
 m.room.avatar             |     1
 m.room.topic              |     1
 m.room.create             |     1
 uk.half-shot.bridge       |     1
 m.room.canonical_alias    |     1
 m.room.bot.options        |     1
```
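
The membership share quoted in the Proposal section can be recomputed from these counts; this small Python check simply sums the query output above.

```python
# State event counts for Matrix HQ, copied from the query output above.
counts = {
    "m.room.member": 30661, "m.room.aliases": 141, "m.room.server_acl": 23,
    "m.room.join_rules": 9, "m.room.guest_access": 6, "m.room.power_levels": 5,
    "m.room.history_visibility": 3, "m.room.name": 1, "m.room.related_groups": 1,
    "m.room.avatar": 1, "m.room.topic": 1, "m.room.create": 1,
    "uk.half-shot.bridge": 1, "m.room.canonical_alias": 1, "m.room.bot.options": 1,
}

total = sum(counts.values())
member_share = counts["m.room.member"] / total
# prints: 30661 of 30856 state events (99.4%) are m.room.member
print(f"{counts['m.room.member']} of {total} state events ({member_share:.1%}) are m.room.member")
```
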

Review comment: now 115M and 144K events, for the record.