Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2775: Lazy loading over federation #2775

Closed
wants to merge 8 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 39 additions & 10 deletions proposals/2775-lazy-loading-over-federation.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,11 @@ includes `lazy_load_members: true` in their JSON request body):
a summary of the room (based on the
[requirements of the CS API](https://github.com/matrix-org/matrix-doc/blob/1c7a6a9c7fa2b47877ce8790ea5e5c588df5fa90/api/client-server/sync.yaml#L148))
* any members which are in the auth chain for the state events in the response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the auth chain ends up in a separate section, so I think this is a no-op.

* any members which are power events (aka control events): bans & kicks.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need kicks here?

* one joined member per server (if we want to be able to send messages while
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry that this will create a potential race at the protocol level that may be exploitable by a bad actor in the room.

For example, in the situation that you join a room with two or more other users that are resident on the same server remote.com. You learn about user A but not about B, C or D.

User A detects your join and then leaves the room immediately before you are able to retrieve the rest of the room state, therefore you think you are the only occupant of the room. As you don’t know about any users from remote.com anymore, you no longer know if that server is still resident in the room and therefore you don’t know if you can ask it for room state.

The impact of this is lessened if you can include more than one membership from a given homeserver—even knowing about two or three users reduces the chance of this ever being an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for context, this will be 2462 membership events for Matrix HQ (of 54194 total state) at present.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just ask the server we join via to send us a list of the servers in the room? There doesn't seem to be any need to have actual membership events for them.

the room state is synchronising, otherwise we won't know where to send them
to)
* any membership events with membership `invite` (to mitigate risk of double invites)
* any members for user_ids which are referred to by the content of state events
in the response (e.g. `m.room.power_levels`) <-- TBD. These could be irrelevant,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem irrelevant, as the power levels are still enforced even for users that we don’t know about yet. Anything that’s important for auth will already be in the auth chain.

plus we don't know where to look for user_ids in arbitrary state events.
Expand All @@ -51,22 +56,28 @@ The joining server can then sync in the remaining membership events by calling
propose adding a parameter of `lazy_load_members_only: true` to the JSON
request body which would then only return the missing `m.room.member` events.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that the homeserver needs to track which membership events have been sent to which users, which feels like it might create a lot of additional complexity for homeserver implementors. It might just be better (certainly a lot simpler) to send the entire room state and deal with the duplicates.


The remote server may decide not to honour lazy_loading if a room is too small
(thus saving the additional roundtrip of calling `/state`), so the response to
`/send_join` or `/peek` must include a `lazy_load_members: true` field if the
state is partial and members need to be subsequently loaded by `/state`.

Clients which are not lazy loading members (by MSC1227) must block returning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does it mean for a client to "block returning" an API?

the CS API `/join` or `/peek` until this `/state` has completed and been
processed.

Clients which are lazy loading members however may return the initial `/join`
or `/peek` before `/state` has completed. However, we need a way to tell
clients once the server has finished synchronising its local state. For
instance, clients must not let the user send E2EE messages until their server
has acquired the full set of room members for the room, otherwise some of the
users will not have the keys to decrypt the message. We do this by adding an
`syncing: true` field to the room's `state` block in the `/sync` response.
Once this field is missing or false, the client knows it is safe to call
`/members` and get a full list of the room members in order to encrypt
successfully. The field can also be used to advise the client to not
prematurely call `/members` to show an incomplete membership list in its UI
(but show a spinner or similar instead).
clients once the server has finished synchronising its local state. We do this
by adding an `syncing: true` field to the room's `state` block in the `/sync`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly do clients do with this field? I thought that clients which do lazy-loading syncs were obliged to expect partial state blocks anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they do expect partial state blocks, but they currently don't know that they're partial - and so then go and hit /members anyway to fill in the missing members. so this i think is fixing that thinko by giving the client a clear way to know that state is partial and they need to fill it in.

response. Once this field is missing or false, the client knows that the joining
server has fully synchronised the state for this room. Operations which are
blocked on state being fully synchronised are:

* Sending E2EE messages, otherwise some of the users will not have the keys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to consider here is that we can’t even start sending device list updates for users until we learn about those users, let alone exchanging keys, so this might create another protocol-level race when joining E2E rooms if you start sending messages into the room before you know about all the devices in the room (resulting in UTDs).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neilalexander I'm not really following you here. You seem to be saying the same thing as the MSC (if a user sends E2E messages before they have the user list, their client will not know who to encrypt for).

to decrypt the message.
* Calling /members to get an accurate detailed list of the users in the room.
Instead clients showing a membership list should calculate it from the
members they do have, and the room summary (e.g. "these 5 heroes + 124 others")

While the joining server is busy syncing the remaining room members via
`/state`, it will also need to sync new inbound events to the user (and old
Expand All @@ -77,6 +88,24 @@ include it in the `/sync` response to the client. To do this, we add fields
to `/state` to let our server request a specific `type` and `state_key` from
the target server.

Matrix requires each server to track the full state rather than a partial
state in its DB for every event persisted in the DAG, in order to correctly
calculate resolved state as of that event for authorising events and servicing
/state queries etc. Loading the power events up front lets us authorise new
events (backfilled & new traffic) using partial state. However, once our
ara4n marked this conversation as resolved.
Show resolved Hide resolved
server has fully synced the state of the room at the point of the join event,
we must rollback the DAG and replay all the events we've accepted into the
room DAG in order to correctly capture the full state in the room as of that
event. This could theoretically result in some events now being
rejected/soft-failed, so it's important that "uncommitted" events in the DAG
(i.e. those which arrived since the join, but before state was fully synced)
do not have side-effects on the rest of the server (e.g. generate push) until
the room is fully synced.

XXX: what's an example of an event being failed/rejected during replay which
was previously accepted? If we could auth it correctly before, shouldn't it
still auth correctly afterwards?
ara4n marked this conversation as resolved.
Show resolved Hide resolved

## Alternatives

Rather than making this specific to membership events, we could lazy load all
Expand Down