Batch db updates in Accumulator.Accumulate #196
Conversation
The SQL isn't the worst I've seen, but it is using a syntax I'm unfamiliar with. Please link me to some docs explaining what this is doing on the postgres side. Overall, I'm not opposed to something like this, as it basically allows us to batch update in a single query, which is nice. That being said, we don't seem to call `UpdateBeforeSnapshotIDs` yet, so I'm curious where you think this is going to fit in.
}
for j, ev := range chunks[i] {
	if ev.NID != eventNID {
		t.Errorf("chunk %d got wrong event in position %d: got NID %d want NID %d", i, j, ev.NID, eventNID)
It would be nice to reuse the code in TestChunkify.
In the medium term I'd probably move everything over to the generic version so that there's only one true Chunkify.
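As a sketch of what that single generic Chunkify could look like (the name and signature here are assumptions for illustration, not the repo's actual code):

```go
package main

import "fmt"

// chunkify splits items into chunks of at most chunkSize elements each.
// A sketch of the "one true Chunkify" using Go generics; the real
// sliding-sync helper may have a different shape, so treat this as
// hypothetical.
func chunkify[T any](items []T, chunkSize int) [][]T {
	var chunks [][]T
	for len(items) > chunkSize {
		chunks = append(chunks, items[:chunkSize])
		items = items[chunkSize:]
	}
	if len(items) > 0 {
		chunks = append(chunks, items)
	}
	return chunks
}

func main() {
	fmt.Println(chunkify([]int{1, 2, 3, 4, 5}, 2)) // [[1 2] [3 4] [5]]
}
```

With a generic version, both the production code and tests like TestChunkify could exercise the same implementation regardless of element type.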
CAST(:event_nid AS bigint)
	)
) AS u(before_state_snapshot_id, event_replaces_nid, event_nid)
WHERE e.event_nid = u.event_nid`, chunk)
I am failing to see what this SQL is doing. We have a chunk of `beforeSnapshotUpdate` structs. We are somehow making that addressable as `u` by passing in `chunk`, but I'm failing to see any `$1`. Once we have this addressable as `u`, we just do a basic update, which I do understand.
Let's drop the `CAST`s for a moment, so that the query is
UPDATE syncv3_events AS e
SET before_state_snapshot_id = u.before_state_snapshot_id,
event_replaces_nid = u.event_replaces_nid
FROM (
VALUES (:before_state_snapshot_id, :event_replaces_nid, :event_nid)
) AS u(before_state_snapshot_id, event_replaces_nid, event_nid)
WHERE e.event_nid = u.event_nid
Suppose we have two updates that we want to apply, corresponding to two VALUES expressions `(bss1, replace1, nid1)` and `(bss2, replace2, nid2)`. My understanding is that sqlx will transform this query into
UPDATE syncv3_events AS e
SET before_state_snapshot_id = u.before_state_snapshot_id,
event_replaces_nid = u.event_replaces_nid
FROM (
VALUES ($1, $2, $3), ($4, $5, $6)
) AS u(before_state_snapshot_id, event_replaces_nid, event_nid)
WHERE e.event_nid = u.event_nid
and pass the query six parameters, `bss1, replace1, nid1, bss2, replace2, nid2`, in that order.
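To make that expansion concrete, here is a minimal sketch of how the numbered placeholders can be generated for N rows of K columns. This mimics the transformation described above; it is not sqlx's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// expandValues builds a "($1, $2, $3), ($4, $5, $6), ..." VALUES list
// for `rows` rows of `cols` columns each, numbering the positional
// parameters left to right, row by row.
func expandValues(rows, cols int) string {
	var b strings.Builder
	n := 1
	for r := 0; r < rows; r++ {
		if r > 0 {
			b.WriteString(", ")
		}
		b.WriteString("(")
		for c := 0; c < cols; c++ {
			if c > 0 {
				b.WriteString(", ")
			}
			fmt.Fprintf(&b, "$%d", n)
			n++
		}
		b.WriteString(")")
	}
	return b.String()
}

func main() {
	// Two updates of three columns each, as in the example above.
	fmt.Println(expandValues(2, 3)) // ($1, $2, $3), ($4, $5, $6)
}
```

The parameters are then flattened in the same row-by-row order, which is why `bss1, replace1, nid1` precede `bss2, replace2, nid2`.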
https://www.psycopg.org/docs/extras.html#psycopg2.extras.execute_values seems to suggest that it does a similar thing (generate an appropriately-sized VALUES expression). I think psycopg2 does a better job of it, too.
@@ -946,3 +979,110 @@ func TestEventTableSelectUnknownEventIDs(t *testing.T) {
}
}
}

func TestEventTableUpdateBeforeSnapshotIDs(t *testing.T) {
You're going to need a lot more tests to convince me that the SQL is fine. Specifically:
- `UpdateBeforeSnapshotIDs` with no `beforeSnapshotUpdate`s.
- `UpdateBeforeSnapshotIDs` with exactly 1 `beforeSnapshotUpdate`.
- `UpdateBeforeSnapshotIDs` with `len(beforeSnapshotUpdate)` > chunk size.
- `beforeSnapshotUpdate` has some known and some unknown event NIDs.
- `beforeSnapshotUpdate` has some duplicate event NIDs and some not duplicated.
That's very fair. I am also suspicious of this arcane incantation
beforeSnapshotUpdate has some duplicate event NIDs and some not duplicated.
I think this is essentially undefined behaviour. Quoting postgres' docs:
When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_item list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.
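Given that a target row joining more than one VALUES row is unpredictable, one defensive option (a suggestion sketched here, not code from this PR; the struct and field names are illustrative) is to deduplicate the updates by event NID before building the query, keeping the last write for each NID:

```go
package main

import "fmt"

// beforeSnapshotUpdate mirrors the struct discussed in the review;
// the field names here are assumptions.
type beforeSnapshotUpdate struct {
	BeforeStateSnapshotID int64
	EventReplacesNID      int64
	EventNID              int64
}

// dedupeByEventNID drops duplicate event NIDs, keeping the last update
// for each NID, so that the UPDATE ... FROM (VALUES ...) join sees at
// most one source row per target row. A defensive sketch, not the PR's
// actual logic.
func dedupeByEventNID(updates []beforeSnapshotUpdate) []beforeSnapshotUpdate {
	seen := make(map[int64]int, len(updates)) // event NID -> index in result
	result := make([]beforeSnapshotUpdate, 0, len(updates))
	for _, u := range updates {
		if i, ok := seen[u.EventNID]; ok {
			result[i] = u // last write wins
			continue
		}
		seen[u.EventNID] = len(result)
		result = append(result, u)
	}
	return result
}

func main() {
	updates := []beforeSnapshotUpdate{
		{BeforeStateSnapshotID: 10, EventNID: 1},
		{BeforeStateSnapshotID: 20, EventNID: 1}, // duplicate NID
		{BeforeStateSnapshotID: 30, EventNID: 2},
	}
	fmt.Println(dedupeByEventNID(updates))
}
```

Whether "last write wins" is the right policy depends on the caller; the point is to make the choice explicit rather than leaving it to the planner.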
00c97b8 roughs out the logic I have in mind. Though it looks like plenty of tests are failing, so that'll need investigation.
Force-pushed from 297dacf to 5eee7a7.
Rebased atop
Force-pushed from 9de9301 to 1361964.
Force-pushed from 88826c6 to 93779be.
Additional debugging shows that the bulk of the time is now spent in v2Pub.Notify (lines 83 to 85 in 2139eda), and I'm guessing that we presumably time out after 5s? Or maybe it simply takes 5s to process 5k events in sliding-sync/sync3/handler/handler.go (lines 634 to 636 in b72ad3b) and sliding-sync/sync3/dispatcher.go (lines 151 to 188 in e861ccb).
I wonder if it might make sense to have a bulk
Otherwise we have to get the storage functions to convert this to a nonnil slice before we insert in the DB.
func countStateEvents(events []Event) (count int64) {
	for _, event := range events {
		if event.IsState {
			count++
		}
	}
	return count
}
This could be inlined.
membershipNIDs = make([]int64, 0, len(se))
otherNIDs = make([]int64, 0, len(se))
I chose to make this return nonnil slices if `len(se) == 0` so that I didn't have to do this at the storage layer.
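A quick illustration of the nonnil point: `make([]T, 0, n)` yields an empty but non-nil slice even when `n == 0`, so callers never have to special-case nil before a DB insert. The function and type names below are illustrative, not the PR's actual code.

```go
package main

import "fmt"

// stateEvent is an illustrative stand-in for the events being split.
type stateEvent struct {
	NID          int64
	IsMembership bool
}

// splitNIDs sketches the membership/other split discussed above. Using
// make with zero length and capacity len(se) gives empty but non-nil
// slices even when se is empty or nil.
func splitNIDs(se []stateEvent) (membershipNIDs, otherNIDs []int64) {
	membershipNIDs = make([]int64, 0, len(se))
	otherNIDs = make([]int64, 0, len(se))
	for _, ev := range se {
		if ev.IsMembership {
			membershipNIDs = append(membershipNIDs, ev.NID)
		} else {
			otherNIDs = append(otherNIDs, ev.NID)
		}
	}
	return membershipNIDs, otherNIDs
}

func main() {
	m, o := splitNIDs(nil)
	fmt.Println(m == nil, o == nil) // false false
}
```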
Force-pushed from 93779be to e5ee4ef.
@kegsay did we want to land this or not? I can't remember the outcome of our discussion.
The outcome of our discussion was to close this PR on the basis that whilst yes it improves things by ~25%, it's still doing more work than we actually want. Ideally, we would instead make the events in the
This implies a third function in the accumulator between
Suppose a timeline has `M` new message events and `S` new state events. The logic in sliding-sync/state/accumulator.go (lines 368 to 402 in c2f4b53) does:
- `S` selects on line 378,
- `S` inserts on line 394, and
- `M+S` updates on line 399.

It would be good to batch these up, because we occasionally see large gappy state timelines (`S` big) and we want to minimise the cost of handling these.

Before this change, the new TestLargeGappyStateBlock takes 9.11s; afterwards it takes 6.95s. So a ~25% improvement on my machine. Not as much as I was hoping for, but the effect might be larger in a deployment where the database communication RTT is larger. The bulk of the remaining time is spent in v2Pub.Notify... though I don't understand why???
I had two doubts on this.

Is it possible to even do a batched update? The answer seems to be "yes". Postgres extends the SQL spec to allow an `UPDATE ... FROM ...` syntax; see https://www.postgresql.org/docs/14/sql-update.html. I cribbed this from jmoiron/sqlx#796 (comment).

Is it possible to pre-advance the snapshot ID sequence so that the application can insert snapshots with those IDs in one update? Answer: yes. I used the same technique Synapse does, which is essentially to repeatedly call `nextval`.
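As a sketch of the pre-advance pattern: repeated `nextval` calls can also be collapsed into a single round trip with `generate_series` (a suggestion, not necessarily what this PR or Synapse does; the sequence name below is hypothetical). The DB call is stubbed behind a function here so the allocation logic runs without Postgres.

```go
package main

import "fmt"

// reserveSnapshotIDsSQL reserves n sequence values in one round trip,
// an alternative to calling nextval n times. The sequence name is a
// placeholder assumption.
const reserveSnapshotIDsSQL = `SELECT nextval('syncv3_snapshots_seq') FROM generate_series(1, $1)`

// assignIDs draws n IDs from the sequence via nextval, which stands in
// for the database round trip so the sketch runs standalone.
func assignIDs(n int, nextval func() int64) []int64 {
	ids := make([]int64, 0, n)
	for i := 0; i < n; i++ {
		ids = append(ids, nextval())
	}
	return ids
}

func main() {
	// Fake in-memory sequence standing in for Postgres.
	next := int64(100)
	seq := func() int64 { next++; return next }
	fmt.Println(assignIDs(3, seq)) // [101 102 103]
}
```

Because sequence values are never rolled back, gaps are harmless: the application can safely claim a block of IDs up front and assign them to snapshots before the batched insert.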