-
Notifications
You must be signed in to change notification settings - Fork 392
Fix case where get_partial_current_state_deltas
could return >100 rows
#18960
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Fix case where get_partial_current_state_deltas
could return >100 rows
#18960
Conversation
This fixes the case where every state delta in `current_state_delta_stream` has a count of 1, meaning `total` will be 100. Before this change, that would result in `clipped_stream_id = max_stream_id`, meaning we'd potentially pull out millions of rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing materially wrong with this, just some thoughts, what do you think?
|
||
total = 0 | ||
|
||
for stream_id, count in txn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wondering if instead of doing this with a for loop, we should just ask the DB for the result we want directly?
I think this would do it, as long as you still handle the case where there are no result rows (less than 100 left, so no need to clamp)
SELECT stream_id
FROM current_state_delta_stream
WHERE stream_id > ? AND stream_id <= ?
ORDER BY stream_id ASC
LIMIT 1 OFFSET 99
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually just ended up using LIMIT
in the second query in 0e40bf4.
Am I missing something - is it really this simple? All the unit tests appear to pass 🤷
Credit to @reivilibre for the idea.
Simplify the queries to just a single one.
Complement CI is failing due to a flaky test. Edit: And sytest appears to just be stalling... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean.... it's weird if the tests don't catch this, probably a smell that we're missing a test
# don't select toooo many. | ||
sql = """ | ||
SELECT stream_id, count(*) | ||
SELECT stream_id, room_id, type, state_key, event_id, prev_event_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afraid this doesn't work AFAICT, because stream_id
is not unique, but we always want to process the entire state with the same stream_id
(never process some of the rows but not others)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So would this mean that if a group of state deltas with the same stream_id
that was >100 items would cause this function to deadlock?
I suppose we should just return the whole group, even if it's larger than limit
, in that case.
Edit: especially since the docstring states that if the list of returned state deltas is empty, then we are up to date. Potentially skipping over a lot of state deltas!
We now: 1. Group state deltas by `stream_id` and get their count 2. Count (in the DB) until we go over our limit. We then get the `stream_id` we could naively go up to, as well as the clamped `stream_id` which would keep us under our limit. The second query fetches rows up to and including the clamped `stream_id`. We also add a unit test that injects multiple state deltas with the same `stream_id`, which correctly failed when tested against the previous implementation in this PR.
get_partial_current_state_deltas
is supposed to return max 100 rows. We relied on this assumption in #18926, forgoing batching of state deltas. It turns out thatget_partial_current_state_deltas
can potentially return millions of rows, and we saw this in production on element.io.This PR fixes the case where every state delta in
current_state_delta_stream
has a count of 1, meaningtotal
will be 100. Before this change, that would result inclipped_stream_id = max_stream_id
, meaning we'd potentially pull out millions of rows.I haven't written a regression test for this and I'm not sure how to easily seedI was able to write a unit test without writing SQL directly.current_state_delta_stream
with the correct data. I suppose we could create a room and change the topic >100 times? Alternatively we could write directly to the table, but such a test would need to be updated if the schema changes.Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.