Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix perf when streams don't change often #17767

Merged
merged 3 commits into from
Sep 30, 2024

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Sep 27, 2024

There is a bug with the StreamChangeCache where it would incorrectly return that all entities had changed if asked for entities changed since the earliest stream position.

Note that for streams we use the inequalities: $min_stream_id < stream_id <= $max_stream_id, i.e. when we ask the stream change cache for all things that have changed since $stream_id we don't care for events that happened at $stream_id.

Specifically: _earliest_known_stream_pos is the position at which we know that we'll have entries for all changes since that point, we can use the cache for any stream IDs that equal _earliest_known_stream_pos.

_earliest_known_stream_pos is set in three places:

  • On startup we set it either to:
    • the current maximum stream ID, with not prefilled values; or
    • the minimum of the latest N values we pulled from the DB
  • When we evict items from the bottom, we set it to the stream ID of the evicted items.

This was changed in matrix-org/synapse#14435, but I think we were overly conservative there.

There is a bug with the `StreamChangeCache` where it would incorrectly
return that all entities had changed if asked for entities changed
*since* the earliest stream position.
Copy link
Contributor

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this looks correct, I think it is missing 1 tweak to the tests though.

Comment on lines +145 to +147
# _cache is not valid before the earliest known stream position, so
# return that the entity has changed.
if stream_pos <= self._earliest_known_stream_pos:
if stream_pos < self._earliest_known_stream_pos:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a long conversation about this @ matrix-org/synapse#14435 (comment), during which I said:

I'm a bit skeptical of the resulting consistency between strict and non-strict comparisons, but would like a double checking.

Your response was:

as if we call entity_has_changed for a stream pos then has_entity_changed must return True for that stream pos. Either because we remember the entity has changed or because the stream pos is before the minimum threshold

It looks like I interpreted "before" incorrectly in that case, since at the stream position we do have the "proper" information needed to calculate whether there were changes or not.

The comment for _earliest_known_stream_pos even makes this clear:

# the earliest stream_pos for which we can reliably answer
# get_all_entities_changed. In other words, one less than the earliest
# stream_pos for which we know _cache is valid.

tl;dr Before #14435 we were inconsistently checking if it was strictly before or before-or-equal, we made them consistent in that PR, but to the wrong one.

Comment on lines +56 to +57
self.assertTrue(cache.has_entity_changed("[email protected]", 2))
self.assertTrue(cache.has_entity_changed("[email protected]", 2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add checks for 3 (near line 35)?

@erikjohnston erikjohnston marked this pull request as ready for review September 30, 2024 10:12
@erikjohnston erikjohnston requested a review from a team as a code owner September 30, 2024 10:12
changelog.d/17767.misc Outdated Show resolved Hide resolved
Co-authored-by: Andrew Morgan <[email protected]>
@erikjohnston erikjohnston merged commit 81e0f57 into develop Sep 30, 2024
37 of 39 checks passed
@erikjohnston erikjohnston deleted the erikj/stream_change_cache branch September 30, 2024 12:52
yingziwu added a commit to yingziwu/synapse that referenced this pull request Oct 17, 2024
No significant changes since 1.117.0rc1.

- Add config option `redis.password_path`. ([\#17717](element-hq/synapse#17717))

- Fix a rare bug introduced in v1.29.0 where invalidating a user's access token from a worker could raise an error. ([\#17779](element-hq/synapse#17779))
- In the response to `GET /_matrix/client/versions`, set the `unstable_features` flag for [MSC4140](matrix-org/matrix-spec-proposals#4140) to `false` when server configuration disables support for delayed events. ([\#17780](element-hq/synapse#17780))
- Improve input validation and room membership checks in admin redaction API. ([\#17792](element-hq/synapse#17792))

- Clarify the docstring of `test_forget_when_not_left`. ([\#17628](element-hq/synapse#17628))
- Add documentation note about PYTHONMALLOC for accurate jemalloc memory tracking. Contributed by @hensg. ([\#17709](element-hq/synapse#17709))
- Remove spurious "TODO UPDATE ALL THIS" note in the Debian installation docs. ([\#17749](element-hq/synapse#17749))
- Explain how load balancing works for `federation_sender_instances`. ([\#17776](element-hq/synapse#17776))

- Minor performance increase for large accounts using sliding sync. ([\#17751](element-hq/synapse#17751))
- Increase performance of the notifier when there are many syncing users. ([\#17765](element-hq/synapse#17765), [\#17766](element-hq/synapse#17766))
- Fix performance of streams that don't change often. ([\#17767](element-hq/synapse#17767))
- Improve performance of sliding sync connections that do not ask for any rooms. ([\#17768](element-hq/synapse#17768))
- Reduce overhead of sliding sync E2EE loops. ([\#17771](element-hq/synapse#17771))
- Sliding sync minor performance speed up using new table. ([\#17787](element-hq/synapse#17787))
- Sliding sync minor performance improvement by omitting unchanged data from incremental responses. ([\#17788](element-hq/synapse#17788))
- Speed up sliding sync when there are many active subscriptions. ([\#17789](element-hq/synapse#17789))
- Add missing license headers on new source files. ([\#17799](element-hq/synapse#17799))

* Bump phonenumbers from 8.13.45 to 8.13.46. ([\#17773](element-hq/synapse#17773))
* Bump python-multipart from 0.0.10 to 0.0.12. ([\#17772](element-hq/synapse#17772))
* Bump regex from 1.10.6 to 1.11.0. ([\#17770](element-hq/synapse#17770))
* Bump ruff from 0.6.7 to 0.6.8. ([\#17774](element-hq/synapse#17774))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants