
MSC4171 Omit service members from room summary #17866

Closed (wanted to merge 23 commits)

Conversation

Half-Shot (Member) commented Oct 24, 2024

For #17859

This applies the MSC4171 rules to the m.heroes response in sync. This PR is needed to fix behaviour in some modern clients, such as EX, which use heroes to determine room names rather than calculating them locally.

MSC discussion for excluding service members from m.heroes: matrix-org/matrix-spec-proposals#4171 (comment)

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

Half-Shot changed the title from "Hs/msc4171 heroes" to "MSC4171 Omit service members from room summary" on Oct 24, 2024
Half-Shot marked this pull request as ready for review on October 24, 2024
Half-Shot requested a review from a team as a code owner on October 24, 2024
Comment on lines +321 to +323
    exclude_users_clause, args = make_in_list_sql_clause(
        self.database_engine, "state_key", exclude_members, negative=True
    )
Contributor:

We should be careful about there being too many exclude_members, and bail to prevent DoS.

Alternatively, we could just fetch N extra members, like we do in case one of them is the calling user, and do the exclusion outside of the SQL. But we should probably have a limit on that as well. Perhaps that is something to clarify in the spec, so we can bail if the list is longer than the spec'd maximum. Or perhaps we just rely on the max length of an event?

Half-Shot (Member Author):

I was going to rely on the max length of an event, which I think, even with a long list of very short user IDs, would still only be about 9k or so (once you've included the usual event padding).

I'm not quite sure of the performance cost here, but I'd assume that filtering on a 9k string list in Postgres isn't terrible, as it's not going to impact IO.

Alternatively, we could do as you say and set a sensible maximum number of users in the spec (say, 100 or so). I'm generally a bit allergic to limits in the spec, as someone's probably going to come up with a use case for 101 members; however, it might be justified here for performance.

Half-Shot (Member Author):

Oh, and get_room_summary has a big fat cache on it, so it's probably okay to do a slightly more expensive call here. Admittedly this does impact the hot path of sync, but I think pulling out the excluded users is fairly fast.

MadLittleMods (Contributor) commented Oct 31, 2024:

We tend to limit make_in_list_sql_clause(...) to at most 1000 entries (see usages of batch_iter(..., 1000)), but that's a bit tough to do here with the negative=True condition.

We should at least add a comment here noting that we assume there should be no more than ~9.3k members to exclude, based on the max length of an event (65535 bytes).

  • There is nothing stopping someone from using a single-letter string over and over in the list, so it could practically be ~65k entries.
  • With actual MXIDs: 16.4k = 65535 / 4 (@m:h) maximally
  • In the most likely maximal scenario with public federation: 10.9k = 65535 / 6 (@m:h.io)
  • With enough unique combos in the localpart: 9.3k = 65535 / 7 (@mm:h.io)
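The figures above can be reproduced directly. A quick sanity-check sketch; the event-size cap and per-entry byte counts are taken from the comment, and none of the names below are Synapse code:

```python
# Sanity-check of the figures above. EVENT_MAX_BYTES and the per-entry
# byte counts are taken from the comment; nothing here is Synapse code.
EVENT_MAX_BYTES = 65535

# (example MXID, bytes per list entry as counted in the comment)
scenarios = [("@m:h", 4), ("@m:h.io", 6), ("@mm:h.io", 7)]

for mxid, bytes_per_entry in scenarios:
    # Floor division gives the maximum number of entries that fit.
    print(mxid, EVENT_MAX_BYTES // bytes_per_entry)
# Prints roughly 16.4k, 10.9k, and 9.3k entries respectively.
```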

We could have a valid MXID check and add them to a Set to deduplicate but I don't think it's worth the extra computation.

Perhaps we should just set a practical limit and re-evaluate if someone hits it: have a 1k check, and set exclude_members = [] past that with a log warning (a warning rather than an assert, because we don't want to break the whole /sync response). This can always be increased in the future when someone has a practical use case, but it avoids rooms whose only goal is performance abuse.
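A minimal sketch of that suggested guard; the cap value and helper name (MAX_EXCLUDED_SERVICE_MEMBERS, cap_excluded_members) are hypothetical, not Synapse's actual code:

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical cap; a sketch of the suggested guard, not Synapse code.
MAX_EXCLUDED_SERVICE_MEMBERS = 1000

def cap_excluded_members(exclude_members: list) -> list:
    # Warn and drop the whole exclusion list rather than assert, so an
    # oversized list cannot break the entire /sync response.
    if len(exclude_members) > MAX_EXCLUDED_SERVICE_MEMBERS:
        logger.warning(
            "Ignoring %d excluded service members (limit is %d)",
            len(exclude_members),
            MAX_EXCLUDED_SERVICE_MEMBERS,
        )
        return []
    return exclude_members
```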

    if isinstance(functional_members_data, list) and all(
        isinstance(item, str) for item in functional_members_data
    ):
        exclude_members = functional_members_data
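For context, a functional-members state event of the shape this check expects might look like the following. The io.element.functional_members event type and service_members field name are my reading of the MSC4171 unstable identifiers; check the MSC for the exact names:

```json
{
  "type": "io.element.functional_members",
  "state_key": "",
  "content": {
    "service_members": ["@bot:example.org"]
  }
}
```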
Contributor:

It seems a bit dubious to change the behaviour of /sync (m.heroes) without it being under an unstable endpoint or key in the response, even though it is behind a feature flag (msc4171_enabled) 🤔

Half-Shot (Member Author):

Yeah, I'm unsure how best to do this one. I felt it would be better for the homeserver to uniformly provide the same result across all users and endpoints, based on its configuration, than to add query parameters to each endpoint and filter. It's a reasonable request, though; we could do it that way.

Half-Shot (Member Author):

Of course, I didn't manage to add a feature flag check here in the original PR. Have updated.

MadLittleMods (Contributor) commented Oct 31, 2024:

It would probably be prudent to separate this behaviour, either via a new key in the /sync response such as io.element.msc4171.heroes, or conditionally via a new query parameter (any good prior art?).

A new endpoint (or endpoints) is also a valid option, but not worth how cumbersome that would be.

Half-Shot (Member Author):

A query parameter would probably be best, I think. I'll just need to let the EX crew know that this needs adding.

Half-Shot (Member Author):

Thanks for the response. I do think we need some review on the MSC, as it's a feature naturally ripe for abuse. From my view, this feature primarily helps with DM rooms where neither a room name nor an alias is set and there are around three participants (Alice, Bob, and a bot).

Deprioritise service users to the end of the join list (this doesn't need to be behind a flag)

We can do this, but unfortunately it really only helps with larger groups that have no other metadata to derive a name from.

Have clients remove service users from heroes manually

We can do this, but it does present a couple of problems. First, it means we need to fetch state as part of the sync loop which either means a small delay or a flicker as room names adjust. I am unsure how palatable that will be, but it's an option.

Otherwise, I'm happy adding a new query param flag to enable filtering of m.heroes.

I think this would be my preference. It avoids changing behaviour for existing clients and gives clients the opportunity to opt out. It also prevents the flicker problem noted above, which is my primary concern with the client-side approach. I'll proceed with this.

Member:

We can do this, but it does present a couple of problems. First, it means we need to fetch state as part of the sync loop which either means a small delay or a flicker as room names adjust. I am unsure how palatable that will be, but it's an option.

I don't really follow this? Clients using sync v2 will get the service state event alongside the heroes, and for sliding sync the client can add it to the list of requested state events.

We can also server side annotate the hero membership events in sync v2 to indicate that they are service users (and do something similar in sliding sync), but given the client already has that information it feels a bit redundant?

Half-Shot (Member Author):

I thought the point of m.heroes was to calculate a room name when you don't have the full state locally? Admittedly I didn't know that you could ask for specific state events to appear alongside. That seems like a more sensible approach to me.

Member:

Ah right, we always send down all the state to the clients, except for membership events if they have enabled "lazy-loading" for sync v2. In sliding sync the client can specify what state event types it wants the server to return (and gives an option for lazy loading members).
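To make that concrete, a sliding sync (MSC3575) request lists the state event types it wants returned per room. A hedged sketch; the list name, ranges, and the io.element.functional_members event type are illustrative assumptions, not values from this thread:

```json
{
  "lists": {
    "dm_list": {
      "ranges": [[0, 20]],
      "required_state": [
        ["m.room.member", "$LAZY"],
        ["io.element.functional_members", ""]
      ],
      "timeline_limit": 1
    }
  }
}
```

With the functional-members event delivered alongside the lazy-loaded memberships, the client can filter service users out of its room-name calculation locally.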

Half-Shot (Member Author):

Okay, that makes things much easier. I think the optimal path here is probably to scrap this PR and have the clients pull the correct state and filter there.

    ORDER BY
        CASE membership WHEN ? THEN 1 WHEN ? THEN 2 WHEN ? THEN 3 ELSE 4 END ASC,
        event_stream_ordering ASC
    LIMIT ?
    """ % (exclude_users_clause)
Member:

Please use f-strings, or the more modern .format(...).

The old % is a bit of a footgun: it's meant to take a collection like a tuple, but here you've actually passed it a plain string, since you forgot the trailing comma: (exclude_users_clause) is just a parenthesised string, not a tuple. This happens to work because Python special-cases a single non-tuple value, but it's a bit ugly.
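A small illustration of the footgun and the suggested alternatives; the placeholder clause and SELECT fragment are made up for the example, not the query from the PR:

```python
# Placeholder clause for illustration; the real one comes from
# make_in_list_sql_clause(...) in the PR.
exclude_users_clause = "state_key NOT IN (?, ?)"

# Old style: `%` normally takes a tuple. `(exclude_users_clause)` is just
# a parenthesised string (no trailing comma), but the expression still
# works because `%` also accepts a single non-tuple value.
sql_old = "SELECT state_key FROM current_state_events WHERE %s" % (exclude_users_clause)

# Clearer alternatives that avoid the missing-comma footgun:
sql_f = f"SELECT state_key FROM current_state_events WHERE {exclude_users_clause}"
sql_fmt = "SELECT state_key FROM current_state_events WHERE {}".format(exclude_users_clause)

# All three build the same string.
assert sql_old == sql_f == sql_fmt
```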

Half-Shot (Member Author):

On balance, and after talking to @erikjohnston, I think we can do this filtering in the client; it sounds trivial to do from both sync v2 and sliding sync. Let's do that, and if it ends up being annoying we can circle back to a backend implementation.

Half-Shot closed this Nov 4, 2024
4 participants