Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2244: Mass redactions #2244

Merged
merged 10 commits into from
Nov 12, 2019
78 changes: 78 additions & 0 deletions proposals/2244-mass-redactions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
# Mass redactions
Copy link
Contributor

@ShadowJonathan ShadowJonathan May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As someone who wishes to implement this in Ruma, I'm missing a few key details from this proposal;

  1. This alters the format of redactions, currently matrix has a dedicated endpoint for redactions themselves, which takes the redacted event ID in the path.
    For mass redactions, shouldn't there be a new endpoint that'd allow the new format? To allow giving the event IDs to redact as a POST body? (Or the likes)
    Currently, there is no way (other than sending a custom event) to take advantage of this new redaction format, and even then it is not fully clear if sending it as a custom event is the intended way to mass-redact.
  2. There is not an example of the new redaction event, while this is minor, it is still crucial to make clear/explicit what the new format will be, especially given historical context surrounding redacts keys.
  3. It's not clear if the old redaction format (with a string as value) will persist alongside the new version, or if there will be a clear split on room versions.
    I'd like to note that, if both variants will persist, this'll be (probably one of) the only places in the matrix spec where a key can have multiple value types, this can possibly create complications and exceptions in (de)serialization libraries such as Ruma, in which I don't see a very clear path for serde to take this on. (Update, serde seems to handle this nicely through untagged) I'd like some clarity on that as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Sending a custom event is the intended way, a custom endpoint wouldn't have any advantages.

  2. Federation:

    {
      "type": "m.room.redaction",
      "content": {
        "redacts": ["$1234", "$5678", "$9abc"]
      }
    }

    Client API (with backwards compatibility):

    {
      "type": "m.room.redaction",
      "content": {
        "redacts": ["$1234", "$5678", "$9abc"]
      },
      "redacts": "$1234"
    }
  3. The MSC2174 format will hopefully never be used anywhere, but either way only one format will be used in new room versions. In the C-S API, servers should add the top-level redacts key for backwards compatibility, but that's just locally and not in the actual federation event (and that field is always a string, never an array).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

Re 3: So this means that the /redact/ API will spit out a one-item-array redaction event in new room versions, as described above? I'd leave it for a spec PR to clear up if the MSC2174 format will be used in new room versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShadowJonathan for extreme clarity: we kindly ask that this MSC is not implemented at this time, please. The SCT is aware of this MSC's state and wants to fix it, but there's external considerations which need to be made before implementation can begin.

Matrix, like any platform with public chat rooms, has spammers. Currently,
redacting spam essentially requires spamming redaction events in a 1:1 ratio,
which is not optimal<sup>[1](images/2244-redaction-spam.png)</sup>. Most
clients do not even have any mass redaction tools, likely in part due to the
lack of a mass redaction API. A mass redaction API on the other hand has not
been implemented as it would require sending lots of events at once. However,
this problem could be solved by allowing a single redaction event to redact
many events instead of sending many redaction events.

## Proposal
This proposal builds upon [MSC2174](https://github.com/matrix-org/matrix-doc/pull/2174)
and suggests making the `redacts` field in the content of `m.room.redaction`
events an array of event ID strings instead of a single event ID string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having thought about this some more, I think one of the most common use cases for mass redactions is to handle brigading scenarios where a user joins a room solely in order to post abuse. I think we should bite the bullet and consider putting user_ids as well as event_ids in the redaction target to make it easier for moderators to handle this scenario.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, although that should be a separate proposal. Finding and redacting everything from a user probably needs some new auth rules that aren't related to just redacting many events at once.

richvdh marked this conversation as resolved.
Show resolved Hide resolved

It would be easiest to do this before MSC2174 is written into the spec, as then
only one migration would be needed: from an event-level redacts string to a
content-level redacts array.

### Backwards compatibility
There is no easy way to stay fully compatible with *older* clients, so the
proposed solution is to not support them. In order to not break old clients
completely, servers should still add a `redacts` string containing one of the
redacted event IDs to the top level of `m.room.redaction` events in *newer*
room versions when serving such events over the Client-Server API.

Like MSC2174, for improved compatibility with *newer* clients, servers should
add a `redacts` array to the `content` of `m.room.redaction` events in *older*
room versions when serving such events over the Client-Server API.

### Number of redactions
Room v4+ event IDs are 44 bytes long, which means the federation event size
limit would cap a single redaction event at a bit less than 1500 targets.
Redactions are not intrinsically heavy, so a separate limit should not be
necessary.

Due to the possible large number of redaction targets per redaction event,
servers should omit the list of redaction targets from the `unsigned` ->
`redacted_because` field of redacted events. If clients want to get the list
of targets of a redaction event in `redacted_because`, they should read the
`event_id` field of the `redacted_because` event and use the
`/rooms/{roomId}/event/{eventId}` endpoint to fetch the content.
KitsuneRal marked this conversation as resolved.
Show resolved Hide resolved

### Client behavior
Clients shall apply existing `m.room.redaction` target behavior over an array
of event ID strings.

### Server behavior (auth rules)
The target events of an `m.room.redaction` shall no longer be considered when
authorizing an `m.room.redaction` event. Any other existing rules remain
unchanged.

After a server accepts an `m.room.redaction` using the modified auth rules, it
evaluates individually whether each target can be redacted under the existing
room v5 auth rules. Servers MUST NOT include failing and unknown entries to
clients.

> Servers do not know whether redaction targets are authorized at the time they
receive the `m.room.redaction` unless they are in possession of the target
event. Implementations retain entries in the original list which were not
shared with clients to later evaluate the target's redaction status.

When the implementation receives a belated target from an earlier
`m.room.redaction`, it evaluates at that point whether the redaction is
authorized.

> Servers should not send belated target events to clients if their redaction
was found to be in effect, as clients were not made aware of the redaction.
That fact is also used to simply ignore unauthorized targets and send the
events to clients normally.

## Tradeoffs

## Potential issues
KitsuneRal marked this conversation as resolved.
Show resolved Hide resolved

## Security considerations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest concern here is presumably that all the servers in the room could have to do quite a lot of work, churning through all the listed event IDs trying to redact them, with the associated cache impact etc depending on the implementation specifics. I suggest we at least mention that the implementation should process the redactions in the background in such a way to avoid large malicious redaction events from being an easy way to DoS a server.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rapid fire of individual redaction events is not really better in this regard, is it? Events can be rate-limited, and megaredactions can be rate-limited more aggressively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to flesh this out further: i discussed this with @erikjohnston back in Sept whilst we were looking at bulk redaction in riot as a quick fix. he had concerns that this makes it too easy to overload a homeserver with single requests which could require a lot of processing to service - which is why we ended up implementing “bulk redact by looping over /redact” in riot-web at the time instead.

however, I am not convinced - as kitsune says, the server can ratelimit processing a bulk redaction as it would individual ones. it also shifts the onus to the server rather than the client to trickle through the redactions (simplifying the client UX massively, given it doesn’t have to handle queuing and echo of the redactions). Also, redacting individually means that tonnes of redaction msgs end up in the DAG and have to be relayed through to all the listening clients via /sync, which puts much more load on the CS API (and bandwidth and processing on the clients) than syncing a single bulk-redact event.

So, on aggregate, I still think this MSC is a big improvement. @erikjohnston do you still have concerns?

Server implementations should ensure that large redaction events do not become
a DoS vector, e.g. by processing redactions in the background.
Binary file added proposals/images/2244-redaction-spam.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.