MSC2326: Label based filtering #2326
base: old_master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,187 @@ | ||
# Label based filtering | ||
|
||
## Problem | ||
|
||
Rooms often contain overlapping conversations, which Matrix should help users | ||
navigate. | ||
|
||
## Context | ||
|
||
We already have the concept of 'Replies' to define which messages are responses | ||
to which, which [MSC1849](https://github.com/matrix-org/matrix-doc/pull/1849) | ||
proposes extending into a generic mechanism for defining threads which could (in | ||
future) be paginated both depth-wise and breadth-wise. Meanwhile, | ||
[MSC1198](https://github.com/matrix-org/matrix-doc/issues/1198) is an alternate | ||
proposal for threading, which separates conversations into high-level "swim | ||
lanes" with a new `POST /rooms/{roomId}/thread` API. | ||
|
||
However, fully generic threading (which could be used to implement forum or | ||
email style semantics) runs a risk of being overly complicated to specify and | ||
implement and could result in feature creep. This is doubly true if you try to | ||
implement retrospective threading (e.g. to allow moderators to split off | ||
messages into their own thread, as you might do in a forum or to help manage | ||
conversation in a busy chatroom). | ||
|
||
Therefore, this is a simpler proposal to allow messages in a room to be filtered | ||
based on a given label in order to give basic one-layer-deep threading | ||
functionality. | ||
Still, the 'reply to an existing message' is an important use case. A message cannot be replied to unless it is already labelled. Regular users cannot add labels to messages they did not author. Should a (unique) label on otherwise unlabelled messages be required?

Isn't that like replying to the message based on message ID?

#1849 might be better suited for this kind of thing since a client could directly request the message that is referenced... Not sure though.

@ptman I was thinking subsequent messages could reuse the existing label. Basically avoid nested threads. AFAICT #1849 proposes a strictly one-way relation (i.e. child points to parent). Wouldn't that make it expensive to list the thread starting from a particular message? Each subsequent message would have to be determined by reverse look-up.

I personally prefer hierarchical threads, but either way, regardless of how the relationships are recorded, they can be shown flat, just like Apple Mail and Gmail do.

Agreed, but I was under the impression that this proposal specifically aims for flat threads for the sake of simplicity. |
||
|
||
## Proposal | ||
|
||
We let users specify an optional `m.labels` field on their events. This field | ||
maps key strings to freeform text labels: | ||
|
||
```json | ||
{ | ||
// ... | ||
"m.labels": { | ||
"somekey": "somelabel" | ||
} | ||
} | ||
``` | ||
|
||
Labels which are prefixed with # are expected to be user-visible and exposed to | ||
the user by clients as a hashtag, letting the user filter their current room by | ||
the various hashtags present within it. Labels which are not prefixed with # are | ||
expected to be hidden from the user by clients (so that they can be used as | ||
e.g. thread IDs bridged from another platform). | ||
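As a rough illustration of the visibility rule above, a client might split the label values of an event into user-visible hashtags and hidden labels. The helper name `split_labels` and the sample label values are hypothetical and not part of the proposal; this is only a sketch.

```python
from typing import Dict, List, Tuple


def split_labels(labels: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split the values of an `m.labels` map into (visible, hidden) labels.

    Labels starting with '#' are treated as user-visible hashtags; all
    others are treated as hidden (e.g. thread IDs bridged from elsewhere).
    """
    visible = [label for label in labels.values() if label.startswith("#")]
    hidden = [label for label in labels.values() if not label.startswith("#")]
    return visible, hidden


# Hypothetical label map with one hashtag and one bridged thread ID.
visible, hidden = split_labels({"k1": "#fun", "k2": "bridge-thread-42"})
```

Only the `visible` list would be offered to the user as filterable hashtags under this reading.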
|
||
Clients can use these to filter the overlapping conversations in a room into | ||
A concern overall with this proposal is that once someone uses |
||
different topics. The labels could also be used when bridging as a hashtag to | ||
help manage the disconnect which can happen when bridging a threaded room to an | ||
unthreaded one. | ||
|
||
Clients are expected to explicitly set the label on a message if the user's | ||
"the" implies there's only one you have to worry about - shouldn't you be copying the whole set? |
||
intention is to respond as part of a given labelled topic. For instance, if the | ||
user is currently filtered to only view messages with a given label, then new | ||
messages sent should use the same label. Similarly if the user sends a reply to | ||
a given message, that reply should typically use the same labels as the message | ||
being replied to. | ||
My remaining doubt is how we're expecting clients to expose adding a label initially, e.g. if there would be another button in the composer or similar to add a label to a new message or whether you'd just let them be added retrospectively. Likewise, would we expect to show the labels on each message / show on hover etc.

I've added some clarifications on that point in 4b7ca52, but designing UI isn't my area of expertise so I'm happy to discuss it. |
||
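One possible reading of the reply behaviour described above is sketched here: the reply copies the whole label map of the message being replied to. The function name `build_reply_content` is hypothetical, and the reply relation metadata itself is deliberately left out, since the proposal does not describe it.

```python
import copy
from typing import Any, Dict


def build_reply_content(parent_content: Dict[str, Any], body: str) -> Dict[str, Any]:
    """Build text message content that inherits the parent's labels, if any."""
    content: Dict[str, Any] = {"msgtype": "m.text", "body": body}
    parent_labels = parent_content.get("m.labels")
    if parent_labels:
        # Copy the whole label map so the reply stays in the same topic(s);
        # a deep copy avoids mutating the parent event's content later on.
        content["m.labels"] = copy.deepcopy(parent_labels)
    return content


parent = {"msgtype": "m.text", "body": "who wants to go down the pub?",
          "m.labels": {"somekey": "#fun"}}
reply = build_reply_content(parent, "me!")
```

Copying the full map rather than a single label is an assumption made here; it keeps the reply visible under every filter that matched the original message.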
|
||
When a user wants to filter a room to given label(s), their client defines a filter for | ||
use with `/sync` or `/messages` to limit the results appropriately. This is done with new `labels` | ||
and `not_labels` fields on the `EventFilter` object, which specify lists of | ||
labels to include or exclude in the given filter. | ||
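A minimal sketch of building such a filter follows. Only the `labels` and `not_labels` field names come from the proposal; the helper name `build_label_filter` and the sample labels are hypothetical, and how the filter is nested or passed to the server is left out.

```python
import json
from typing import Any, Dict, List, Optional


def build_label_filter(labels: Optional[List[str]] = None,
                       not_labels: Optional[List[str]] = None) -> Dict[str, Any]:
    """Return an EventFilter-style dict restricted by label."""
    event_filter: Dict[str, Any] = {}
    if labels:
        event_filter["labels"] = list(labels)
    if not_labels:
        event_filter["not_labels"] = list(not_labels)
    return event_filter


# Serialised form of a filter that keeps "#fun" and drops "#work".
filter_json = json.dumps(build_label_filter(labels=["#fun"], not_labels=["#work"]))
```

In an encrypted room the same fields would carry label hashes instead of label text.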
|
||
### Encrypted rooms | ||
|
||
In encrypted events, the string used as the key in the map is a SHA256 hash of a | ||
concatenation of the text label and the ID of the room the event is being sent | ||
to. Once encrypted by the client, the resulting `m.room.encrypted` event's | ||
content contains a `m.labels_hashes` property which is an array of these hashes. | ||
|
||
When filtering events based on their label(s), clients are expected to use the | ||
hash of the label(s) to filter in or out instead of the actual label text. | ||
|
||
#### Example | ||
|
||
Consider a label `#fun` on a message sent to a room whose ID is | ||
`!someroom:example.com`. Before encryption, the message would be: | ||
|
||
```json | ||
{ | ||
"type": "m.room.message", | ||
"content": { | ||
"body": "who wants to go down the pub?", | ||
"msgtype": "m.text", | ||
"m.labels": { | ||
"3204de89c747346393ea5645608d79b8127f96c70943ae55730c3f13aa72f20a": "#fun" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
`3204de89c747346393ea5645608d79b8127f96c70943ae55730c3f13aa72f20a` is the SHA256 | ||
hash of the string `#fun!someroom:example.com`. | ||
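The hashing step can be sketched as follows. The proposal only says "SHA256 hash of a concatenation of the text label and the room ID"; the UTF-8 encoding and the lowercase hexadecimal output are assumptions (the latter matches the shape of the example key), and the function name `label_hash` is hypothetical.

```python
import hashlib


def label_hash(label: str, room_id: str) -> str:
    """Hash a label for use in an encrypted room.

    Assumptions: the label is concatenated directly with the room ID,
    encoded as UTF-8, and the digest is rendered as lowercase hex.
    """
    return hashlib.sha256((label + room_id).encode("utf-8")).hexdigest()


digest = label_hash("#fun", "!someroom:example.com")
```

Because the room ID is part of the input, the same label yields a different hash in every room.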
|
||
Once encrypted, the event would become: | ||
|
||
```json | ||
{ | ||
"type": "m.room.encrypted", | ||
"content": { | ||
"algorithm": "m.megolm.v1.aes-sha2", | ||
"ciphertext": "AwgAEpABm6.......", | ||
"device_id": "SOLZHNGTZT", | ||
"sender_key": "FRlkQA1enABuOH4xipzJJ/oD8fxiQHj6jrAyyrvzSTY", | ||
"session_id": "JPWczbhnAivenK3qRwqLLBQu4W13fz1lqQpXDlpZzCg", | ||
"m.labels_hashes": [ | ||
"3204de89c747346393ea5645608d79b8127f96c70943ae55730c3f13aa72f20a" | ||
] | ||
} | ||
} | ||
``` | ||
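A sketch of how a client might derive `m.labels_hashes` from the plaintext content before sending: the keys of `m.labels` are taken as the hashes and attached to the encrypted content. The function name `attach_labels_hashes` is hypothetical, and the encryption itself is out of scope here.

```python
from typing import Any, Dict


def attach_labels_hashes(plaintext_content: Dict[str, Any],
                         encrypted_content: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the encrypted content carrying `m.labels_hashes`.

    In encrypted rooms the keys of `m.labels` are the label hashes, so
    they are copied as-is into the cleartext part of the event.
    """
    result = dict(encrypted_content)
    hashes = list(plaintext_content.get("m.labels", {}).keys())
    if hashes:
        result["m.labels_hashes"] = hashes
    return result
```

The label text never leaves the encrypted payload in this sketch; only the hashes are exposed to the server.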
|
||
### Unencrypted rooms | ||
|
||
In unencrypted rooms, the string to use as a key does not matter (as this format | ||
is only kept for consistency with events sent in encrypted rooms) and clients | ||
are free to use any non-empty string they wish (as long as it's unique per label | ||
in the event). | ||
|
||
When filtering events based on their label(s), clients are expected to use the | ||
actual label text instead of the string key. | ||
|
||
#### Example | ||
|
||
```json | ||
{ | ||
"type": "m.room.message", | ||
"content": { | ||
"body": "who wants to go down the pub?", | ||
"msgtype": "m.text", | ||
"m.labels": { | ||
"somekey": "#fun" | ||
} | ||
} | ||
} | ||
``` | ||
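To make the difference between the two room types concrete, here is a sketch of how label filtering could be evaluated against an event. The matching semantics (an event passes if it carries at least one of `labels` and none of `not_labels`) are an assumption, as the proposal does not spell them out, and both function names are hypothetical.

```python
from typing import Any, Dict, Iterable, Set


def event_label_ids(event: Dict[str, Any]) -> Set[str]:
    """Return the identifiers a label filter is compared against."""
    content = event.get("content", {})
    if event.get("type") == "m.room.encrypted":
        # Encrypted rooms: only the hashes are visible outside the payload.
        return set(content.get("m.labels_hashes", []))
    # Unencrypted rooms: match on the label text, not on the map keys.
    return set(content.get("m.labels", {}).values())


def matches_label_filter(event: Dict[str, Any],
                         labels: Iterable[str] = (),
                         not_labels: Iterable[str] = ()) -> bool:
    """Assumed semantics: any-of for `labels`, none-of for `not_labels`."""
    ids = event_label_ids(event)
    wanted = set(labels)
    if wanted and not ids & wanted:
        return False
    return not ids & set(not_labels)
```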
|
||
## Problems | ||
|
||
Do we care about internationalising hashtags? | ||
Generally the room will be using a specific language, so probably not. |
||
|
||
Too many threading APIs? | ||
|
||
Using hashes means that servers could be inclined to compute rainbow tables to | ||
read labels on encrypted messages. However, since we're using the room ID as | ||
some kind of salt, doing so becomes much more expensive, because the server would | ||
have to maintain one rainbow table for each encrypted room it is in, which | ||
probably makes it not worth the trouble. | ||
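The concern can be illustrated with a small sketch of a per-room lookup table built from guessed labels. The candidate list is made up, and the hashing details (direct concatenation, UTF-8, hex digest) are the same assumptions as elsewhere; the function name `build_lookup_table` is hypothetical.

```python
import hashlib
from typing import Dict, Iterable


def build_lookup_table(candidates: Iterable[str], room_id: str) -> Dict[str, str]:
    """Map hash -> label for a list of guessed labels in one room."""
    return {
        hashlib.sha256((label + room_id).encode("utf-8")).hexdigest(): label
        for label in candidates
    }


room_id = "!someroom:example.com"
table = build_lookup_table(["#fun", "#work", "#urgent"], room_id)

# A hash observed in `m.labels_hashes` can be reversed if the label was guessed.
observed = hashlib.sha256(("#fun" + room_id).encode("utf-8")).hexdigest()
recovered = table.get(observed)
```

A table built for one room is useless for another, which is the per-room cost the paragraph above refers to; for common or short labels that cost may still be low.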
|
||
## Alternative solutions | ||
I was just rereading this proposal, I think @dkasak's points on the main thread are legitimate: that hashing the labels gives a very false sense of security here. Given how strong our e2ee is, folks will assume opaque labels are actually encrypted, rather than just obfuscated by a hash which can be easily rainbow-tabled. Personally, I think it'd be fine to add a pepper to the hashed events, and require at the application level that for labels to work in encrypted rooms, the new user must be brought up to speed on the pepper (e.g. by the inviter sharing the pepper in an encrypted message, possibly to-device, after having invited them). This is simpler than using opaque IDs for the unencrypted event headers, as there's only one pepper that needs to be shared to new users, rather than the whole set of opaque->real label mappings. |
||
|
||
Instead of using hashes to identify labels in encrypted messages, using random | ||
opaque strings was also considered. Bearing in mind that we need to be able to | ||
use the label identifiers to filter the history of the room server-side (because | ||
we're not expecting clients to know about the whole history of the room, see my | ||
first point above), this solution had the following downsides, all originating | ||
from the fact that nothing would prevent 1000 clients from each using a | ||
different identifier: | ||
|
||
* filtering would have serious performance issues in E2EE rooms, as the server | ||
would need to return every event it knows about whose label identifier is any | ||
of the 1000 identifiers provided by the client, which is quite expensive to | ||
do. | ||
|
||
* it would be impossible for a filtered `/messages` (or `/sync`) request to | ||
include every event matching the desired label because we can't expect a | ||
client to know about every identifier that has been used in the whole history | ||
of the room, or about the fact that another client might suddenly decide to | ||
use another identifier for the same label text, and include those identifiers | ||
in its filtered request. | ||
|
||
Another proposed solution would be to use peppered hashes, and to store the | ||
pepper in the encrypted event. However, this solution would have the same | ||
downsides as described above. | ||
|
||
## Unstable prefix | ||
|
||
Unstable implementations should use `org.matrix.labels` rather than | ||
`m.labels`, and `org.matrix.labels_hashes` rather than `m.labels_hashes`. When | ||
defining filters, they should also use `org.matrix.labels` and | ||
`org.matrix.not_labels` in the `EventFilter` object. | ||
|
||
Additionally, servers implementing this feature should advertise that they do so | ||
by exposing a `label_based_filtering` flag in the `unstable_features` part of | ||
the `/versions` response. |
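A client-side check for the flag might look like the sketch below. The flag name and the unstable field names are taken from the text above; the function name, the constant name, and the shape of the sample response are assumptions for illustration.

```python
from typing import Any, Dict

# Unstable identifiers named in this section.
UNSTABLE_NAMES = {
    "labels_field": "org.matrix.labels",
    "labels_hashes_field": "org.matrix.labels_hashes",
    "filter_labels": "org.matrix.labels",
    "filter_not_labels": "org.matrix.not_labels",
}


def supports_unstable_labels(versions_response: Dict[str, Any]) -> bool:
    """Check the `unstable_features` map of a parsed /versions response."""
    features = versions_response.get("unstable_features", {})
    return bool(features.get("label_based_filtering", False))


# Hypothetical parsed response body.
sample = {"versions": ["r0.6.0"],
          "unstable_features": {"label_based_filtering": True}}
enabled = supports_unstable_labels(sample)
```

A client would only send label filters using the `UNSTABLE_NAMES` identifiers when this check passes.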
@Bubu Agreed that this would be a handy feature, and that listing all labels in a room is a nice thing to have generally. I'm thinking of one possible way to do this, which would be to add an endpoint that exposes the list of labels the server knows have been used in the room (which should be fairly easy given the server will probably already store `(event_id, label)` tuples in its database for efficiency). For encrypted rooms, this would return a list of hashes (which is what the server considers as a list of labels for that room, since it doesn't know about the actual labels), and clients would then be able to resolve those hashes by calling `/messages` with a filter containing the labels to resolve, and extracting the labels from the response (which contains events that the client should be able to decrypt). This would allow such a feature to work well without having to leak more metadata. wdyt?

Sounds good!
As for metadata concerns, these are still there, as we are working on a set of ~6 fixed labels for this usecase. But this is basically already covered in the "Security Considerations" section here.
Whether or not the actual usage of these tags for images/links/etc. will become optional in E2EE chats is not part of this MSC I believe.

can we resolve this thread?

I'll do that once I've updated the MSC to describe this solution, which I haven't got time to do yet.