[message-tags] Describe a JSON encoding format and bump version to 0.3 #318

jwheare · 2017-08-23T14:51:18Z

To allow for tag values containing complex typed data such as lists and associative arrays, new encoding rules are required. JSON was chosen as an existing, widely-adopted format with established encoding rules.

This is implemented on IRCCloud team servers. Let me know if you need access to the test team.

This grew out of discussion around various client tag specs (e.g. this +draft/emoji idea) relying on per-value JSON encoded data. It was suggested that instead we allow the entire tagset to be encoded with JSON if such features are desired.

Still need to resolve how best to deal with size limits. A few issues:

Checking that tags being added to a message fall within the limits will get different size values depending on the encoding used. JSON will be larger.
Having to check limits for server-added and client-provided tagsets separately is a bit more complex, because you have to json encode each tagset individually.
Combining two json tagsets into one, you drop the adjacent braces and add a comma, rather than just merging with a semicolon. This is probably not an overflow issue because it only means that your max data allowance is smaller by two characters, but we should probably mention this somehow.

One solution might be to just always check the limit based on the json encoding, even if you send tags with the standard encoding.

DarthGandalf · 2017-08-23T15:34:06Z

core/message-tags-3.3.md

+A message with the same tags sent with both the normal and JSON encodings for comparison:
+
+    @tag=value;+client-tag=value\swith\sspaces PRIVMSG #channel :Message
+    @{"tag":"value","+client-tag":"value\swith\spaces"} PRIVMSG #channel :Message


I don't see \s as valid escape for space in JSON (https://www.ietf.org/rfc/rfc4627.txt)

I believe \s is supposed to be so that we can use spaces inside of this JSON, while still being able to use the wire-like format.

Yeah, see this part:

the entire encoded JSON data MUST be escaped according to existing tag value escaping rules

So {"tag":"\" \";"} becomes @{"tag":"\\"\s\\"\:"}? I don't see why it needs to be so complicated and why it need to escape so much.
@{"tag":"\"\u0020\";"} looks simpler and is within JSON spec.

Sure, that could work, space inside a string is the only thing that strictly needs to be escaped. I just though it would be simpler to describe and implement to just reuse the existing tag escaping routine that will already be available.

Encode: '@' + tag_escape(json_encode_strip_whitespace(input_data)) + ' ' + rest_of_line
Decode: json_decode(tag_unescape(split_between_at_and_space(line)))

But yeah, you could probably replace tag_escape with replace_all(' ', '\u0020') and skip the tag_unescape, just means more description in the spec and implementing an extra encode step (even if it's a simple one).

If you're going to specify JSON, then the data ends when the JSON structure ends.

Escaping the " " is not necessary.

Escaping the " " is a requirement for a lot of wire-format parsers. Besides, I didn't write my JSON decoder. Why should I have to patch my JSON decoder to "know" that anything past the last } is actually not part of the JSON?

The low-level IRC line/packet parser must not depend on knowledge of the high-level tag value formats (the principle of protocol layering); that's the only way it can remain reliable and future-proof.

I just noticed that the example is "value\swith\spaces" should be correct when it's "value\swith\sspaces"

jwheare · 2017-08-24T11:44:44Z

Some authors are concerned that bundling JSON support to this spec as a hard requirement might hinder its adoption. One earlier suggestion in IRC was that we could make it optional, with a CAP value like message-tags=formats=json.

However in that case, as an implementor, I'd rather just go back to using arbitrary JSON on a tag by tag basis as it would be more universally supported by servers, so I think that's kind of a non-starter.

Some pros and cons on this so far (not all equivalent):

Pro:

"Cleaner" and more efficient than clients encoding/decoding JSON individually for the tags that need it
Starts the process of thinking about and resolving issues that might impact enabling a full JSON wire format, but in a scope-limited way. Might ease transition burden down the line.

Con:

Dependency burden. Depends on the language and target architecture how much of a problem this is.
Internal tag storage would require an arbitrary map data type, rather than a string -> string/null structure. This is more of a problem for lower level languages like C
Larger byte size for tags, whether or not they need JSON data types or not, complicates size limit checking.

jwheare · 2017-08-29T12:06:30Z

core/message-tags-3.3.md

+
+To detect JSON-encoded tag data, clients and servers MUST check for `'{'` (0x7B) as the first character of tag data, after the leading `'@'` (0x40) character. The JSON data ends on the first space `' '` (0x20) character.
+
+If JSON tag data fails to decode correctly, clients MUST abort message parsing and discard the entire message, as this may be a symptom of incorrectly applied escaping rules, making the remainder of the message vulnerable to command injection. Servers MUST ignore incorrectly encoded JSON data entirely and MAY terminate the client connection as a result.


Anyone have any comments on this? Technically, the injection risk here would also be an issue for incorrectly implemented standard tag encoding, but there isn't really a way to detect it in the same way as with a json parse error.

Also, should the server behaviour be defined more strictly here? Should there be an error numeric for invalid tag data? Should the connection terminate option be a MUST? Should we allow servers to accept the message but just ignore the tag part?

Making the termination "MUST" doesn't make much sense to me. IRC discards invalid messages right now as well, so why should we make it terminating the connection?

Ignoring parts of a message doesn't make much sense to me as well, because what if the tag is necessary? If we can't get the entire context of something, we shouldn't do anything with it. Otherwise, we may run into unneeded security issues.

So do you think the server should silently ignore the whole message or respond with an error numeric?

I would go for a numeric. When we think about regular IRC messages right now, you would get 421 ERR_UNKNOWNCOMMAND or similar. So we should maybe provide one for "invalid message format"?

I don't think an existing numeric exists that could be used for this. Should we define a text error response as have started to be proposed like ERR and WARNin e.g. chathistory #292? The difference being the chathistory error responses are scoped by a namespace.

These unresolved questions are partly why I added the "may terminate" option. Also, if a client is sending clearly badly encoded junk at this level of the protocol, it's probably not recoverable, so an error response is of questionable use.

That said, a simple solution could just be to reuse 421 ERR_UNKNOWNCOMMAND although it's not necessarily the most descriptive or accurate. Possibly with a more descriptive message parameter than Unknown command?

Oh, the 412 was only an example how a similar event is handled by RFC1459.

It wasn't intended to use it for this event. I would go for an own numeric here. If it should be namespaced or not, I'm not sure.

The "MAY terminate" is fine. But I wouldn't force it. This would possibly lead to DOS scenarios where people send a good request to another client, that replies with broken client tags and get disconnected.

Yes I understood your comment, but we don't really have a good solution for numeric exhaustion right now so there needs to be a really strong reason to define new one. That's why I think it would be the easiest to just reuse 421 here, it's what servers already do when you try to feed them invalid stuff.

I'm not particularly concerned about DOS scenarios when this would be a symptom of a fatally broken client.

All of that said, going back to your previous statement against just ignoring the tag part:

Ignoring parts of a message doesn't make much sense to me as well, because what if the tag is necessary? If we can't get the entire context of something, we shouldn't do anything with it. Otherwise, we may run into unneeded security issues.

Isn't that already a problem for the standard tag encoding? If you make an error encoding tags, current server implementations will just ignore the invalid tag data, even if it's "necessary", so do we really need to be more strict for json tags?

Decided to just change this to let invalid json be ignored entirely. Significantly simplifies implementations and avoids a few of the issues mentioned here.

jwheare · 2019-01-03T15:06:05Z

The difference in data length if you encode with JSON or traditional key values follows the rule:

Traditional: key=val; (len(key) + len(val) + 2[=;]) x tags - 1[;]
JSON: {"key":"val",} 2 + (len(key) + len(val) + 6[":,]) x tags - 1[,]

So JSON will always be 2 + (4 x tags) larger than trad.

For a pathological tagset that consists of 1024 single char key/vals, you get

Trad: (1 + 1 + 2) x 1024 - 1 = 4095 (within current limit)
JSON: 2 + (1 + 1 + 6) x 1024 - 1 = 8193 (just over 2 x trad)

So, one way to deal with this might be to have hard and soft limits for parsing tags. Treat e.g. 4096 as the soft limit when encoding tagsets and 8194 as a hard limit beyond which you can respond with ERR_INPUTTOOLONG as a server or ignore the data as a client.

This allows for valid within-limit trad tagsets to be re-encoded as JSON and not result in errors or dropped data.

jwheare · 2019-01-03T15:07:01Z

Could any server authors who've looked at this provide any feedback on the dependency issues with bundling JSON here?

dequis · 2019-01-03T18:45:59Z

Personally I'm less concerned about server code as I am with client code, which has way more installs on more diverse systems. C/C++ dependency management is just plain garbage and the burden of getting it right is all on the user, which is why we're so dependency averse. It's tempting to vendor libraries, but the ones that are small enough to do so are less properly tested/maintained/fuzzed.

That said, I may be ignoring some issues real servers may have, as none of the "server" code I work with is concerned about scalability/predictability of allocations/etc.

It just doesn't seem worth it to have this thing as a requirement for the whole tags section. JSON on a tag-by-tag basis is simpler, easier to adopt, and both approaches have to deal with the message tags escaping layer anyway.

I'd also be happy with a full-JSON wire protocol (as alternate port / ALPN / websocket only / STARTJSON / something else), but I feel like this spec is in an awkward middle position where it doesn't simplify any implementation and doesn't get us more flexibility than just putting it in the values.

nomis · 2019-01-03T20:14:45Z

"Internal tag storage would require an arbitrary map data type, rather than a string -> string/null structure." is the biggest nuisance of doing this. The server must be able to filter out non-client tags. It's not practical to avoid this either because (depending on future tag specifications) the server may have to store tags when parsing and then do some other processing on them later when sending them to other clients.

RyanSquared · 2019-01-03T20:15:53Z

I'd also be happy with a full-JSON wire protocol (as alternate port / ALPN / websocket only / STARTJSON / something else), but I feel like this spec is in an awkward middle position where it doesn't simplify any implementation and doesn't get us more flexibility than just putting it in the values.

I feel like that's one of the biggest points. Having full JSON would make parsing messages much cleaner (and IIRC people are working on this), and I think a middle ground could lead to awkward implementations in the future.

nomis · 2019-01-03T20:22:02Z

I feel like that's one of the biggest points. Having full JSON would make parsing messages much cleaner (and IIRC people are working on this), and I think a middle ground could lead to awkward implementations in the future.

Could we define an alternative tag/value separator (other than =) that explicitly means "the value is JSON encoded"? Then if the outer protocol happens to be JSON you can avoid double encoding those values and just insert them in place.

jwheare · 2019-01-03T20:47:27Z

Could we define an alternative tag/value separator (other than =) that explicitly means "the value is JSON encoded"? Then if the outer protocol happens to be JSON you can avoid double encoding those values and just insert them in place.

This isn't necessary or workable. The individual tag specs will tell you whether the value is JSON encoded or not. And normal tag escaping is still needed because you can mix JSON and non-JSON tags together, and we don't want to break the "split on space and semi-colon" parsing method.

jwheare · 2019-01-03T20:48:44Z

There is no consensus for this change, so I'm closing this issue and have opened a new one (#350) with the non-JSON wording changes.

nomis · 2019-01-03T22:18:11Z

This isn't necessary or workable. The individual tag specs will tell you whether the value is JSON encoded or not.

The individual specs will, but that's no use to a server implementation that wants to output a pure JSON protocol. It doesn't know if the incoming tag value should be output as a string or embedded JSON.

jwheare · 2019-01-03T22:23:22Z

Sure it can, if it looks at the key name and maintains a list of jsonny tags. I realise I misinterpreted your comment as referring to escaping though, sorry.

nomis · 2019-01-03T22:37:07Z

a list of jsonny tags

For all client tags? That's not practical for server software to do.

This could be as simple as using { instead of = to separate the tag name from the tag value (and imply the start of JSON data). (No change to the escaping behaviour.)

e.g. @+a{"b":"c"};+d=e

This would make JSON versions of the protocol more efficient at handling tags with JSON data without much impact on the current protocol.

jwheare · 2019-01-04T11:23:26Z

For all client tags? That's not practical for server software to do.

Yeah fair enough.

I'm just not sure it's that simple. JSON can start with a [ too.

And I think you can just specify this instead of changing the protocol. For client tags, the client will be generating the tags. So if you're using a full JSON protocol, it can be specified that you can encode tags as complex JSON objects, and the other side can easily detect whether it needs to decode it twice or not. If it's a string, decode it again, otherwise leave it as is.

grawity · 2019-01-04T11:44:15Z

This could be as simple as using { instead of = to separate the tag name from the tag value (and imply the start of JSON data). (No change to the escaping behaviour.)

e.g. @+a{"b":"c"};+d=e

IMHO that actually makes parsing harder – you can't trivially key, val = item.split("=", 1) anymore.

nomis · 2019-01-04T12:31:54Z

I'm just not sure it's that simple. JSON can start with a [ too.

Ok, so use some other symbol and don't try to start the value with the delimiter.

e.g. @+a^{"b":"c"} or @+a^["b","c","d"]

For client tags, the client will be generating the tags.

The client using an all JSON protocol will be, but there may be remote clients that are not. Those will be sending you tags with JSON encoded values that the server must double encode to pass to its client.

IMHO that actually makes parsing harder – you can't trivially key, val = item.split("=", 1) anymore.

You already can't do that because there can be a = in the tag value...

DarthGandalf · 2019-01-04T12:39:09Z

You already can't do that because there can be a = in the tag value...

But not in key. That's what maxsplit=1 is for in key, val = item.split("=", maxsplit=1)

[message-tags] Describe a JSON encoding format and bump version to 0.3

22dca7a

jwheare added client-tags protocol tags Unresolved decisions labels Aug 23, 2017

jwheare added this to the Roadmap milestone Aug 23, 2017

DarthGandalf reviewed Aug 23, 2017

View reviewed changes

jwheare commented Aug 29, 2017

View reviewed changes

jwheare added 2 commits August 29, 2017 15:04

Fix typo

368f1c0

Simplify behaviour of invalid json data, just treat it as empty tags

718d297

jwheare closed this Jan 3, 2019

jwheare mentioned this pull request Jan 3, 2019

[message-tags] Clarity and wording tweaks #350

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[message-tags] Describe a JSON encoding format and bump version to 0.3 #318

[message-tags] Describe a JSON encoding format and bump version to 0.3 #318

jwheare commented Aug 23, 2017 •

edited

Loading

DarthGandalf Aug 23, 2017

RyanSquared Aug 23, 2017

jwheare Aug 23, 2017

DarthGandalf Aug 23, 2017

jwheare Aug 23, 2017

nomis Aug 23, 2017

RyanSquared Aug 23, 2017

grawity Aug 24, 2017

SISheogorath Aug 29, 2017

jwheare commented Aug 24, 2017 •

edited

Loading

jwheare Aug 29, 2017

SISheogorath Aug 29, 2017

jwheare Aug 29, 2017

SISheogorath Aug 29, 2017

jwheare Aug 29, 2017 •

edited

Loading

SISheogorath Aug 29, 2017

jwheare Aug 29, 2017

jwheare Aug 29, 2017

jwheare Sep 13, 2017

jwheare commented Jan 3, 2019 •

edited

Loading

jwheare commented Jan 3, 2019

dequis commented Jan 3, 2019

nomis commented Jan 3, 2019

RyanSquared commented Jan 3, 2019

nomis commented Jan 3, 2019

jwheare commented Jan 3, 2019

jwheare commented Jan 3, 2019 •

edited

Loading

nomis commented Jan 3, 2019

jwheare commented Jan 3, 2019

nomis commented Jan 3, 2019

jwheare commented Jan 4, 2019

grawity commented Jan 4, 2019

nomis commented Jan 4, 2019

DarthGandalf commented Jan 4, 2019


		To detect JSON-encoded tag data, clients and servers MUST check for `'{'` (0x7B) as the first character of tag data, after the leading `'@'` (0x40) character. The JSON data ends on the first space `' '` (0x20) character.

		If JSON tag data fails to decode correctly, clients MUST abort message parsing and discard the entire message, as this may be a symptom of incorrectly applied escaping rules, making the remainder of the message vulnerable to command injection. Servers MUST ignore incorrectly encoded JSON data entirely and MAY terminate the client connection as a result.

[message-tags] Describe a JSON encoding format and bump version to 0.3 #318

[message-tags] Describe a JSON encoding format and bump version to 0.3 #318

Conversation

jwheare commented Aug 23, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwheare commented Aug 24, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwheare Aug 29, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jwheare commented Jan 3, 2019 • edited Loading

jwheare commented Jan 3, 2019

dequis commented Jan 3, 2019

nomis commented Jan 3, 2019

RyanSquared commented Jan 3, 2019

nomis commented Jan 3, 2019

jwheare commented Jan 3, 2019

jwheare commented Jan 3, 2019 • edited Loading

nomis commented Jan 3, 2019

jwheare commented Jan 3, 2019

nomis commented Jan 3, 2019

jwheare commented Jan 4, 2019

grawity commented Jan 4, 2019

nomis commented Jan 4, 2019

DarthGandalf commented Jan 4, 2019

jwheare commented Aug 23, 2017 •

edited

Loading

jwheare commented Aug 24, 2017 •

edited

Loading

jwheare Aug 29, 2017 •

edited

Loading

jwheare commented Jan 3, 2019 •

edited

Loading

jwheare commented Jan 3, 2019 •

edited

Loading