-
-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement decoding of stringref (tag 25 & 256) #71
Conversation
per <http://cbor.schmorp.de/stringref> To operate correctly this must be implemented in the decoder, because special nestable context is established by tag 256 for use by tag 25. Fixed test in cases.js which accidentially used tag 256 improperly. Added tests in decoder.ava.js
I've now got the coverage % increased rather than decreased. |
It was generating errors on some versions of Node.js. Probably promise related.
Add tests of invalid stringref inputs
OK, this is ready for a code review. Decode support only (so far.) |
per <http://cbor.schmorp.de/stringref> To operate correctly this must be implemented in the decoder, because special nestable context is established by tag 256 for use by tag 25. Fixed test in cases.js which accidentially used tag 256 improperly. Added tests in decoder.ava.js
It was generating errors on some versions of Node.js. Probably promise related.
Add tests of invalid stringref inputs
OK, I've read the spec you linked to, and the parts I understand I disagree with almost completely:
My alternate proposal (which I did code up enough to prove that it works, at one point):
as with your suggested approach, you need two passes through the input to calculate the best compression, but that's straightforward:
All of the complexity is in the encoder, and that complexity can be amortized over multiple runs if you like. |
However, if multiple people want this (thumbs-up the original PR comment to indicate), I'll merge it anyway. |
@hildjj I'm not the originator of the tag 25/256 spec, but it's been an acknowledged extension of CBOR since RFC 7049 was published. Like you, I could suggest improvements to it, but thought I'd start with an implementation based on that existing tag 25/256 spec, rather than try to overload tag 25 with multiple meanings. Annother alternate approach, using tag 28/29 on strings does something roughly equivalent to the alternate you propose. Plus it would allow smart one-pass encoding by tagging only those strings which are already interned by JavaScript. (This would cover most object fields since any string appearing in JavaScript code automatically gets interned.) Tag 28/29 is also is also how you would address the non-hierarchal data problems (objects references and cycles), and the decode logic could be shared, but good encoding of references/cycles is a more complex problem than interned strings, shared objects will generally require two passes. Both tag 25/256 and 28/29 are a bit different from other tags in that they really need to be built into to core encoder/decoder logic. Neither seems perfect, but some experience implementing and using them might lead to a better alternative than either. But all that seems like subject for a separate design after experimenting some with the existing ones that Mark Lehmann already drew up. Dynamic space-optimized CBOR would probably be best addressed by a gzip postprocessing, but the impact on complexity and speed isn't that attractive to me right now, and even with gzip the value/string sharing would probably still improve compression. |
Till you make up your mind about this, I'm going to postpone implementing the encoder changes. Let me know what you think about the tag 28/29 concept. |
If you still want this, the best thing to do is to create a plugin package like cbor-bigdecimal. I'll even take it as a patch to this repo so you don't have to publish and maintain it separately. I'm going to close this PR, however, since that changeset will be pretty drastically different than this one. Please submit a new PR with the plugin approach if you like. |
Implement decoding stringref tags (tag 25 and 256) per http://cbor.schmorp.de/stringref
Stringref tags can be used to substantially compress CBOR streams containing repeated string values, such as map entry names. This change includes updates to the decoder so that streams encoded using stringref tags will be decoded correctly. I'll submit an encoding change separately.
Fixed test in cases.js which accidentally used tag 256 improperly. Changed this case to use the (currently unassigned) tag 1000.
Added tests in decoder.ava.js
To operate correctly this must be implemented in the decoder, because special nestable context is established by tag 256 for use by tag 25 during decoding. If no stringref-namespace (tag 256) is encountered in an stream, this change has no effect on decoding.
Ref #70