Enhancement request: (owned) key/string value deduplication #1303
Comments
Hi Ewald, This feature that I call "string interning" has been on my road map for years. Best regards, |
Hello Benoit,
Code-wise, both approaches are very compact (less than 100 bytes) and can be compiled out. They don't increase RAM usage either. On v5, I implemented it as an argument to
Fully understand it requires a major version number. I was only trying to port my v5 version and understand whether it's a useful idea. |
I like the idea of blindly scanning the memory pool as you did in your code. However, we can probably use a Boyer-Moore algorithm to speed up the search. |
Ah, I was not aware of Serialized() (did it exist in v5?) 😌. Boyer-Moore is an excellent idea. I had looked at it since it's part of the standard library in C++17, but I lacked the time to code it. |
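For readers following along, here is a minimal sketch of the pool scan discussed above, using the C++17 `std::boyer_moore_searcher`. It is not the code from either implementation: `findInPool`, the flat `pool`/`used` layout and the boundary check are assumptions made for illustration. It also assumes the string area holds NUL-terminated strings packed back to back.

```cpp
#include <algorithm>   // std::search
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>  // std::boyer_moore_searcher (C++17)

// Hypothetical helper: look for an already stored, NUL-terminated copy of
// `needle` in the string area [pool, pool + used). Returns a pointer to the
// existing copy, or nullptr if the string is not stored yet.
inline const char* findInPool(const char* pool, std::size_t used,
                              const char* needle) {
  assert(pool != nullptr && needle != nullptr);
  const std::size_t len = std::strlen(needle) + 1;  // include the terminator
  if (len > used) return nullptr;

  const char* end = pool + used;
  const std::boyer_moore_searcher<const char*> searcher(needle, needle + len);

  const char* from = pool;
  while (from < end) {
    const char* hit = std::search(from, end, searcher);
    if (hit == end) return nullptr;
    // Conservative choice: accept only matches that start a stored string,
    // i.e. at the start of the pool or right after a terminator.
    if (hit == pool || hit[-1] == '\0') return hit;
    from = hit + 1;  // matched the tail of a longer string, keep looking
  }
  return nullptr;
}
```

Searching for the string together with its terminator means a shorter string never matches the prefix of a longer one. The boundary check additionally rejects matches on the tail of a longer string.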
As promised the result of the performance impact tests for v6 on ESP8266 12E, 80MHz clock, using simple subtraction of millis() calls before and after (includes time lost due to WDT/yield() etc.). Average of 100 runs.
So there is a significant ~75% performance hit on deserialize (but nowhere else) for a (mere) ~25% memory gain. However, ~50ms to parse the largest possible JSON string that fits in memory for an ESP12E is a very acceptable figure, which is why I decided to implement it in the simplest way (more accurately: to use the lazy approach). If there is interest, I am happy to share some of the other enhancement ideas implemented on v5 and see if I can port them to v6 (as a proof-of-concept):
These are things I needed for my application, where (IOT) JSON strings can be up to ~60K in size, e.g. when communications get interrupted and the master ESP cannot read out the slave ESP in time. ArduinoJson (v5 at the time) was by far the most modular object-oriented implementation, where it was possible to modify/enhance classes without breaking the whole code. Will need to do some work on documentation and my C++ skills. As a hardware designer/kernel programmer (C), my code often looks more like a Linux kernel 😞 |
Finally managed to find some time to port my v5 enhancements to v6. It correctly serializes and deserializes a very large JSON string. On Windows 10 64 bit (and ESP) the JsonDocument size is about 35% smaller without string dedup (basically the VariantSlot structure is half the size) and 50%+ smaller with key/string value dedup. There is a main.cpp test program. You can find the code here.
Not (yet) ported:
In principle all v6.x functions should work without changes.
To do
It's merely a proof-of-concept to see if some enhancements created for v5 could be made to work on v6, a masterpiece of C++ code, extremely flexible/extendable but a bit more complex to change at the core for a C/kernel programmer. The major driver for these enhancements is really the struggle with the small RAM size of ESP/Arduino systems for IOT applications, hence the focus on reducing memory use and storing more strings/data in flash and/or shared storage pools. Creating a relocatable memory pool and string/key dedup seemed to achieve the most bang for the buck. Hope this gives some enhancement ideas for future versions and how these could potentially be implemented. |
Hi Ewald, Thank you very much for sharing these improvements 👍 However, this thread has become a potpourri of improvements and it's overwhelming 😓 Let's use this issue to track the string deduplication only (you can see my progress on branch Most of these changes were already in my (secret) backlog and some of them were even implemented in v6.6. Lastly, it's one thing to make a prototype, but it's another to have it fully implemented, tested, and optimized. Anyway, thanks again for sharing these improvements; I'll do my best to integrate them. Best regards, |
Sorry for the flood of information, it was with good intentions... That said, what I published is only one (but major) change: move to a relocatable (and hence more dense) MemoryPool. Will open a separate enhancement request for this and make sure that code only contains what is needed for that request.
Regards, Ewald |
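As a rough illustration of what "relocatable" means here, the sketch below links slots through 16-bit indexes and offsets instead of raw pointers, so the whole pool can be copied byte for byte (for example from flash to RAM) and still be read. The names `Slot`, `Pool`, `kNone`, `textOf` and `chainLength` are hypothetical and do not correspond to ArduinoJson's `VariantSlot` or `MemoryPool`. The fixed array sizes are arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint16_t kNone = 0xFFFF;  // hypothetical "no next slot" marker

// Hypothetical slot: two 16-bit references instead of two pointers.
struct Slot {
  std::uint16_t next;  // index of the next slot in the chain, or kNone
  std::uint16_t text;  // byte offset of this slot's string in Pool::strings
};

// Hypothetical pool: nothing in it stores an absolute address.
struct Pool {
  Slot slots[4];
  char strings[32];
};

// Resolve a slot's string relative to the pool it lives in.
inline const char* textOf(const Pool& p, std::uint16_t slot) {
  assert(slot < 4);
  return p.strings + p.slots[slot].text;
}

// Count the slots reachable from `first` by following the index chain.
inline int chainLength(const Pool& p, std::uint16_t first) {
  int n = 0;
  for (std::uint16_t i = first; i != kNone; i = p.slots[i].next) ++n;
  return n;
}
```

Because every reference is relative to the pool itself, a `memcpy` of the `Pool` yields a copy that resolves to the same strings. The 16-bit fields are also smaller than pointers on 32-bit and 64-bit targets.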
This feature is available in ArduinoJson 6.16.0. |
Preliminary code on v6 and running code on v5 suggest it can save up to 25% of consumed RAM.
I have tried to implement this myself here using two different approaches (see source code) but both seem to have the same erroneous effect: when you serialize the JSON document after deserialization, the tail elements of the JSON string seem to have gone missing (even though the serialized JSON string is semantically correct).
So far I have not figured out why. Although the code only returns a pointer to a previously allocated string in the _string pool (left side of the buffer, if one can be found) it somehow changes the slot/collection structure :-(.
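To make the intent concrete, here is a minimal sketch of the deduplication step described above: before copying a string into the pool, check whether an identical string is already stored and return a pointer to it. This is not the code linked above and not ArduinoJson's `MemoryPool`. `StringPool`, `intern` and the 256-byte buffer are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size string area holding NUL-terminated strings
// packed back to back.
struct StringPool {
  char buf[256];
  std::size_t used = 0;

  // Returns a pointer to a stored copy of `s`. An existing copy is reused
  // when there is one. Returns nullptr if the pool has no room left.
  const char* intern(const char* s) {
    assert(s != nullptr);
    const std::size_t len = std::strlen(s) + 1;  // include the terminator

    // Walk the stored strings one by one and compare each with `s`.
    for (std::size_t pos = 0; pos < used;
         pos += std::strlen(buf + pos) + 1) {
      if (std::strcmp(buf + pos, s) == 0) return buf + pos;
    }

    if (len > sizeof(buf) - used) return nullptr;  // pool is full
    char* dst = buf + used;
    std::memcpy(dst, s, len);
    used += len;
    return dst;
  }
};
```

Repeated keys such as `"sensor"` then occupy the pool only once, and every later occurrence gets the same pointer back.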
On my modified 5.x version which implements a relocatable slot structure allowing for pre-parsed JSON strings to be stored in flash, it works OK. In that code the strings are stored in Flash, while the structure is copied into RAM.
Thanks in advance.
Ewald