Optimize memory usage of json objects in combination with binary serialization #373
Comments
Some thoughts:
|
Yes, that adds a lot of complexity. I took a brute-force approach to add these extra types: https://gitlab.cern.ch/slac_sandbox/ubjson/blob/ubjson/src/json.hpp |
I'd say JSON is not a format that is well-suited for storing blobs or arrays of numbers large enough so that parsing, writing, and memory utilization issues become important. The API should target usability and common use-cases. |
@mwittgen I had a look at your code. I understand your use case, but the code grows into an unmaintainable state through copy/pasting nearly the same code over and over again. Maybe some template magic could help, but I fear that this effort only serves a very particular edge case. |
@nlohmann thanks for looking into it. I might continue to look into a template approach. I needed a quick straw man to prove some concepts. Certainly for my use case, JSON parsing was much slower than other storage solutions preferred in the physics community, such as CERN ROOT trees. The RapidJSON parser was as fast as ROOT tree parsing. But with the new binary format support, parsing should be very fast. |
So storing the exact type of integers (did I miss floating-point types?) is a separate concern from the compact integer vectors? |
Yes. float32/64 and all the (u)int types. Preserving this information is separate from compact integer/float vectors. Unfortunately, there is no silver bullet when it comes to existing binary formats. UBJSON has optimized array support for all of its types but lacks unsigned support beyond uint8; msgpack offers unsigned support but only limited optimized array support. I haven't looked into CBOR yet. With UBJSON, parsing into an optimized vector is trivial; with msgpack, the parser would need to do some guesswork or make use of user-defined types. msgpack only directly supports storage of vector<uint8_t> through its byte array. |
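To make the "trivial to parse" point concrete, here is a minimal sketch of a UBJSON strongly typed array of uint8 values, the one unsigned type for which UBJSON defines a marker (`U`). The layout follows my reading of UBJSON draft 12: `[`, `$`, type marker, `#`, a count with its own type marker, then the raw payload with no closing `]`. The function names `encode_ubjson_uint8_array` and `decode_ubjson_uint8_array` are hypothetical and not part of any library; the count is always written as an int32 (`l`) to keep the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: writes '[' '$' 'U' '#' 'l' <int32 big-endian count> <payload>.
std::vector<std::uint8_t> encode_ubjson_uint8_array(const std::vector<std::uint8_t>& values)
{
    // The count is stored as a signed 32-bit integer in this sketch.
    assert(values.size() <= 0x7FFFFFFFu);

    std::vector<std::uint8_t> out;
    out.reserve(values.size() + 9);
    out.push_back('[');
    out.push_back('$');
    out.push_back('U');  // element type: uint8
    out.push_back('#');
    out.push_back('l');  // count type: int32
    const auto n = static_cast<std::uint32_t>(values.size());
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFFu));
    out.insert(out.end(), values.begin(), values.end());
    return out;
}

// Hypothetical helper: accepts only the exact layout produced above.
// Returns false on any mismatch instead of guessing.
bool decode_ubjson_uint8_array(const std::vector<std::uint8_t>& in,
                               std::vector<std::uint8_t>& values)
{
    static const std::uint8_t header[] = {'[', '$', 'U', '#', 'l'};
    if (in.size() < 9 || !std::equal(header, header + 5, in.begin()))
        return false;

    std::uint32_t n = 0;
    for (std::size_t i = 5; i < 9; ++i)
        n = (n << 8) | in[i];
    if (n > 0x7FFFFFFFu || in.size() - 9 != n)
        return false;

    // The payload is already a contiguous run of uint8: one bulk copy.
    values.assign(in.begin() + 9, in.end());
    return true;
}
```

Because element type and count are known up front, the decoder can fill a `std::vector<std::uint8_t>` in a single copy rather than building one generic value per element.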
Maybe in binary mode, relax the requirement that a string must contain valid UTF-8 and allow arbitrary bytes, so that the user can put any blob in there, e.g. an array of floats or any other POD. Serialization of such data as JSON should fail, of course. |
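A rough sketch of what the user side of that idea could look like, assuming a string type that tolerates arbitrary bytes. `pack_floats` and `unpack_floats` are hypothetical helper names, and the bytes are written in the host's native byte order and float representation, so this is only portable between like machines.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy the raw bytes of a float vector into a std::string.
// The result is generally NOT valid UTF-8 and must not be written out as JSON text.
std::string pack_floats(const std::vector<float>& values)
{
    std::string blob(values.size() * sizeof(float), '\0');
    if (!values.empty())
        std::memcpy(&blob[0], values.data(), blob.size());
    return blob;
}

// Hypothetical helper: inverse of pack_floats. Trailing bytes that do not
// form a whole float are ignored.
std::vector<float> unpack_floats(const std::string& blob)
{
    std::vector<float> values(blob.size() / sizeof(float));
    if (!values.empty())
        std::memcpy(values.data(), blob.data(), values.size() * sizeof(float));
    return values;
}
```

The blob carries no type or length-per-element information of its own, so both sides have to agree out of band that the string holds floats.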
FYI: I now merged the MessagePack/CBOR implementation to the develop branch. |
I don't think that supporting all kinds of numeric types would bring broad benefit to the users of the library. For the exchange of large vectors, CBOR/MessagePack should help. |
@nlohmann In the long run, std::variant (and, for now, similar variant classes implemented for C++11/14, such as eggs::variant) would work for json_value when implementing more numeric storage types without bloating the code. A lot of the switch(type)/case statements could become obsolete. I would still argue that conserving the type information stored in the binary formats in the json object has some benefits. I have started to play around with eggs::variant and nlohmann::json. |
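As an illustration of how a variant can replace per-type case labels, here is a minimal sketch using C++17 `std::variant` rather than eggs::variant. The `number` alias and the three helper functions are hypothetical and are not how nlohmann::json stores values.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <variant>

// Hypothetical numeric storage type that keeps the exact width and signedness.
using number = std::variant<std::int8_t, std::int16_t, std::int32_t, std::int64_t,
                            std::uint8_t, std::uint16_t, std::uint32_t, std::uint64_t,
                            float, double>;

// One generic lambda handles every alternative; no switch over a type tag.
inline double to_double(const number& n)
{
    return std::visit([](auto v) { return static_cast<double>(v); }, n);
}

// Size in bytes of the alternative currently held.
inline std::size_t stored_size(const number& n)
{
    return std::visit([](auto v) -> std::size_t { return sizeof(v); }, n);
}

// True only for the unsigned integer alternatives.
inline bool is_unsigned(const number& n)
{
    return std::visit([](auto v) { return std::is_unsigned<decltype(v)>::value; }, n);
}
```

Adding another numeric type then means adding one alternative to `number`; the visitors above pick it up without further changes. Note that the variant itself is still as large as its largest alternative plus a discriminator, so this preserves type information but does not shrink individual values.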
@mwittgen |
With binary serialization being implemented via msgpack and CBOR, I wonder whether some of the binary format optimizations could be extended to the in-memory representation of json objects by introducing more specific storage types for floating-point numbers and integers, distinguishing between uint8/16/32/64, int8/16/32/64, float32 and float64. I am aware this does not reduce the size of a JSON object, but it would retain the additional type information from the binary formats. For optimizing memory size, what about introducing the concept of ArrayType<uint8_t> and so on, allowing optimized storage for arrays consisting of one specific type? The json object would store a pointer to the optimized array. On a 64-bit system, one element in an array of uint8_t currently uses 16 bytes instead of one byte. A downside is sacrificing the indexing capability: j["some_array"][0] would not work if j["some_array"] maps to a pointer to an array object.
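The 16-bytes-per-element figure can be illustrated with a simplified stand-in for a JSON value: a one-byte type tag next to an 8-byte payload. `tagged_value` and the two helper functions below are hypothetical and only approximate the real layout; on a typical 64-bit ABI, alignment padding rounds the struct up to 16 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, simplified JSON value: type tag plus an 8-byte union.
struct tagged_value
{
    enum class kind : std::uint8_t
    {
        null, boolean, integer, unsigned_integer, floating, pointer
    };

    union payload
    {
        bool boolean;
        std::int64_t integer;
        std::uint64_t unsigned_integer;
        double floating;
        void* pointer;  // objects, arrays and strings would live behind a pointer
    };

    kind type;      // 1 byte, followed by padding up to the union's alignment
    payload value;  // 8 bytes
};

// Bytes needed when every element is stored as a generic tagged value.
inline std::size_t bytes_as_generic_array(std::size_t element_count)
{
    return element_count * sizeof(tagged_value);
}

// Bytes needed when the same elements are stored as a packed uint8_t array.
inline std::size_t bytes_as_typed_array(std::size_t element_count)
{
    return element_count * sizeof(std::uint8_t);
}
```

Under these assumptions, an array of one million small integers costs roughly 16 MB as generic values versus 1 MB as a packed `uint8_t` array, not counting container overhead in either case.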