-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JSON parser/generator issues/thoughts #301
Comments
Thanks for the excellent run-down of options. It would be nice if there were a JSON implementation with a decent enough API that we could use without wrappers like "shortjson.h". Transitioning to the native API would probably be a lot safer than transitioning the shared utility functions, since there are still too many places where the JSON type is used interchangeably with json_object. I haven't looked at all the options you describe above but I did spend some time looking at jansson at one point and thought it's API was pretty well thought out, a clear improvement over json-c. |
Actually, a plan of attack could be
|
I would agree with all of your points, and that Jansson's interface is the most sensible I've seen of the pure-C options. This is one place where I find myself wishing for C++ a little bit, and was part of the reason I brought up transitioning the wrappers so that we could cordon off but still use RapidJSON. It ranks as the fastest and lowest memory in 3/4 tests, and is third fastest in the fourth, trailing #1 by 1ms for the full test, where json-c is ~16 times and Jansson 13 times slower and using 7 times and 2.5 times more memory respectively. It's also the only one that offers a 100% compliant implementation, if only it had a maintained C API like czmq for ZMQ. |
Doesn't the JSON type itself also violate RFC7? If we're going to work out a common denominator wrapper lib for json handling, I'd actually like for there to be some kind of error handling instead of calls to If we're really worried about performance, should we also pause to check out other message encoding options like msgpack, bson, etc? (But, yeah, I'd really like to dump json-c). |
It violates a "SHOULD" in RFC 7, but I agree if we fix wrappers we should fix that, plus ENOMEM handling, e.g. add a callback like lsd-tools stuff that can return NULL or abort in an application specific way, accommodating It would seem better to me to choose a JSON implementation for flux-core and stick to it that to create a portable wrapper layer that will by necessity probably be less expressive than the underlying implementation. For example I like jansson's |
Is the issue here implementation expressiveness or speed? I kind of agree with @trws that if we're going through the pain of switching implementations we might as well choose the fastest and most memory efficient out there. Since we would have to make a C wrapper for RapidJSON anyway... |
Yeah, we could definitely get better error handling. Your point about other message encoding options is also well taken. Json has the advantage of being natively supported by just about everything, but performance-wise it lags pretty far beind protocol buffers, bson, Cap'nProto (new project by the guy that wrote protocol buffers for Google), and any number of others. Of course, at least in this list, only json and bson are completely free-form, the others require explicit schemas. Come to think of it, I'm not sure if the schema thing is an upside or a downside just now. It could go either way. The advantage would be explicit protocol-level support for protocol versioning, at least in several cases, but it would have to be more structured on the design side. Jansson certainly has advantages in the interface department when calling it from C. Making an interface like pack/unpack would be a non-trivial amount of work, since the C++ interfaces get around needing it by using function/method overloading on parameter types. The closest would be a wrap around something like the setbypointer interface, or to set up a parser and do casts and calls in a C wrapper, but that would take more distinct calls than pack/unpack do. |
@grondo, by the way, the direct answer to your question is both. The interface of json-c is minimal to the point that some things can't be done with it either easily or efficiently. I'd take a better interface but same performance over a faster library that has a worse interface (a-la ujson4c or yajl), but if we could get better on both sides that would be awesome... |
@trws -- thanks. I was only trying to make the point that if we go through all the work now to convert to a new json library for flux-core then find out later that the json parsing is a performance bottleneck (something we have never tested as of now), we might wish we'd taken the hit on developing an abstraction we like. I know I'm wishing we'd done that from the start, even though it would have entailed more work. Might just be me though so I'll stop harping on it. |
One concern I have (maybe unfounded?) about generated protocols like protobuf is while you can post your Or original "design" was to support both JSON and protocol buffers. Maybe the right approach is to not get too hung up on JSON performance, and go for expressiveness, then migrate stable protocols towards the other method, if performance analysis justfies it? |
The camgunz/cmp C API for msgpack looks interesting to me. It claims to do no internal heap allocations, and looks like it could be trivially imported into the project, consisting of just one .c and one .h file. |
That does look pretty nice, and efficient as well, but it appears to be a wire-format only, rather than something to work with dynamically since it requires a complete re-serialization to change a value. We would probably still be using JSON representations externally then converting to msgpack for the wire, not that that would necessarily be a bad thing. One of the posts on msgpack also mention Avro, which looks pretty interesting. It does take a schema, but a JSON-based dynamic schema, with no required code-generation to implement it. The C API also looks quite robust. |
On transitioning to jansson (as explored in #319), how bad would it really be if we pulled in json-c and jansson into the source tree? They are relatively small and this would solve the dynamic linking problems that I was having internally, plus avoid them for external projects that choose a json library in conflict with the one we're using internally. Eventually we could get rid of json-c. |
While adding json-c and jansson source doesn't seem too bad (they don't seem to have any extra build dependencies, and like you said are not large codebases), I'd be slightly against this in principle because really it doesn't seem like it should be necessary, and might set a bad precedent. I feel like we should try some other tricks to hide internal json library implementation. That being said I never had the chance to go back and look at what you did in #319 and I imagine you tried most of the obvious stuff. If this gets to be a big pain, I'd be fine with pulling in json libs and hopefully finding a better fix later. |
By obvious stuff did you mean linking against .a's provided by the jansson and json-c packages? I did just verify on Ubuntu 14.04 that I'm not sure if there's a decent way to tell automake to link against these statically when building dynamic objects. Come to think of it, there might also be an issue building a PIC dynamic library including third party static libraries? |
The automaker thing basically comes down to using a full path to the .a instead of using a -l. Specifying -static may also do it but isn't as reliable. We could probably get away with linking them dynamically as long as they're linked into separate libraries that don't use overlapping JSON interfaces anyway, but that's harder. How badly different are the interfaces? Is this something we could bang out a quick clang transform tool for and just do it codebase-wide? Sent with Good (www.good.com) From: Jim Garlick By obvious stuff did you mean linking against .a's provided by the jansson and json-c packages? I did just verify on Ubuntu 14.04 that libjson-c-dev and libjansson-dev do provide the static libs... I had thought that maybe the .a was missing in the RHEL package but I'll double check later. I'm not sure if there's a decent way to tell automake to link against these statically when building dynamic objects. Come to think of it, there might also be an issue building a PIC dynamic library including third party static libraries? — |
In #319 I did completely switch libflux-core.so to jansson, but then ran into problems with a kvs test that linked against libflux-core.so and libjson-c.so. It seemed to be picking up one of the jansson functions that had the same name as a json-c function. If this is the state of affairs then it seems problematic to dynamically link libflux-core.so to jansson even if we've completely switched everything over, since that would mean a user application linked against json-c could run into trouble. What am I missing? |
You are not, I neglected the case where a dynamic symbol would leak out to application scope that way. It's preventable, by loading it with local linkage only, but fragile at best. The question circles back to whether the interface is similar enough to make the transformation machinable. It looked like it when I glanced at it before, but you never know until you try I guess. A couple of hours making a tool to do it globally might be worth it just to have it done. Sent with Good (www.good.com) From: Jim Garlick In #319#319 I did completely switch libflux-core.so to jansson, but then ran into problems with a kvs test that linked against libflux-core.so and libjson-c.so. It seemed to be picking up one of the jansson functions that had the same name as a json-c function. If this is the state of affairs then it seems problematic to dynamically link libflux-core.so to jansson even if we've completely switched everything over, since that would mean a user application linked against json-c could run into trouble. What am I missing? — |
I just re-read your comment, apparently I didn't parse it the first time. It could cause problems if there are conflicting json libraries in user code, but no moreso than would be caused by any other transitive dependency. I'm not sure if there is a good way to solve that generally. |
Actually, just so I'm sure, you did all of libflux-core, but what about libutil? The kvs module links that statically, which means it pulls in its link flags from the libtool file, which would include json-c directly. Could that have been the problem? |
Ugh, ignore my rambling this morning. Nervous about other things entirely, the issue is that it should have been getting json-c but was also getting jansson symbols... sorry. |
I've forgotten more than I ever knew about dynamic loader, but it would seem like this issue would be common and the loader would "take care" of hiding symbols in a DSO linked a level below a library in use by an application. i.e. the loaded library should resolve symbols downwards in its link map, but these symbols shouldn't by default be candidates for symbol resolution by default in an application linked to that library. This would seem to be a common enough problem that it is solved, but maybe I'm kidding myself. Also, libtool may be "helping" too much here... Was the kvs test perhaps not linked directly against libjson-c? Or maybe we'd have to be more careful about which libs we add where (right now maybe libjson is put on every single link line whether it is needed or not?) There are other approaches as well that wouldn't require pulling in the library as a whole. We could have an internal library that exports an interface that you like, and dlopens the actual json library underneath. This has the drawback of being way more work than just pulling in the code, and would have the performance impact of and extra function call, but you could later easily swap out implementations. (not saying we should do this, just another idea..) Really, I'd hate to delay building actual functionality for flux by spending a lot of time on issues like these. If conversion to jansson is desirable and the only easy solution is to pull in a couple json libraries then it is fine with me. |
Data point: it seems RHEL doesn't package .a's with jansson-devel and libjson-c-devel. Re-examining PR #319, the failing test It must be picking up the |
OK, I built the test using a standalone makefile, linked against the dynamic versions of libkvs and libflux-core (setting rpath) and the resulting executable no longer fails. ldd confirms I'm using the in-tree dynamic version of libflux-core. (ldd also still shows both libjson-c and libjansson but according to @grondo that's expected) From this I conclude that if we internally linked against the dynamic libjansson, an out of tree user could choose json-c or whatever and not have any problem, as seems intuitive. For transition, I think we could circumvent the in-tree problem by pulling in libjson-c temporarily and building it statically. It could be dropped once we've transitioned off. This is a good place to pause and ask how people feel about libjansson. I like it but before we commit probably we should get an ACK from the team as this is a big change. I think the performance discussion concluded(?) that some other more efficient encoding scheme than JSON could be used when it matters, as opposed to sacrificing expressiveness for efficiency in the JSON library we choose. |
My personal opinion is that jansson is a huge improvement over json-c. However, I haven't really surveyed the other alternatives (and I think @trws had pointed out some single |
I think the only other option proposed above that might make sense would be reimplementing shortjson in terms of yajl, ujson4c, or directly in terms of the C++ UltraJSON. I'm not sure I like these options, as shortjson is not a complete JSON interface, the work needed to provide a data representation is a bit open ended IMHO, and we would need to redesign shortjson for better error handling and RFC 7 conformance. |
I think you're right. I was thinking of an implementation @trws had pointed to as part of clib on the mail list which I hadn't yet had a chance to check out. However, I wasn't able to find anything except js0n, which is a parser (in probably the thinnest version of the term), but not an object interfact at all. (similar to jsmn). I think we could go ahead with libjansson. |
I'm fine with migrating to libjansson. |
Oddly I can't find the json library I was referring to in that email on clib anymore... I can't imagine why it would have been removed, but it's parson. As I see it, parson is the smallest and by a slight margin lightest of the C-only options, libjansson has the nicest ANSI C native interface, and RapidJSON is the fastest, lowest memory usage, most compliant, and has the most comprehensive overall API. This is one of those "there's a but coming" times though, as @garlick said we would have to wrap RapidJSON because it only provides a C++ API, and jansson is definitely the nicest we can do from native C. I guess that's a long way of saying: if we were using C++, it would be a no-brainer for RapidJSON, in C, we should probably use libjansson, IMO. |
As a possible motivator here, and something we may wish to test when moving to the next one, I just got back a profile of broker-0 for a full cap run. These are the functions that take >= 1% of the total cycles found in the samples.
Notably json-c takes approximately 19% of the total cycles spent in broker 0... |
So, something really interesting came to my attention today. Apparently v3 of protocol buffers supports JSON as a first-class data storage format. Meaning, you can give it json to get protobufs, or vice-versa both in the library and on the line. proto3 I'm not sure about anyone else, but that really makes me want to take a look at it as an option... |
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
Problem: json-c exports symbols like json_object_put() that conflict with other JSON implementations we might want to migrate to, such as jansson as discussed in flux-framework#301. To avoid such conflicts with the embedded json-c library, force all visible json-c text symbols to have a json_c_ prefix, by including a new header, namespace.h in the json-c sources and the "public" json.h header. namespace.h redefines text symbols that do not already have a json_c_ prefix, e.g. #define json_object_put json_c_object_put This allows any part of flux-core to independently migrate to a new JSON implementation that has clashing symbols by simply not including src/common/libjson-c/json.h.
We've got jansson support now, so closing this. |
As more and more of flux relies, implicitly or explicitly, on the handling of JSON strings, I thought it might be good to put up a discussion issue for the state of things. Currently we're using
json-c
, which does the job well for the most part, but has some unfortunate shortcomings where efficiency is concerned in some of our use-cases.Notably, there is no way to get a size in number of elements for a json object other than to iterate over the object in its entirety and collect a count. It also ranks as one of the slower and higher memory use json implementations in the nativejson-benchmark.
It's certainly not the top of the priority list, but some thoughts on alternatives might be worthwhile.
Of those with native C interfaces, Jansson seems to have the most features. Yajl is pretty fast and actually installed on LC systems, but comes as a callback mechanism with no pre-defined data representation, definitely not batteries included, and ultrajson (oddly a C module for python, but by a wide margin the fastest pure C implementation) is the same way in the guise of ujson4c.
In a perfect world, RapidJson would be the obvious choice being almost always the fastest, lowest memory, most feature-packed and battle-tested, but it has no native C wrapper. Even so, using it to implement the jsonutil and shortjson primitives might be worth it, and the JSON Pointers interface could be wrapped rather cleanly in C (none of the C-based interfaces seem to provide this yet).
The text was updated successfully, but these errors were encountered: