Introduce new JSON parser utility #12531

wenduwan · 2024-05-08T14:57:44Z

This patch introduces a utility in OPAL based on the 3rd-party project https://github.com/json-parser/json-parser.git

The utility provides APIs to read JSON into memory along with getters to retrieve C values.

Please see the unit test opal_json.c for example usage.

lrbison · 2024-05-08T15:22:55Z

I understand the desire to insulate ourselves from the specific json reader implementation by abstracting out an interface for us to use. However, I am a little uncomfortable inventing a new interface for a json reader. Relevant XKCD. Why not use json-parser's json.c/json.h symbols directly?

wenduwan · 2024-05-08T15:30:49Z

@lrbison For the most part, json-parser only provides an API to parse a string. This PR extended that API to support filename.

Another reason is that I modeled the JSON object with a opal_object_t subclass. The data structure is not compatible with json-parser's json_value, so we need a layer to hide the latter.

juntangc · 2024-05-08T19:13:19Z

It's probably better to add some descriptions on how to use the json parser than pointing the users to some source code. Please also include the new features added and how to convert existing tuning files into the json files.

wenduwan · 2024-05-08T19:56:29Z

That's a good idea. I can add an example later.

bosilca

Looks good, but there are still some questions pending.

Can you don't use the same name for the two .[ch] files. I know they are in different directories, but the identical name is confusing.
What is the benefit to have the OPAL_OBJECT support ? Is this benefit worth enough to account for the extra memory and CPU needed (including the refcount) to handle the opal_object_t.

test/util/Makefile.am

test/util/opal_json.c

.gitignore

wenduwan · 2024-05-09T21:31:59Z

@bosilca Thank you very much for the review. I have renamed the files as suggested.

What is the benefit to have the OPAL_OBJECT support ? Is this benefit worth enough to account for the extra memory and CPU needed (including the refcount) to handle the opal_object_t.

The primary motivation is memory management. In json-parser, only the root object should be freed after use - the child objects are merely pointers to the root. This can be tricky to use. With the opal object lifecycle hooks we can override the destructor to correctly free the root object only, while the application can safely OBJ_RELEASE the objects in any order. I understand that this can also be achieved with an explicit destructor function opal_json_free but I like the OPAL_OBJ pattern.

TBH I am not concerned with resource usage - the parser should only be used to read the file into memory and stored somewhere else, e.g. a schema with efficient query methods. Afterwards the parser should be torn down.

wenduwan · 2024-05-09T21:32:39Z

@juntangc I added an example program in opal_json.h. Please take a look.

.gitignore

config/opal_config_files.m4

opal/util/json/opal_json.h

bosilca · 2024-05-10T08:47:59Z

If you don't add all the OPAL objectification you only have to free a single object, the initial json object via json_value_free. Simpler and cleaner, no need for extra reference counts, nor extra memory allocations. JSON manipulation is simple and straightforward, we should try to keep it that way.

wenduwan · 2024-05-10T16:12:40Z

If you don't add all the OPAL objectification you only have to free a single object, the initial json object via json_value_free. Simpler and cleaner, no need for extra reference counts, nor extra memory allocations. JSON manipulation is simple and straightforward, we should try to keep it that way.

That is also my goal. But my opinion differs in that:

I would like to be more defensive against double free behaviors. Instead of trusting the user to correctly remember the root object and NOT free its children by accident, we can provide a better user experience by hiding this detail and handle it internally.
IMO the OPAL_OBJECT provides a consistent user interface and uniform coding style. I imagine this is also lowers learning curve for the user.

bosilca · 2024-05-10T17:55:08Z

To state this differently instead of trusting to user to release a single object once (aka. the main json object she obtained from the load function), you trust them to release each object resulting from opal_json_get_key, opal_json_get_index in addition to the high level object obtained via opal_json_load*. Sorry, but I'm hardly convinced.

bosilca · 2024-05-10T17:58:31Z

I think we can solve our disagreement if instead of opal_json_t** the json API takes an opal_json_t*object. This way the user will always be in charge of the life expectancy of the objects, and the code will now really look as in the example (the one where the upper level object is overwritten by the lower json reader).

wenduwan · 2024-05-10T21:50:47Z

@bosilca Thinking more about OPAL_OBJECT I have a question about inheritance:

In this PR I want to hide json_value from a 3rd-party library and only expose

struct opal_json_t {
    opal_object_t super;
    opal_json_type type;
};

Currently I take advantage of the subclass

struct opal_json_internal_t {
    struct opal_json_t parent;
    json_value *value;
};

This makes the intention clear that the user should not touch the internals beyond struct opal_json_t.

I wonder what the alternative is if I don't use OPAL_OBJECT. It is obvious to me that we can still do something similar without subclassing, e.g.

struct opal_json_t {
    opal_json_type type;
};
...
struct opal_json_internal_t {
    struct opal_json_t parent;
    json_value *value;
};

Would this be better in your opinion?

wenduwan · 2024-05-17T16:39:20Z

@lrbison @bosilca I added a commit that show the difference if I don't use OPAL_OBJECT. Please take a look and let me know if you like it better.

bosilca

looks much better to me.

opal/util/json/opal_json.c

lrbison

I left a comment about the tempfile, but otherwise approve. I prefer this opal util version, thank you!

jsquyres

The code generally looks good.

I notice that we're introducing some public json_* symbols into libopen-pal.so:

(venv) root@a21a97d3936c:~# nm /tmp/bogus/lib/libopen-pal.so| grep ' json_'
000000000004ec40 T json_parse
000000000004d810 T json_parse_ex
000000000004ecb0 T json_value_free
000000000004d754 T json_value_free_ex
00000000000ca9c8 R json_value_none

I know that these are the third-party symbols. We should probably prefix these so that there's no conflict with some other application linking in their own copy of a JSON library.

lrbison · 2024-07-01T15:27:47Z

@jsquyres are you suggesting some link-time magic or simply renaming those symbols in the source?

jsquyres · 2024-07-02T11:56:35Z

@jsquyres are you suggesting some link-time magic or simply renaming those symbols in the source?

Whatever is least disruptive. I just think we shouldn't have public symbols named json_* in libopen-pal.

wenduwan · 2024-07-02T14:59:01Z

@jsquyres Very good point. I marked those symbols__attribute__((visibility ("hidden"))) in the update. Based on my research this is the most convenient approach in this use case. But I'm open to alternative solutions.

This patch introduces a utility in OPAL based on the 3rd-party project https://github.com/json-parser/json-parser.git The utility provides APIs to read JSON into memory along with getters to retrieve C values. Signed-off-by: Wenduo Wang <[email protected]>

wenduwan · 2024-07-09T14:04:54Z

@lrbison and I has a discussion offline. He raised questions about the potential side effect of __attribute__:

Would the application be able to override the json_* functions by implementing its own version in the main?
Would ompi still be able to call opal_json_* which eventually invokes json_* now that the symbol is not exported?

I wrote a simple app to confirm that, the internal json_* symbols are not shadowed by those in the app, and ompi can use opal_json_* functions without issues.

rhc54 · 2024-07-09T14:17:59Z

Just to help me clarify my understanding since we did something similar in PMIx. If symbols are in libopal and not visible, then they cannot be accessed from outside of libopal. However, if the opal_json symbols are inside libopal and are visible, then you can access those from outside libopal, and those functions can in turn access the hidden json symbols inside of libopal (since the functions are colocated in libopal).

This is what we did and it works fine, although we don't "publish" the equivalent opal_json functions as they are not something the app should be calling.

Only issue was - what happens when the user configures with visibility disabled? In that case (IIRC), all symbols become visible and you can/do get json confusion if they separately use those symbols. At least, that's what we saw (IIRC) - which is why we prefixed the symbol names just to be safe. Eventually, we dropped the support as it wasn't being used (though we may need to revive it at some point).

wenduwan · 2024-07-10T00:33:01Z

@rhc54 Thanks for the reminder. I experimented with --disable-visibility flag, and figured that on Linux with gcc11 this flags had no effect on the symbol visibility in the library - I only see opal_json_* symbols with and without it.

On Mac with clang 15 I observe that both _opa_json_* and _json_* symbols are visible, regardless of --disable-visibility flag. So overall I think with a modern C compiler, the --disable-visibility flag shouldn't have any effect.

Did I miss anything?

rhc54 · 2024-07-10T00:58:43Z

I'm afraid that isn't how it works - I suspect the OMPI configure flag is no longer correct. See this article (https://www.akkadia.org/drepper/dsohowto.pdf) starting at section 2.2.2 (on page 18) for an explanation. Put simply - visibility is alive and well and definitely has an effect on symbols.

rhc54 · 2024-07-10T12:18:13Z

Just to be clear: I'm not saying you have a problem. Only suggesting you consider the case where the user requests visibility to be disabled. For example, the paper will explain why that is necessary when debugging, so it isn't an unreasonable use-case. I'm not sure how your proposed solution to the symbol pollution issue will impact it.

jsquyres · 2024-07-10T18:05:11Z

I just checked the code, and here's a fun fact: config/opal_check_visibility.m4 does the necessary checking to see if -fvisibility=hidden works, but then doesn't actually add it to Open MPI's build CFLAGS.

Meaning: Open MPI uses the same compiler flags regardless of whether you use --enable-visibilty, --disable-visibility, or neither. 🤦‍♂️

That being said, it looks like at least some environments enable hidden visibility by default these days. I did quick test builds in Fedora 38 and MacOS 14/Sonoma, and I see that OMPI's build is not passing -fvisibility=hidden on the command line, but plenty of symbols are still ending up as local (and intended symbols are ending up as public).

Doing a little spelunking, I think we offered --enable-visibility way back in yonder years before symbol visibility was either available universally, or at least was not the default universally.

If symbol visibility is (generally?) the default these days, I don't know if we need to do anything further. Particularly since OMPI has a non-functional --disable-visibility CLI option, OMPI provides no built-in way to "turn off" the visibility functionality. If the user does pass in the compiler flag that is the opposite of -fvisibilty=hidden, then lots of symbols will become visible, and I don't know if we have properly prefixed all of them. I think the user will get what they deserve if they do that... 😄

Related question (but does not need to be part of this PR): should we remove the --enable-visibilty functionality? I think there might need to be a little surgery here, because we do want DECLSPEC stuff to keep working, but we don't need to do any of the testing for the compiler CLI options.

rhc54 · 2024-07-10T18:15:29Z

If the user does pass in the compiler flag that is the opposite of -fvisibilty=hidden, then lots of symbols will become visible, and I don't know if we have properly prefixed all of them. I think the user will get what they deserve if they do that...

As I said above, the paper explains that you need to make symbols visible for the debugger to work. So it isn't a case of the user getting some deserved pain - they might get the pain simply because they are attempting to debug their code and want to trace MPI calls (and hence require that the OMPI library also have visible symbols).

(In case anyone is wondering, the paper is written by the person who heads up DSO design/spec for gcc - so he is considered the expert on visibility)

I don't claim to fully grok all the implications here - e.g., if I'm running a debugger on my code and want to trace down into MPI, does the OMPI library also have to expose their symbols? I believe the answer is "yes", but haven't dug enough to prove it.

Just pointing out that there are some subtle things going on here that might merit further thought.

bosilca · 2024-07-10T18:51:48Z

This discussion is not grounded on current OMPI code. The --disable-visibility configure option, does little to the building environment except preventing our *_DECLSPEC from being useful (it is defined to nothing). Thus, this configure option is mostly useless because the default visibility is default anyway, so unless the user wants to build OMPI with the visibility hidden (by adding the CFLAGS=-fvisibility=hidden) to the configure line. Unfortunately, using this flag is not possible in today's code anymore, because most of the submodules we are pulling in do not have proper visibility of their symbols. (For the sake of the rest of this comment, I used some tricks to generate the libopal shared library).

We can assess the visibility from nm output, if the attribute is upper case the symbol is externally available and if lower case it exists but it is local (scope limited to the existing shared library). Let me add a second, more clear, test: "Can I link against the OPAL library and use any json_* symbol?". Please notice this test uses two symbols, one opal_json_free provided by libopal and one, json_parse, provided by the json library we included.

#include <stdlib.h>

extern int json_parse();
extern int opal_json_free();

int main(int argc, char* argv[])
{
  json_parse(NULL);
  opal_json_free(NULL);
  return 0;
}

OSX (clang 15.0.0)	CFLAGS=-fvisibility=hidden	--enable-visibility	--disable-visibility
visibility as discovered by configure	#define OPAL_C_HAVE_VISIBILITY 1	#define OPAL_C_HAVE_VISIBILITY 1	#define OPAL_C_HAVE_VISIBILITY 0 #define OPAL_HAVE_ATTRIBUTE_VISIBILITY 1
is the symbol visible with nm	t _json_parse T _opal_json_free	t _json_parse T _opal_json_free	t _json_parse T _opal_json_free
Can I link against the tester	No (missing `_json_parse` symbol)	No (missing `_json_parse` symbol)	No (missing `_json_parse` symbol)

Linux (gcc 11.4.1)	CFLAGS=-fvisibility=hidden	--enable-visibility	--disable-visibility
visibility as discovered by configure	#define OPAL_C_HAVE_VISIBILITY 1	#define OPAL_C_HAVE_VISIBILITY 1	#define OPAL_C_HAVE_VISIBILITY 0
is the symbol visible with nm	t json_parse T opal_json_free	t json_parse T opal_json_free	t json_parse T opal_json_free
Can I link against the tester	No (missing `json_parse` symbol)	No (missing `json_parse` symbol)	No (missing `json_parse` symbol)

I think the conclusion is that this code is good to go as it despite all the ongoing discussions.

bosilca · 2024-07-10T19:00:22Z

So we are all looking into this. So much wasted time !

I recall we were running a checker (at the end of debug builds or something) to make sure that all exported symbols are correctly prefixed ?

bosilca · 2024-07-10T19:05:15Z

The reason we forcefully make some symbols externally visible has nothing to do with debugging. Without these symbols being visible our MCA components would not be able to access them when loaded at runtime.

jsquyres · 2024-07-10T19:52:08Z

FWIW, even with the current visibility settings (with json_* as local and opal_json_* as global), I'm able to debug into the json_* functions:

(lldb) s
Process 68249 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x000000010062dd10 libopen-pal.0.dylib`json_parse(json="{\"foo\":3}", length=9) at json.c:1001:18
   998 	__attribute__((visibility ("hidden")))
   999 	json_value * json_parse (const json_char * json, size_t length)
   1000	{
-> 1001	   json_settings settings = { 0 };
   1002	   return json_parse_ex (&settings, json, length, 0);
   1003	}
   1004	
Target 0: (hello_c) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
  * frame #0: 0x000000010062dd10 libopen-pal.0.dylib`json_parse(json="{\"foo\":3}", length=9) at json.c:1001:18
    frame #1: 0x0000000100632878 libopen-pal.0.dylib`opal_json_load(str="{\"foo\":3}", len=9, json=0x000000016fdf9530) at opal_json.c:98:25
    frame #2: 0x00000001005e36e0 libopen-pal.0.dylib`opal_init(pargc=0x000000016fdfe0f8, pargv=0x000000016fdfe0f0) at opal_init.c:158:9
    frame #3: 0x0000000100c36be0 libmpi.0.dylib`ompi_rte_init(pargc=0x000000016fdfe0f8, pargv=0x000000016fdfe0f0) at ompi_rte.c:570:32
    frame #4: 0x0000000100c40050 libmpi.0.dylib`ompi_mpi_instance_init_common(argc=1, argv=0x000000016fdff148) at instance.c:401:32
    frame #5: 0x0000000100c3fac8 libmpi.0.dylib`ompi_mpi_instance_init(ts_level=0, info=0x00000001011371b8, errhandler=0x0000000101135920, instance=0x000000010113fb18, argc=1, argv=0x000000016fdff148) at instance.c:824:15
    frame #6: 0x0000000100c32124 libmpi.0.dylib`ompi_mpi_init(argc=1, argv=0x000000016fdff148, requested=0, provided=0x000000016fdfed50, reinit_ok=false) at ompi_mpi_init.c:359:11
    frame #7: 0x0000000100d918e0 libmpi.0.dylib`MPI_Init(argc=0x000000016fdfedd0, argv=0x000000016fdfedc8) at init.c:67:15
    frame #8: 0x0000000100003e84 hello_c`main(argc=1, argv=0x000000016fdff148) at hello_c.c:18:5
    frame #9: 0x000000018a55e0e0 dyld`start + 2360

So I think the current visibility stuff is giving us what we want.

The question as to whether to remove the compiler-specific visibility stuff is still relevant, but doesn't have to be part of this PR.

wenduwan · 2024-07-10T20:01:25Z

I recall we were running a checker (at the end of debug builds or something) to make sure that all exported symbols are correctly prefixed ?

Do we used to have this? It will be very helpful.

bosilca · 2024-07-10T20:16:51Z

@jsquyres helped me find it. It's in config/find_common_symbol and it is called at the end of Makefile.am but it does not do everything I said. But you can use something like this to get a list (remove the _ in front of the symbols on Linux) nm ompi/.libs/libmpi.dylib | grep ' [TWBCDVR] ' | grep -v -e ' _MPI_' -e ' _mpi_' -e ' _PMPI_' -e ' _MPIX_' -e ' _MPL_' -e ' _ADIOI_' -e ' _ADIO_' -e ' _ompi_' -e ' _OMPI_' -e ' _mca_'

jsquyres · 2024-07-10T20:17:44Z

Do we used to have this? It will be very helpful.

@bosilca and I were talking about this in Slack today. This is also a separate issue than this PR, but we both agree that it would be a great CI test. Could be something as simple as:

nm ompi/.libs/libmpi.dylib | grep ' [TWBCDVR] ' | grep -v -e ' _MPI_' -e ' _mpi_' -e ' _PMPI_' -e ' _MPIX_' -e ' _MPL_' -e ' _ADIOI_' -e ' _ADIO_' -e ' _ompi_' -e ' _OMPI_' -e ' _mca_'

That was on OSX -- could be easily adapted for Linux. It could also easily be done in Python to be a bit more fine-grained for checking, etc.

The https://github.com/open-mpi/ompi/blob/main/config/find_common_syms script (which is called by the top-level Makefile.am) could be a good model for this.

lrbison

My comments have been addressed. LGTM.

wenduwan · 2024-07-10T20:19:18Z

@rhc54 I'm merging the change as-is. Will also explore the symbol validation suggested by George.

rhc54 · 2024-07-10T20:38:43Z

Sounds reasonable to me! FWIW: @jsquyres and I wrote a script a few years back (i.e., 10) to automatically prefix symbols for just these purposes. See https://github.com/open-mpi/ompi/blob/main/contrib/symbol-hiding.pl

github-actions bot added the Target: main label May 8, 2024

wenduwan force-pushed the opal_util_json branch 2 times, most recently from cec8408 to e494d37 Compare May 8, 2024 15:01

wenduwan self-assigned this May 8, 2024

bosilca reviewed May 9, 2024

View reviewed changes

test/util/Makefile.am Show resolved Hide resolved

test/util/opal_json.c Outdated Show resolved Hide resolved

.gitignore Outdated Show resolved Hide resolved

wenduwan force-pushed the opal_util_json branch from e494d37 to 38633a2 Compare May 9, 2024 21:14

wenduwan force-pushed the opal_util_json branch from 38633a2 to 915b893 Compare May 9, 2024 21:34

wenduwan requested a review from bosilca May 9, 2024 21:38

bosilca reviewed May 10, 2024

View reviewed changes

.gitignore Outdated Show resolved Hide resolved

config/opal_config_files.m4 Show resolved Hide resolved

opal/util/json/opal_json.h Outdated Show resolved Hide resolved

wenduwan force-pushed the opal_util_json branch 2 times, most recently from ab22d6e to 6363bf1 Compare May 10, 2024 17:45

wenduwan force-pushed the opal_util_json branch from 6363bf1 to eafc65b Compare May 10, 2024 18:04

wenduwan force-pushed the opal_util_json branch from 61cc14b to 3b7b01e Compare May 17, 2024 16:38

wenduwan requested a review from bosilca May 17, 2024 16:39

bosilca reviewed May 17, 2024

View reviewed changes

opal/util/json/opal_json.c Outdated Show resolved Hide resolved

opal/util/json/opal_json.c Outdated Show resolved Hide resolved

wenduwan force-pushed the opal_util_json branch 2 times, most recently from 47d5261 to 00503cb Compare May 17, 2024 22:19

bosilca approved these changes May 17, 2024

View reviewed changes

lrbison requested changes Jul 1, 2024

View reviewed changes

jsquyres reviewed Jul 1, 2024

View reviewed changes

wenduwan force-pushed the opal_util_json branch from 42e7205 to 8ed02ce Compare July 2, 2024 14:56

wenduwan force-pushed the opal_util_json branch from 8ed02ce to de82c6f Compare July 2, 2024 15:02

wenduwan force-pushed the opal_util_json branch from de82c6f to 364d7d0 Compare July 2, 2024 15:08

wenduwan requested a review from lrbison July 10, 2024 20:00

lrbison approved these changes Jul 10, 2024

View reviewed changes

jsquyres approved these changes Jul 10, 2024

View reviewed changes

wenduwan merged commit 76458b0 into open-mpi:main Jul 10, 2024
13 checks passed

wenduwan deleted the opal_util_json branch July 10, 2024 20:37

Introduce new JSON parser utility #12531

Introduce new JSON parser utility #12531

Conversation

wenduwan commented May 8, 2024

lrbison commented May 8, 2024

wenduwan commented May 8, 2024

juntangc commented May 8, 2024

wenduwan commented May 8, 2024

bosilca left a comment

Choose a reason for hiding this comment

wenduwan commented May 9, 2024

wenduwan commented May 9, 2024

bosilca commented May 10, 2024

wenduwan commented May 10, 2024

bosilca commented May 10, 2024

bosilca commented May 10, 2024

wenduwan commented May 10, 2024

wenduwan commented May 17, 2024

bosilca left a comment

Choose a reason for hiding this comment

lrbison left a comment

Choose a reason for hiding this comment

jsquyres left a comment

Choose a reason for hiding this comment

lrbison commented Jul 1, 2024

jsquyres commented Jul 2, 2024

wenduwan commented Jul 2, 2024 • edited Loading

wenduwan commented Jul 9, 2024

rhc54 commented Jul 9, 2024

wenduwan commented Jul 10, 2024

rhc54 commented Jul 10, 2024

rhc54 commented Jul 10, 2024

jsquyres commented Jul 10, 2024

rhc54 commented Jul 10, 2024

bosilca commented Jul 10, 2024 • edited Loading

bosilca commented Jul 10, 2024

bosilca commented Jul 10, 2024

jsquyres commented Jul 10, 2024

wenduwan commented Jul 10, 2024

bosilca commented Jul 10, 2024

jsquyres commented Jul 10, 2024

lrbison left a comment

Choose a reason for hiding this comment

wenduwan commented Jul 10, 2024

rhc54 commented Jul 10, 2024

wenduwan commented Jul 2, 2024 •

edited

Loading

bosilca commented Jul 10, 2024 •

edited

Loading