Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce new JSON parser utility #12531

Merged
merged 1 commit into from
Jul 10, 2024
Merged

Conversation

wenduwan
Copy link
Contributor

@wenduwan wenduwan commented May 8, 2024

This patch introduces a utility in OPAL based on the 3rd-party project https://github.com/json-parser/json-parser.git

The utility provides APIs to read JSON into memory along with getters to retrieve C values.

Please see the unit test opal_json.c for example usage.

@wenduwan wenduwan force-pushed the opal_util_json branch 2 times, most recently from cec8408 to e494d37 Compare May 8, 2024 15:01
@lrbison
Copy link
Contributor

lrbison commented May 8, 2024

I understand the desire to insulate ourselves from the specific json reader implementation by abstracting out an interface for us to use. However, I am a little uncomfortable inventing a new interface for a json reader. Relevant XKCD. Why not use json-parser's json.c/json.h symbols directly?

@wenduwan
Copy link
Contributor Author

wenduwan commented May 8, 2024

@lrbison For the most part, json-parser only provides an API to parse a string. This PR extended that API to support filename.

Another reason is that I modeled the JSON object with a opal_object_t subclass. The data structure is not compatible with json-parser's json_value, so we need a layer to hide the latter.

@wenduwan wenduwan self-assigned this May 8, 2024
@juntangc
Copy link
Contributor

juntangc commented May 8, 2024

It's probably better to add some descriptions on how to use the json parser than pointing the users to some source code. Please also include the new features added and how to convert existing tuning files into the json files.

@wenduwan
Copy link
Contributor Author

wenduwan commented May 8, 2024

That's a good idea. I can add an example later.

Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but there are still some questions pending.

  1. Can you don't use the same name for the two .[ch] files. I know they are in different directories, but the identical name is confusing.
  2. What is the benefit to have the OPAL_OBJECT support ? Is this benefit worth enough to account for the extra memory and CPU needed (including the refcount) to handle the opal_object_t.

test/util/Makefile.am Show resolved Hide resolved
test/util/opal_json.c Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
@wenduwan
Copy link
Contributor Author

wenduwan commented May 9, 2024

@bosilca Thank you very much for the review. I have renamed the files as suggested.

What is the benefit to have the OPAL_OBJECT support ? Is this benefit worth enough to account for the extra memory and CPU needed (including the refcount) to handle the opal_object_t.

The primary motivation is memory management. In json-parser, only the root object should be freed after use - the child objects are merely pointers to the root. This can be tricky to use. With the opal object lifecycle hooks we can override the destructor to correctly free the root object only, while the application can safely OBJ_RELEASE the objects in any order. I understand that this can also be achieved with an explicit destructor function opal_json_free but I like the OPAL_OBJ pattern.

TBH I am not concerned with resource usage - the parser should only be used to read the file into memory and stored somewhere else, e.g. a schema with efficient query methods. Afterwards the parser should be torn down.

@wenduwan
Copy link
Contributor Author

wenduwan commented May 9, 2024

@juntangc I added an example program in opal_json.h. Please take a look.

.gitignore Outdated Show resolved Hide resolved
config/opal_config_files.m4 Show resolved Hide resolved
opal/util/json/opal_json.h Outdated Show resolved Hide resolved
@bosilca
Copy link
Member

bosilca commented May 10, 2024

If you don't add all the OPAL objectification you only have to free a single object, the initial json object via json_value_free. Simpler and cleaner, no need for extra reference counts, nor extra memory allocations. JSON manipulation is simple and straightforward, we should try to keep it that way.

@wenduwan
Copy link
Contributor Author

If you don't add all the OPAL objectification you only have to free a single object, the initial json object via json_value_free. Simpler and cleaner, no need for extra reference counts, nor extra memory allocations. JSON manipulation is simple and straightforward, we should try to keep it that way.

That is also my goal. But my opinion differs in that:

  1. I would like to be more defensive against double free behaviors. Instead of trusting the user to correctly remember the root object and NOT free its children by accident, we can provide a better user experience by hiding this detail and handle it internally.
  2. IMO the OPAL_OBJECT provides a consistent user interface and uniform coding style. I imagine this is also lowers learning curve for the user.

@wenduwan wenduwan force-pushed the opal_util_json branch 2 times, most recently from ab22d6e to 6363bf1 Compare May 10, 2024 17:45
@bosilca
Copy link
Member

bosilca commented May 10, 2024

To state this differently instead of trusting to user to release a single object once (aka. the main json object she obtained from the load function), you trust them to release each object resulting from opal_json_get_key, opal_json_get_index in addition to the high level object obtained via opal_json_load*. Sorry, but I'm hardly convinced.

@bosilca
Copy link
Member

bosilca commented May 10, 2024

I think we can solve our disagreement if instead of opal_json_t** the json API takes an opal_json_t*object. This way the user will always be in charge of the life expectancy of the objects, and the code will now really look as in the example (the one where the upper level object is overwritten by the lower json reader).

@wenduwan
Copy link
Contributor Author

@bosilca Thinking more about OPAL_OBJECT I have a question about inheritance:

In this PR I want to hide json_value from a 3rd-party library and only expose

struct opal_json_t {
    opal_object_t super;
    opal_json_type type;
};

Currently I take advantage of the subclass

struct opal_json_internal_t {
    struct opal_json_t parent;
    json_value *value;
};

This makes the intention clear that the user should not touch the internals beyond struct opal_json_t.

I wonder what the alternative is if I don't use OPAL_OBJECT. It is obvious to me that we can still do something similar without subclassing, e.g.

struct opal_json_t {
    opal_json_type type;
};
...
struct opal_json_internal_t {
    struct opal_json_t parent;
    json_value *value;
};

Would this be better in your opinion?

@wenduwan
Copy link
Contributor Author

@lrbison @bosilca I added a commit that show the difference if I don't use OPAL_OBJECT. Please take a look and let me know if you like it better.

@wenduwan wenduwan requested a review from bosilca May 17, 2024 16:39
Copy link
Member

@bosilca bosilca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks much better to me.

opal/util/json/opal_json.c Outdated Show resolved Hide resolved
opal/util/json/opal_json.c Outdated Show resolved Hide resolved
@wenduwan wenduwan force-pushed the opal_util_json branch 2 times, most recently from 47d5261 to 00503cb Compare May 17, 2024 22:19
Copy link
Contributor

@lrbison lrbison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment about the tempfile, but otherwise approve. I prefer this opal util version, thank you!

Copy link
Member

@jsquyres jsquyres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code generally looks good.

I notice that we're introducing some public json_* symbols into libopen-pal.so:

(venv) root@a21a97d3936c:~# nm /tmp/bogus/lib/libopen-pal.so| grep ' json_'
000000000004ec40 T json_parse
000000000004d810 T json_parse_ex
000000000004ecb0 T json_value_free
000000000004d754 T json_value_free_ex
00000000000ca9c8 R json_value_none

I know that these are the third-party symbols. We should probably prefix these so that there's no conflict with some other application linking in their own copy of a JSON library.

@lrbison
Copy link
Contributor

lrbison commented Jul 1, 2024

@jsquyres are you suggesting some link-time magic or simply renaming those symbols in the source?

@jsquyres
Copy link
Member

jsquyres commented Jul 2, 2024

@jsquyres are you suggesting some link-time magic or simply renaming those symbols in the source?

Whatever is least disruptive. I just think we shouldn't have public symbols named json_* in libopen-pal.

@wenduwan
Copy link
Contributor Author

wenduwan commented Jul 2, 2024

@jsquyres Very good point. I marked those symbols__attribute__((visibility ("hidden"))) in the update. Based on my research this is the most convenient approach in this use case. But I'm open to alternative solutions.

This patch introduces a utility in OPAL based on the 3rd-party project
https://github.com/json-parser/json-parser.git

The utility provides APIs to read JSON into memory along with getters to
retrieve C values.

Signed-off-by: Wenduo Wang <[email protected]>
@wenduwan
Copy link
Contributor Author

wenduwan commented Jul 9, 2024

@lrbison and I has a discussion offline. He raised questions about the potential side effect of __attribute__:

  • Would the application be able to override the json_* functions by implementing its own version in the main?
  • Would ompi still be able to call opal_json_* which eventually invokes json_* now that the symbol is not exported?

I wrote a simple app to confirm that, the internal json_* symbols are not shadowed by those in the app, and ompi can use opal_json_* functions without issues.

@rhc54
Copy link
Contributor

rhc54 commented Jul 9, 2024

Just to help me clarify my understanding since we did something similar in PMIx. If symbols are in libopal and not visible, then they cannot be accessed from outside of libopal. However, if the opal_json symbols are inside libopal and are visible, then you can access those from outside libopal, and those functions can in turn access the hidden json symbols inside of libopal (since the functions are colocated in libopal).

This is what we did and it works fine, although we don't "publish" the equivalent opal_json functions as they are not something the app should be calling.

Only issue was - what happens when the user configures with visibility disabled? In that case (IIRC), all symbols become visible and you can/do get json confusion if they separately use those symbols. At least, that's what we saw (IIRC) - which is why we prefixed the symbol names just to be safe. Eventually, we dropped the support as it wasn't being used (though we may need to revive it at some point).

@wenduwan
Copy link
Contributor Author

@rhc54 Thanks for the reminder. I experimented with --disable-visibility flag, and figured that on Linux with gcc11 this flags had no effect on the symbol visibility in the library - I only see opal_json_* symbols with and without it.

On Mac with clang 15 I observe that both _opa_json_* and _json_* symbols are visible, regardless of --disable-visibility flag. So overall I think with a modern C compiler, the --disable-visibility flag shouldn't have any effect.

Did I miss anything?

@rhc54
Copy link
Contributor

rhc54 commented Jul 10, 2024

I'm afraid that isn't how it works - I suspect the OMPI configure flag is no longer correct. See this article (https://www.akkadia.org/drepper/dsohowto.pdf) starting at section 2.2.2 (on page 18) for an explanation. Put simply - visibility is alive and well and definitely has an effect on symbols.

@rhc54
Copy link
Contributor

rhc54 commented Jul 10, 2024

Just to be clear: I'm not saying you have a problem. Only suggesting you consider the case where the user requests visibility to be disabled. For example, the paper will explain why that is necessary when debugging, so it isn't an unreasonable use-case. I'm not sure how your proposed solution to the symbol pollution issue will impact it.

@jsquyres
Copy link
Member

I just checked the code, and here's a fun fact: config/opal_check_visibility.m4 does the necessary checking to see if -fvisibility=hidden works, but then doesn't actually add it to Open MPI's build CFLAGS.

Meaning: Open MPI uses the same compiler flags regardless of whether you use --enable-visibilty, --disable-visibility, or neither. 🤦‍♂️

That being said, it looks like at least some environments enable hidden visibility by default these days. I did quick test builds in Fedora 38 and MacOS 14/Sonoma, and I see that OMPI's build is not passing -fvisibility=hidden on the command line, but plenty of symbols are still ending up as local (and intended symbols are ending up as public).

Doing a little spelunking, I think we offered --enable-visibility way back in yonder years before symbol visibility was either available universally, or at least was not the default universally.

If symbol visibility is (generally?) the default these days, I don't know if we need to do anything further. Particularly since OMPI has a non-functional --disable-visibility CLI option, OMPI provides no built-in way to "turn off" the visibility functionality. If the user does pass in the compiler flag that is the opposite of -fvisibilty=hidden, then lots of symbols will become visible, and I don't know if we have properly prefixed all of them. I think the user will get what they deserve if they do that... 😄

Related question (but does not need to be part of this PR): should we remove the --enable-visibilty functionality? I think there might need to be a little surgery here, because we do want DECLSPEC stuff to keep working, but we don't need to do any of the testing for the compiler CLI options.

@rhc54
Copy link
Contributor

rhc54 commented Jul 10, 2024

If the user does pass in the compiler flag that is the opposite of -fvisibilty=hidden, then lots of symbols will become visible, and I don't know if we have properly prefixed all of them. I think the user will get what they deserve if they do that...

As I said above, the paper explains that you need to make symbols visible for the debugger to work. So it isn't a case of the user getting some deserved pain - they might get the pain simply because they are attempting to debug their code and want to trace MPI calls (and hence require that the OMPI library also have visible symbols).

(In case anyone is wondering, the paper is written by the person who heads up DSO design/spec for gcc - so he is considered the expert on visibility)

I don't claim to fully grok all the implications here - e.g., if I'm running a debugger on my code and want to trace down into MPI, does the OMPI library also have to expose their symbols? I believe the answer is "yes", but haven't dug enough to prove it.

Just pointing out that there are some subtle things going on here that might merit further thought.

@bosilca
Copy link
Member

bosilca commented Jul 10, 2024

This discussion is not grounded on current OMPI code. The --disable-visibility configure option, does little to the building environment except preventing our *_DECLSPEC from being useful (it is defined to nothing). Thus, this configure option is mostly useless because the default visibility is default anyway, so unless the user wants to build OMPI with the visibility hidden (by adding the CFLAGS=-fvisibility=hidden) to the configure line. Unfortunately, using this flag is not possible in today's code anymore, because most of the submodules we are pulling in do not have proper visibility of their symbols. (For the sake of the rest of this comment, I used some tricks to generate the libopal shared library).

We can assess the visibility from nm output, if the attribute is upper case the symbol is externally available and if lower case it exists but it is local (scope limited to the existing shared library). Let me add a second, more clear, test: "Can I link against the OPAL library and use any json_* symbol?". Please notice this test uses two symbols, one opal_json_free provided by libopal and one, json_parse, provided by the json library we included.

#include <stdlib.h>

extern int json_parse();
extern int opal_json_free();

int main(int argc, char* argv[])
{
  json_parse(NULL);
  opal_json_free(NULL);
  return 0;
}
OSX (clang 15.0.0) CFLAGS=-fvisibility=hidden --enable-visibility --disable-visibility
visibility as discovered by configure #define OPAL_C_HAVE_VISIBILITY 1 #define OPAL_C_HAVE_VISIBILITY 1 #define OPAL_C_HAVE_VISIBILITY 0
#define OPAL_HAVE_ATTRIBUTE_VISIBILITY 1
is the symbol visible with nm t _json_parse
T _opal_json_free
t _json_parse
T _opal_json_free
t _json_parse
T _opal_json_free
Can I link against the tester No (missing _json_parse symbol) No (missing _json_parse symbol) No (missing _json_parse symbol)
Linux (gcc 11.4.1) CFLAGS=-fvisibility=hidden --enable-visibility --disable-visibility
visibility as discovered by configure #define OPAL_C_HAVE_VISIBILITY 1 #define OPAL_C_HAVE_VISIBILITY 1 #define OPAL_C_HAVE_VISIBILITY 0
is the symbol visible with nm t json_parse
T opal_json_free
t json_parse
T opal_json_free
t json_parse
T opal_json_free
Can I link against the tester No (missing json_parse symbol) No (missing json_parse symbol) No (missing json_parse symbol)

I think the conclusion is that this code is good to go as it despite all the ongoing discussions.

@bosilca
Copy link
Member

bosilca commented Jul 10, 2024

So we are all looking into this. So much wasted time !

I recall we were running a checker (at the end of debug builds or something) to make sure that all exported symbols are correctly prefixed ?

@bosilca
Copy link
Member

bosilca commented Jul 10, 2024

The reason we forcefully make some symbols externally visible has nothing to do with debugging. Without these symbols being visible our MCA components would not be able to access them when loaded at runtime.

@jsquyres
Copy link
Member

FWIW, even with the current visibility settings (with json_* as local and opal_json_* as global), I'm able to debug into the json_* functions:

(lldb) s
Process 68249 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x000000010062dd10 libopen-pal.0.dylib`json_parse(json="{\"foo\":3}", length=9) at json.c:1001:18
   998 	__attribute__((visibility ("hidden")))
   999 	json_value * json_parse (const json_char * json, size_t length)
   1000	{
-> 1001	   json_settings settings = { 0 };
   1002	   return json_parse_ex (&settings, json, length, 0);
   1003	}
   1004	
Target 0: (hello_c) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
  * frame #0: 0x000000010062dd10 libopen-pal.0.dylib`json_parse(json="{\"foo\":3}", length=9) at json.c:1001:18
    frame #1: 0x0000000100632878 libopen-pal.0.dylib`opal_json_load(str="{\"foo\":3}", len=9, json=0x000000016fdf9530) at opal_json.c:98:25
    frame #2: 0x00000001005e36e0 libopen-pal.0.dylib`opal_init(pargc=0x000000016fdfe0f8, pargv=0x000000016fdfe0f0) at opal_init.c:158:9
    frame #3: 0x0000000100c36be0 libmpi.0.dylib`ompi_rte_init(pargc=0x000000016fdfe0f8, pargv=0x000000016fdfe0f0) at ompi_rte.c:570:32
    frame #4: 0x0000000100c40050 libmpi.0.dylib`ompi_mpi_instance_init_common(argc=1, argv=0x000000016fdff148) at instance.c:401:32
    frame #5: 0x0000000100c3fac8 libmpi.0.dylib`ompi_mpi_instance_init(ts_level=0, info=0x00000001011371b8, errhandler=0x0000000101135920, instance=0x000000010113fb18, argc=1, argv=0x000000016fdff148) at instance.c:824:15
    frame #6: 0x0000000100c32124 libmpi.0.dylib`ompi_mpi_init(argc=1, argv=0x000000016fdff148, requested=0, provided=0x000000016fdfed50, reinit_ok=false) at ompi_mpi_init.c:359:11
    frame #7: 0x0000000100d918e0 libmpi.0.dylib`MPI_Init(argc=0x000000016fdfedd0, argv=0x000000016fdfedc8) at init.c:67:15
    frame #8: 0x0000000100003e84 hello_c`main(argc=1, argv=0x000000016fdff148) at hello_c.c:18:5
    frame #9: 0x000000018a55e0e0 dyld`start + 2360

So I think the current visibility stuff is giving us what we want.

The question as to whether to remove the compiler-specific visibility stuff is still relevant, but doesn't have to be part of this PR.

@wenduwan wenduwan requested a review from lrbison July 10, 2024 20:00
@wenduwan
Copy link
Contributor Author

I recall we were running a checker (at the end of debug builds or something) to make sure that all exported symbols are correctly prefixed ?

Do we used to have this? It will be very helpful.

@bosilca
Copy link
Member

bosilca commented Jul 10, 2024

@jsquyres helped me find it. It's in config/find_common_symbol and it is called at the end of Makefile.am but it does not do everything I said. But you can use something like this to get a list (remove the _ in front of the symbols on Linux) nm ompi/.libs/libmpi.dylib | grep ' [TWBCDVR] ' | grep -v -e ' _MPI_' -e ' _mpi_' -e ' _PMPI_' -e ' _MPIX_' -e ' _MPL_' -e ' _ADIOI_' -e ' _ADIO_' -e ' _ompi_' -e ' _OMPI_' -e ' _mca_'

@jsquyres
Copy link
Member

Do we used to have this? It will be very helpful.

@bosilca and I were talking about this in Slack today. This is also a separate issue than this PR, but we both agree that it would be a great CI test. Could be something as simple as:

nm ompi/.libs/libmpi.dylib | grep ' [TWBCDVR] ' | grep -v -e ' _MPI_' -e ' _mpi_' -e ' _PMPI_' -e ' _MPIX_' -e ' _MPL_' -e ' _ADIOI_' -e ' _ADIO_' -e ' _ompi_' -e ' _OMPI_' -e ' _mca_'

That was on OSX -- could be easily adapted for Linux. It could also easily be done in Python to be a bit more fine-grained for checking, etc.

The https://github.com/open-mpi/ompi/blob/main/config/find_common_syms script (which is called by the top-level Makefile.am) could be a good model for this.

Copy link
Contributor

@lrbison lrbison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments have been addressed. LGTM.

@wenduwan
Copy link
Contributor Author

@rhc54 I'm merging the change as-is. Will also explore the symbol validation suggested by George.

@wenduwan wenduwan merged commit 76458b0 into open-mpi:main Jul 10, 2024
13 checks passed
@wenduwan wenduwan deleted the opal_util_json branch July 10, 2024 20:37
@rhc54
Copy link
Contributor

rhc54 commented Jul 10, 2024

Sounds reasonable to me! FWIW: @jsquyres and I wrote a script a few years back (i.e., 10) to automatically prefix symbols for just these purposes. See https://github.com/open-mpi/ompi/blob/main/contrib/symbol-hiding.pl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants