forked from flux-framework/flux-core
python/test: update to new tmpdir scheme #5
Closed
Don't accept a bare "local://" URI and derive the default path from $FLUX_TMPDIR; require the path explicitly. Rename the socket from flux-api to flux-local (cleanup).
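A minimal sketch of the "require the path" rule, assuming the URI is handled as a plain string. The helper name `local_socket_path` is hypothetical and is not part of flux-core; it only illustrates rejecting a bare "local://".

```python
LOCAL_PREFIX = "local://"


def local_socket_path(uri: str) -> str:
    """Return the socket path from a local:// URI (hypothetical helper)."""
    if not uri.startswith(LOCAL_PREFIX):
        raise ValueError(f"not a local URI: {uri!r}")
    path = uri[len(LOCAL_PREFIX):]
    if not path:
        # A bare "local://" no longer falls back to a default path.
        raise ValueError("local:// URI requires an explicit socket path")
    return path
```

With this shape, `local_socket_path("local:///tmp/flux-local")` yields the path, while a bare `"local://"` raises instead of silently consulting the environment.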
Obtain the URI of the local socket by asking the broker with flux_getattr ("local-uri") instead of deriving it from FLUX_TMPDIR. Rename the socket from flux-api to flux-local (cleanup).
The broker's environment is set up by its enclosing instance. Save the value of FLUX_URI and make it available via flux_getattr ("parent-uri"), then unset FLUX_URI so that it is not used accidentally.

Eliminate FLUX_TMPDIR. When launching locally, use ${TMPDIR:-/tmp} for socket paths, and create a subdir structured as before for the local socket and broker pid. Repeated local launches from the initial program of the first do not create sockets in subdirectories of each other; they are flat in ${TMPDIR:-/tmp}. When launching recursively, however, use the parent-uri to create the subdir for the local socket and broker pid in a subdirectory of the parent. (Other sockets are wildcard paths shared via PMI, so this does not apply to them.)

Set/override FLUX_URI for the initial program and cmb.exec, and make it available via flux_getattr ("local-uri").

Rename other attributes to avoid confusion between the TBON hierarchy and the instance hierarchy:
parent-uri -> tbon-parent-uri
request-uri -> tbon-request-uri
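The save-then-unset step described in this commit can be sketched as follows. This is an illustration in Python rather than the broker's C code; `adopt_parent_uri` and the plain dict standing in for broker attributes are assumptions.

```python
def adopt_parent_uri(env: dict) -> dict:
    """Move FLUX_URI out of env and into a broker attribute dict (sketch).

    `env` is modified in place so that nothing started later can pick up
    the enclosing instance's URI by accident.
    """
    attrs = {}
    parent = env.pop("FLUX_URI", None)
    if parent is not None:
        attrs["parent-uri"] = parent
    return attrs
```

After the call, the enclosing instance's URI is reachable only through the "parent-uri" entry, which mirrors the intent of exposing it through flux_getattr.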
This just removes some attribute names in usage messages that are no longer correct, and probably shouldn't have been there anyway.
To determine whether to get config from the KVS or config file, test the FLUX_URI variable, not the deprecated FLUX_TMPDIR variable.
Drop the -T,--tmpdir argument. Test the FLUX_URI variable, not FLUX_TMPDIR, to determine whether to load config from the KVS or from a file.
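A small sketch of that decision, assuming that "set and non-empty" is what counts as being inside an instance. The function name and the "kvs"/"file" return values are hypothetical.

```python
def config_source(env: dict) -> str:
    """Pick where to load config from, keyed on FLUX_URI (sketch)."""
    # Assumption: an empty FLUX_URI is treated the same as an unset one.
    return "kvs" if env.get("FLUX_URI") else "file"
```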
The wreck module sets FLUX_URI in the environment of wrexecd, and wrexecd overrides it in the environment of the program being launched. Obtain the value to use in the wreck module by calling flux_getattr ("local-uri").
Modify the "flux exec does not pass $FLUX_TMPDIR" test to use $FLUX_URI instead.
With FLUX_TMPDIR gone and "local://" no longer a valid URI for the local connector, several tests were just no longer relevant.
The parent-uri attribute was renamed to tbon-parent-uri for disambiguation. Update this user.
get_filtered_envronment() should filter FLUX_URI not FLUX_TMPDIR.
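As an illustration of the filtering described here, a Python sketch might look like the following. The helper name and the `blocked` parameter are hypothetical, and the real function may filter other variables as well.

```python
def filtered_environment(env: dict, blocked=("FLUX_URI",)) -> dict:
    """Return a copy of env without the blocked variables (sketch)."""
    return {key: value for key, value in env.items() if key not in blocked}
```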
Create a top level, unique temporary directory to contain broker sockets. When an instance is launched directly by flux-start, the instance shares this directory and puts all of its ranks' sockets inside it. This means flux-start needs to create it and share it with each broker. The new --socket-directory option enables this. A simplified directory structure and naming results, e.g.

/tmp/flux.5wv21M/event
/tmp/flux.5wv21M/0/broker.pid
/tmp/flux.5wv21M/0/local
/tmp/flux.5wv21M/0/req
/tmp/flux.5wv21M/1/broker.pid
/tmp/flux.5wv21M/1/local
/tmp/flux.5wv21M/1/req
...

When an instance is being launched via slurm or flux, each rank creates its own unique temporary directory. In the cases where ranks need to find each others' ipc sockets (such as when an event relay is active), these URIs are shared via PMI, so it is not necessary for them to be computed relative to a known directory.
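The per-rank layout in the example listing can be expressed as a small path helper. This is only a sketch of the naming; `rank_paths` is a hypothetical name, and the file names are taken from the listing in this commit message.

```python
import os


def rank_paths(sockdir: str, rank: int) -> dict:
    """Compute the per-rank paths inside a shared socket directory (sketch)."""
    rankdir = os.path.join(sockdir, str(rank))
    return {
        "pidfile": os.path.join(rankdir, "broker.pid"),
        "local": os.path.join(rankdir, "local"),
        "req": os.path.join(rankdir, "req"),
    }
```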
When launching an instance directly, create a temporary directory, register it with the cleanup handler, then launch each broker with --socket-directory pointing to it. All the sockets and pidfiles for the session will be self-contained in this directory.
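A rough Python sketch of that launch sequence, assuming the option is passed as `--socket-directory=DIR` and that the broker executable is named `flux-broker`; both are illustrative assumptions, and the commands are only built here, not executed.

```python
import atexit
import shutil
import tempfile


def prepare_session(size: int, broker: str = "flux-broker"):
    """Create a session directory and build one broker command per rank."""
    sockdir = tempfile.mkdtemp(prefix="flux.")
    # Register cleanup so sockets and pidfiles vanish when the launcher exits.
    atexit.register(shutil.rmtree, sockdir, ignore_errors=True)
    commands = [[broker, f"--socket-directory={sockdir}"] for _ in range(size)]
    return sockdir, commands
```

Because every broker receives the same directory, everything the session creates stays under one path that a single cleanup call can remove.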
Now that the broker creates its rank-specific directory, and therefore its pidfile, in a unique directory, simply fail if this directory already exists and drop both the pid liveness check and the --force option. This failure mode should be practically impossible now.
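The "fail if it already exists" behavior maps naturally onto a plain directory create with no existence pre-check. A minimal sketch, with `create_rank_dir` as a hypothetical name and mode 0700 as an assumption:

```python
import os


def create_rank_dir(sockdir: str, rank: int) -> str:
    """Create the rank directory, failing if it is already present (sketch)."""
    rankdir = os.path.join(sockdir, str(rank))
    # os.mkdir raises FileExistsError on a collision, so no pid liveness
    # check or force option is needed.
    os.mkdir(rankdir, 0o700)
    return rankdir
```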
With the socket directory reorganization, sockets get shorter names, so rename the 'flux-local' socket to just 'local'.
Create a directory in tmp that looks like this: flux-<sid>-XXXXXX, where XXXXXX is a random component.
If the broker is creating the socket dir, it should look like this: flux-<sid>-XXXXXX where XXXXXX is a random component. The random component is necessary to avoid name collisions when instances from overlapping sid spaces are launched.
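The naming scheme can be sketched with Python's `tempfile.mkdtemp`, which appends its own random suffix to a prefix. Note that Python's suffix is not the six-character XXXXXX template of C's mkdtemp(3); the function name and the `base` parameter are assumptions.

```python
import tempfile


def make_instance_dir(sid, base=None) -> str:
    """Create a unique flux-<sid>-<random> directory (sketch)."""
    # The random component keeps instances with the same sid from colliding.
    return tempfile.mkdtemp(prefix=f"flux-{sid}-", dir=base)
```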
Match the new instance directory names: flux-sid-XXXXXX. Eliminate the --all and --top-only options as instance directories are no longer hierarchical.
FLUX_TMPDIR has been removed; this commit removes references to it from sideflux and repairs the behavior of FLUX_URI.
garlick added a commit that referenced this pull request Sep 15, 2020
Problem: unloading the resource module while events posted to the eventlog are in flight can result in a segfault.

  Program terminated with signal SIGSEGV, Segmentation fault.
  #0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
  102 ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory.
  [Current thread is 1 (Thread 0x7fe74b7fe700 (LWP 3495430))]
  (gdb) bt
  #0 __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
  #1 0x00007fe764f40de0 in aux_item_find (key=<optimized out>, head=0x7fe73c006180) at aux.c:88
  #2 aux_get (head=<optimized out>, key=0x7fe764f5b000 "flux::log") at aux.c:119
  #3 0x00007fe764f1f0d4 in getctx (h=h@entry=0x7fe73c00c6d0) at flog.c:72
  #4 0x00007fe764f1f3a5 in flux_vlog (h=0x7fe73c00c6d0, level=7, fmt=0x7fe7606318fc "%s: %s event posted", ap=ap@entry=0x7fe74b7fd790) at flog.c:146
  #5 0x00007fe764f1f333 in flux_log (h=<optimized out>, lev=lev@entry=7, fmt=fmt@entry=0x7fe7606318fc "%s: %s event posted") at flog.c:195
  #6 0x00007fe76061166a in reslog_cb (reslog=<optimized out>, name=0x7fe73c016380 "online", arg=0x7fe73c013000) at acquire.c:319
  #7 0x00007fe760610deb in notify_callback (event=<optimized out>, reslog=0x7fe73c005b90) at reslog.c:47
  #8 post_handler (reslog=reslog@entry=0x7fe73c005b90, f=0x7fe73c00a510) at reslog.c:91
  #9 0x00007fe760611250 in reslog_destroy (reslog=0x7fe73c005b90) at reslog.c:182
  #10 0x00007fe76060e6b8 in resource_ctx_destroy (ctx=ctx@entry=0x7fe73c016640) at resource.c:129
  #11 0x00007fe76060ef18 in resource_ctx_destroy (ctx=0x7fe73c016640) at resource.c:331

It looks like the acquire subsystem got a callback for a rank coming online after its context was freed.

Set the reslog callback to NULL before destroying the acquire context. Also, set the monitor callback to NULL before destroying the discover context, as it appears this destructor has a similar safety issue.