flux-start: Eats cmdlines #11

Closed
grondo opened this issue Sep 30, 2014 · 6 comments

@grondo
Contributor

grondo commented Sep 30, 2014

It should be possible, and would be useful for the test suite, to preserve the command line given to 'flux start' as faithfully as possible. Currently, 'flux start' concatenates the args into a single string and thus drops quoting:

$ ./flux start -v --size=2 sh -c 'sleep 1'
flux-start: 0: ../broker/cmbd --size=2 --rank=0 --command=sh -c sleep 1
flux-start: 1: ../broker/cmbd --size=2 --rank=1
cmbd: 0-0: starting shell
sleep: missing operand
Try `sleep --help' for more information.
cmbd: 0: shutdown in 2s: shell (pid 7029) exited with rc=1
flux-start: 1 (pid 7022) exited with rc=1
flux-start: 0 (pid 7021) exited with rc=1
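
To illustrate the failure mode above, here is a minimal Python sketch (not flux-start's actual code, which is not shown in this thread) of how joining arguments with spaces loses the boundary around 'sleep 1', and how quoting each argument before joining keeps it. The variable names are illustrative only.

```python
import shlex

# Arguments as the invoking shell hands them to the program after its own parsing.
argv = ["sh", "-c", "sleep 1"]

# Naive concatenation: the boundary around "sleep 1" is lost.
naive = " ".join(argv)

# Quoting each argument before joining keeps the boundaries recoverable.
quoted = shlex.join(argv)

# Simulate the word splitting a later shell would apply to each string.
naive_roundtrip = shlex.split(naive)
quoted_roundtrip = shlex.split(quoted)

print(naive_roundtrip)   # four words: "sleep" and "1" are now separate
print(quoted_roundtrip)  # three words: identical to the original argv
```

With the naive string, `sh -c` receives `sleep` as its command and `1` as a positional parameter, which matches the "sleep: missing operand" error in the transcript.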
@grondo
Contributor Author

grondo commented Sep 30, 2014

I didn't realize cmbd always invoked --command under the shell. Given that, for the above case at least, the following will work:


$ ./flux start -v --size=2 "sleep 1; ./flux up"
flux-start: 0: ../broker/cmbd --size=2 --rank=0 --command=sleep 1; ./flux up
flux-start: 1: ../broker/cmbd --size=2 --rank=1
cmbd: 0-0: starting shell
ok:     [0-1]
slow:   
fail:   
unknown:
cmbd: 0: shutdown in 2s: shell (pid 19271) exited with rc=0
flux-start: 1 (pid 19264) exited normally
flux-start: 0 (pid 19263) exited normally
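
The workaround relies on the command string being interpreted by a shell. A rough Python sketch of that behavior, assuming a POSIX `sh` is on the PATH (the helper name `run_under_shell` is hypothetical and has nothing to do with cmbd's implementation):

```python
import subprocess

def run_under_shell(command: str) -> subprocess.CompletedProcess:
    """Run one command string via `sh -c`, capturing its output."""
    return subprocess.run(["sh", "-c", command], capture_output=True, text=True)

# The shell itself splits on ';' and whitespace, so a single string
# can carry several commands, as in "sleep 1; ./flux up" above.
result = run_under_shell("sleep 0; echo up")
print(result.returncode, result.stdout.strip())
```

Because the shell does the parsing, the caller only has to deliver one intact string, which is why quoting the whole command on the 'flux start' command line works.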

@garlick
Member

garlick commented Oct 1, 2014

I should probably let the command occupy the cmbd's leftover args rather than cram it into -c "command".

Right now the leftover args are module options. The whole module option passing thing is a mess. Maybe 'flux config' could make it easier to pass complex arguments between flux-start and cmbd, and leave cmbd positioned for being launched with a static config file at boot time.
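
One way to picture the "leftover args" idea is the sketch below. It is a hypothetical illustration in Python, not cmbd's option parser: it assumes every option is written in `--name=value` form and treats everything from the first non-option argument onward as the command, kept as a list so no quoting is lost.

```python
from typing import List, Tuple

def split_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Split argv into leading options and the trailing command.

    Assumes options look like --name=value (no separate value argument).
    """
    options = []
    for i, arg in enumerate(argv):
        if not arg.startswith("-"):
            # First non-option argument starts the command; keep the rest verbatim.
            return options, argv[i:]
        options.append(arg)
    return options, []

options, command = split_args(["--size=2", "--rank=0", "sh", "-c", "sleep 1"])
print(options)
print(command)
```

The command stays a list of separate arguments, so it could be passed to an exec-style call directly instead of being flattened into one string.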

@grondo
Contributor Author

grondo commented Oct 1, 2014

I would keep this issue at low priority for now. Now that I understand "command" is run via sh -c, many things we'd like to do for testing will be possible, and for production use it is unlikely that flux start will see common usage.

However, I do think the static config is a good option for cmbd.

Actually, a while ago I was thinking it would be cool if we had an init-type script for cmbd that could be passed on the command line, with methods for setting local config, waiting for events (kvs module loaded), and perhaps finally launching a command. This sounds like a lot of work, but it might also be a way to address #8?

@trws
Member

trws commented Apr 27, 2015

This has been fixed, has it not?

@grondo
Contributor Author

grondo commented Sep 24, 2015

Yeah, I think we can close this one now ;-)

grondo closed this as completed Sep 24, 2015
@trws
Member

trws commented Sep 24, 2015

Oddly, we have one much like this on broker now under some odd corner cases... if I can reproduce it I'll open another bug, but this one is ready to be gone. =)

garlick added a commit to garlick/flux-core that referenced this issue Sep 15, 2020
Problem: unloading the resource module with events posted to the eventlog
in flight can result in a segfault.

Program terminated with signal SIGSEGV, Segmentation fault.

 #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
 102     ../sysdeps/x86_64/multiarch/strcmp-avx2.S: No such file or directory.
 [Current thread is 1 (Thread 0x7fe74b7fe700 (LWP 3495430))]
 (gdb) bt
 #0  __strcmp_avx2 () at ../sysdeps/x86_64/multiarch/strcmp-avx2.S:102
 #1  0x00007fe764f40de0 in aux_item_find (key=<optimized out>,
     head=0x7fe73c006180) at aux.c:88
 #2  aux_get (head=<optimized out>, key=0x7fe764f5b000 "flux::log") at aux.c:119
 #3  0x00007fe764f1f0d4 in getctx (h=h@entry=0x7fe73c00c6d0) at flog.c:72
 #4  0x00007fe764f1f3a5 in flux_vlog (h=0x7fe73c00c6d0, level=7,
     fmt=0x7fe7606318fc "%s: %s event posted", ap=ap@entry=0x7fe74b7fd790)
     at flog.c:146
 #5  0x00007fe764f1f333 in flux_log (h=<optimized out>, lev=lev@entry=7,
    fmt=fmt@entry=0x7fe7606318fc "%s: %s event posted") at flog.c:195
 #6  0x00007fe76061166a in reslog_cb (reslog=<optimized out>,
     name=0x7fe73c016380 "online", arg=0x7fe73c013000) at acquire.c:319
 #7  0x00007fe760610deb in notify_callback (event=<optimized out>,
     reslog=0x7fe73c005b90) at reslog.c:47
 #8  post_handler (reslog=reslog@entry=0x7fe73c005b90, f=0x7fe73c00a510)
     at reslog.c:91
 #9  0x00007fe760611250 in reslog_destroy (reslog=0x7fe73c005b90)
     at reslog.c:182
 #10 0x00007fe76060e6b8 in resource_ctx_destroy (ctx=ctx@entry=0x7fe73c016640)
     at resource.c:129
 #11 0x00007fe76060ef18 in resource_ctx_destroy (ctx=0x7fe73c016640)
     at resource.c:331

It looks like the acquire subsystem got a callback for a rank coming online
after its context was freed.  Set the reslog callback to NULL before
destroying the acquire context.

Also, set the monitor callback to NULL before destroying the discover
context, as it appears this destructor has a similar safety issue.