not ok 50 - instance can stop cleanly with subscribers (#1025) #1077

Closed
garlick opened this issue May 23, 2017 · 4 comments

@garlick
Member

garlick commented May 23, 2017

Hit this in Travis under the clang/caliper builder. It appears to be a racy test: if the nohup returns before the subscribe succeeds, the broker will already be shutting down when the subscribe request is received.

expecting success: 
	flux start ${BUG1006} -s2 --bootstrap=selfpmi bash -c "nohup flux event sub hb &"

lt-flux-broker: module 'aggregator' was not cleanly shutdown
lt-flux-broker: module 'resource-hwloc' was not cleanly shutdown
lt-flux-broker: module 'kvs' was not cleanly shutdown
lt-flux-broker: module 'job' was not cleanly shutdown
lt-flux-broker: module 'barrier' was not cleanly shutdown
flux-event: flux_event_subscribe: Protocol error
flux-start: 0 (pid 93219) Killed
not ok 50 - instance can stop cleanly with subscribers (#1025)

I didn't immediately see how to address this, so I'm opening this bug as a placeholder.

This must be fairly rare.
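
The ordering problem described above can be sketched without Flux at all. This is only an illustration under stated assumptions: `slow_client` and the `marker` file are hypothetical stand-ins (not Flux commands), and the 0.3s delay is made up to represent the time before the subscribe request would reach the broker.

```shell
#!/usr/bin/env bash
# Hypothetical marker file: it exists only once the stand-in client is "subscribed".
marker=$(mktemp)
rm -f "$marker"

# Hypothetical stand-in for a client whose request takes a moment to arrive.
slow_client() {
    sleep 0.3            # simulated delay before the subscribe request is sent
    touch "$marker"      # marks the moment the request would arrive
}

# Same shape as the failing test: background the client and return at once.
slow_client >/dev/null 2>&1 &

# The launcher is done here, but the client may not have subscribed yet.
if [ -e "$marker" ]; then ready_at_exit=yes; else ready_at_exit=no; fi
echo "client ready when launcher returned: $ready_at_exit"

wait                     # let the background job finish before cleaning up
rm -f "$marker"
```

In the real test the launcher returning is what lets the instance begin shutting down, so a late request would land on a broker that is already tearing down.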

@grondo
Contributor

grondo commented May 23, 2017

Do you just need a command that calls flux_event_subscribe without unsubscribing, or does it have to stay connected to the API to avoid a disconnect message?

Hm, either way, would something like this work?

 flux start -o,--heartrate=0.1 -s 2 bash -c '(flux event sub hb &) | read'

i.e., the script will only exit after flux event sub emits one line of data, proving that the subscribe has finished.
Since the hb event is used, we decrease the hb interval to something small to get a quick response.

Alternatively, you could replace flux event sub hb with a small Lua snippet that subscribes, prints OK once the subscribe completes, and then sleeps. This would avoid the need to adjust the hb interval.
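
Both suggestions rely on the same idea: the launcher blocks until the subscriber has printed a first line. A minimal shell sketch of that pattern follows, assuming a hypothetical `fake_subscriber` function in place of `flux event sub hb` (it is not a Flux command, and the delays are invented for illustration).

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the subscriber: it becomes ready after a short
# delay, prints one line, then keeps running for a while.
fake_subscriber() {
    sleep 0.2          # simulated time for the subscribe to complete
    echo "ready"       # first line of output signals that setup finished
    sleep 1            # keeps running briefly, like a long-lived subscriber
}

# Background the subscriber inside a subshell and block on the pipe until its
# first line arrives. The inner group echoes the line so it can be captured,
# because `read` at the end of a pipeline runs in its own subshell in bash.
first_line=$( (fake_subscriber &) | { read -r line; echo "$line"; } )
echo "launcher continues after: $first_line"
```

The launcher cannot proceed past the pipeline until the subscriber has produced output, which is the property the `| read` suggestion depends on. With the real heartbeat event, how long that takes would depend on the hb interval.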

@garlick
Member Author

garlick commented Aug 29, 2017

Hit this again

ok 51 - scripts/waitfile works after 1s
scripts/waitfile works after 1s

expecting success: 
	flux start ${ARGS} -s2 --bootstrap=selfpmi bash -c "nohup flux event sub hb &"

lt-flux-broker: module 'kvs' was not cleanly shutdown
lt-flux-broker: module 'barrier' was not cleanly shutdown
lt-flux-broker: module 'aggregator' was not cleanly shutdown
lt-flux-broker: module 'resource-hwloc' was not cleanly shutdown
lt-flux-broker: module 'job' was not cleanly shutdown
flux-event: flux_event_subscribe: Success
flux-start: 0 (pid 12267) Killed
not ok 52 - instance can stop cleanly with subscribers (#1025)
[snip]
2017-08-29T22:18:42.922387Z broker.debug[0]: insmod connector-local
2017-08-29T22:18:42.922451Z broker.info[0]: wireup: 1/1 (complete) 0.0s
2017-08-29T22:18:42.922465Z broker.info[0]: Run level 1 starting
2017-08-29T22:18:42.975894Z broker.debug[0]: insmod barrier
2017-08-29T22:18:43.024114Z broker.debug[0]: insmod content-sqlite
2017-08-29T22:18:43.026666Z broker.debug[0]: content backing store: enabled content-sqlite
2017-08-29T22:18:43.075043Z broker.debug[0]: insmod kvs
2017-08-29T22:18:43.152445Z broker.debug[0]: insmod aggregator
2017-08-29T22:18:43.210591Z broker.debug[0]: insmod job
2017-08-29T22:18:43.261912Z broker.debug[0]: insmod cron
2017-08-29T22:18:43.275984Z broker.debug[0]: insmod resource-hwloc
2017-08-29T22:18:43.268917Z cron.info[0]: synchronizing cron tasks to event hb
2017-08-29T22:18:43.294222Z resource-hwloc.debug[0]: loaded
2017-08-29T22:18:43.328950Z broker.debug[0]: insmod userdb
2017-08-29T22:18:43.331673Z broker.info[0]: Run level 1 Exited (rc=0) 0.4s
2017-08-29T22:18:43.331697Z broker.info[0]: Run level 2 starting
2017-08-29T22:18:43.357807Z broker.info[0]: Run level 2 Exited (rc=0) 0.0s
2017-08-29T22:18:43.357916Z broker.info[0]: Run level 3 starting
2017-08-29T22:18:43.359522Z broker.info[0]: shutdown in 1.000s: run level 2 Exited
2017-08-29T22:18:43.358370Z content-sqlite.debug[0]: broker shutdown in progress
2017-08-29T22:18:43.417445Z broker.debug[0]: rmmod userdb
2017-08-29T22:18:43.417500Z broker.debug[0]: module userdb exited
2017-08-29T22:18:43.450461Z broker.debug[0]: rmmod cron
2017-08-29T22:18:43.450757Z broker.debug[0]: module cron exited
2017-08-29T22:18:43.488160Z broker.debug[0]: rmmod job
2017-08-29T22:18:43.488250Z job.debug[0]: got request job.shutdown
2017-08-29T22:18:43.488391Z broker.debug[0]: module job exited
2017-08-29T22:18:43.521352Z broker.debug[0]: rmmod resource-hwloc
2017-08-29T22:18:43.521448Z broker.debug[0]: module resource-hwloc exited
2017-08-29T22:18:43.556169Z broker.debug[0]: rmmod aggregator
2017-08-29T22:18:43.556344Z broker.debug[0]: module aggregator exited
2017-08-29T22:18:43.590229Z broker.debug[0]: rmmod kvs
2017-08-29T22:18:43.590449Z broker.debug[0]: module kvs exited
2017-08-29T22:18:43.623545Z broker.debug[0]: rmmod barrier
2017-08-29T22:18:43.623606Z broker.debug[0]: module barrier exited
2017-08-29T22:18:43.655524Z broker.debug[0]: rmmod content-sqlite
2017-08-29T22:18:43.655546Z content-sqlite.debug[0]: shutdown: begin
2017-08-29T22:18:43.655624Z broker.debug[0]: content backing store: disabled content-sqlite
2017-08-29T22:18:43.655678Z content-sqlite.debug[0]: shutdown: instance is terminating, don't reload to cache
2017-08-29T22:18:43.655772Z broker.debug[0]: module content-sqlite exited
2017-08-29T22:18:43.656704Z broker.info[0]: Run level 3 Exited (rc=0) 0.3s
2017-08-29T22:18:44.358251Z broker.debug[0]: rmmod connector-local

@chu11
Member

chu11 commented Dec 16, 2017

I've hit this twice in #1299. Wondering if I've got something that hits the race the right way.

@garlick
Member Author

garlick commented Aug 17, 2022

Not seen in a while.

garlick closed this as completed Aug 17, 2022