Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connector/local: add exponential backoff to connect retry count #1148

Merged
merged 4 commits into from
Aug 13, 2017

Conversation

grondo
Copy link
Contributor

@grondo grondo commented Aug 10, 2017

As @garlick suggested in #677, add an exponential backoff to local connector connect retries, rename the variable to tune the retry count from FLUX_RETRY_COUNT to FLUX_LOCAL_CONNECTOR_RETRY_COUNT, and use this variable in t5000-valgrind.t to give up to 10 retries.

I tried to make the the default retry count of 5 still time out after ~500ms, so I made the starting sleep 16ms, which should sleep a total of 496ms in the default cause -- unless in my haste I did some math wrong.

grondo added 4 commits August 10, 2017 21:03
When the spellcheck tests are skipped (due to missing aspell for
example), the skip TAP output is incorrect, or at least is deemed
incorrect by the tap-driver.sh automake TAP driver, which causes
the test to fail. Change the output to `1..0 # skip` to make
the test driver happy.
Rename local connector retry count from FLUX_RETRY_COUNT to
FLUX_LOCAL_CONNECTOR_RETRY_COUNT ass suggested in issue flux-framework#677.
Introduce an exponential backoff to the local connector's attempts
to connect to the flux instance api domain socket. The backoff starts
at 16ms and doubles up to 2s. This is so that the default retry count
of 5 still times out after about 500ms as in the current code.
Allow more retries in flux_open in local connector as suggested in flux-framework#677.
@grondo
Copy link
Contributor Author

grondo commented Aug 10, 2017

Oh, oops I also added a fix for the spellcheck test, which isn't producing the skip TAP output correctly (at least according to tap-driver.sh. I can move that to another PR if needed.

@codecov-io
Copy link

codecov-io commented Aug 10, 2017

Codecov Report

Merging #1148 into master will decrease coverage by 0.05%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1148      +/-   ##
==========================================
- Coverage   77.62%   77.56%   -0.06%     
==========================================
  Files         157      157              
  Lines       28680    28694      +14     
==========================================
- Hits        22262    22257       -5     
- Misses       6418     6437      +19
Impacted Files Coverage Δ
src/connectors/local/local.c 87.4% <100%> (-0.19%) ⬇️
src/common/libflux/info.c 75% <0%> (-2.78%) ⬇️
src/common/libcompat/info.c 65.82% <0%> (-2.54%) ⬇️
src/common/libflux/ev_flux.c 97.56% <0%> (-2.44%) ⬇️
src/modules/connector-local/local.c 74.08% <0%> (-1.47%) ⬇️
src/broker/modservice.c 79.61% <0%> (-0.98%) ⬇️
src/common/libkvs/kvs_watch.c 86.95% <0%> (-0.49%) ⬇️
src/common/libflux/dispatch.c 85.18% <0%> (-0.38%) ⬇️
src/common/libflux/reactor.c 93.18% <0%> (-0.29%) ⬇️
... and 9 more

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.05%) to 77.92% when pulling 56b29b8 on grondo:t5000-valgrind-fix into 1665912 on flux-framework:master.

@garlick
Copy link
Member

garlick commented Aug 13, 2017

Looks great, thanks! IMHO no big deal having the doc test fix in here also.

@garlick garlick merged commit aeb6cfd into flux-framework:master Aug 13, 2017
@grondo grondo mentioned this pull request Aug 23, 2017
@grondo grondo deleted the t5000-valgrind-fix branch October 25, 2017 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants