Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libkz: defer kvs_watch until necessary #1424

Merged
merged 15 commits into from
Apr 6, 2018

Conversation

garlick
Copy link
Member

@garlick garlick commented Apr 4, 2018

As discussed in #1420, this PR changes libkz so that the kz_ready_f registration does not immediately trigger a kvs_watch() on the stream directory. Instead, it uses chained KVS lookups and continuations to iterate over the stream until the available "blocks" are exhausted. At that point if EOF is not reached, a kvs_watch() is installed as before. However if EOF is reached, the kvs_watch() is entirely avoided.

This should allow flux wreck attach to process the output of a completed job without ever using kvs_watch(), provided that each stream has reached EOF (which might not be the case until we fix #1419). Hopefully this is more scalable.

It might also change the timing for the "redirected" case since the kvs_watch() calls (with embedded synchronous RPC) will only occur during I/O processing, not during initialization, and possibly more staggered in time.

I haven't tested this yet at scale but it seems to be working in my small tests. Will try to do some scale testing tomorrow.

@garlick
Copy link
Member Author

garlick commented Apr 4, 2018

Oops, here's a test failure in lua/t1003-iowatcher.t I need to run down:

not ok 4 - iowatcher returns error correctly
      got: userdata: 0x9370a94
 expected: nil

@coveralls
Copy link

coveralls commented Apr 4, 2018

Coverage Status

Coverage increased (+0.002%) to 78.902% when pulling 4ea96cd on garlick:libkz_nowatch into af89104 on flux-framework:master.

@garlick
Copy link
Member Author

garlick commented Apr 4, 2018

I made lua/t1003-iowatcher.t a todo for the moment. The test tries to create an I/O watcher watching a non-directory and expects the iowatcher creation to fail. That used to happen at kz_set_ready_cb() time when flux_kvs_watch() was called, which caused a synchronous lookup to occur. Now that the initial lookup result is handled in a continuation, the users callback will be called, and when it calls kz_get(), that will fail.

It's not clear to me looking at the iowatcher implementation whether kz_get() errors are checked. If not, that will need to be fixed and may help us find other reader issues.

Then the test will have to start the reactor and check for error in the reactor callback.

@codecov-io
Copy link

codecov-io commented Apr 4, 2018

Codecov Report

Merging #1424 into master will increase coverage by <.01%.
The diff coverage is 75.47%.

@@            Coverage Diff            @@
##           master   #1424      +/-   ##
=========================================
+ Coverage   78.59%   78.6%   +<.01%     
=========================================
  Files         163     163              
  Lines       30082   30105      +23     
=========================================
+ Hits        23643   23663      +20     
- Misses       6439    6442       +3
Impacted Files Coverage Δ
src/common/libkz/kz.c 82.55% <73.95%> (+12.02%) ⬆️
src/bindings/lua/flux-lua.c 80.97% <90%> (-0.31%) ⬇️
src/cmd/cmdhelp.c 79.13% <0%> (-5.22%) ⬇️
src/modules/cron/cron.c 76.68% <0%> (-3.04%) ⬇️
src/common/libkvs/kvs_watch.c 90.12% <0%> (-1.29%) ⬇️
src/common/libflux/mrpc.c 85.49% <0%> (-1.18%) ⬇️
src/broker/modservice.c 79.61% <0%> (-0.98%) ⬇️
src/broker/module.c 83.79% <0%> (-0.28%) ⬇️
src/common/libflux/message.c 81.6% <0%> (+0.11%) ⬆️
... and 4 more

@grondo
Copy link
Contributor

grondo commented Apr 4, 2018

Sorry you had to deal with that @garlick. Your approach seems good to me

@garlick
Copy link
Member Author

garlick commented Apr 4, 2018

NP!

@trws
Copy link
Member

trws commented Apr 4, 2018

This looks really good! How hard would it be to have an option that says "even if it isn't done, don't establish the watchers, just stop" with this change in?

@garlick
Copy link
Member Author

garlick commented Apr 4, 2018

This looks really good! How hard would it be to have an option that says "even if it isn't done, don't establish the watchers, just stop" with this change in?

Probably not hard. I'll look at that.

@garlick
Copy link
Member Author

garlick commented Apr 4, 2018

@trws if you have the output of one of these completed jobs in a KVS somewhere, it would be interesting to know before and after numbers on flux wreck attach.

@trws
Copy link
Member

trws commented Apr 4, 2018 via email

@garlick
Copy link
Member Author

garlick commented Apr 4, 2018

I have an instance up that’s run 18 jobs, up to 4000 tasks each.
Several have non-trivial output, would it be useful to pull something
out of there?

Sure - if it's not too much trouble, just time flux-wreck attach on master and this branch on the same (completed) job.

@grondo
Copy link
Contributor

grondo commented Apr 5, 2018

Just gave changes to kz iowatcher a quick look and it seems good to me. Just to be clear, new iowatcher callback denotes EOF when both data and err are nil, otherwise if err is not nil, then an error occurred? (I know sorry about the bad semantics of the original function.)

Just another warning that the fd iowatcher has completely different semantics (e.g. data parameter passed to callback is a Lua table, not just a string, etc.) This is because it was based on the old zio code, so I didn't want to bother fixing it yet. Just didn't want to you to be surprised if you went looking there (and hopefully you don't have to.)

Sorry that is all I have time for tonight.

@garlick
Copy link
Member Author

garlick commented Apr 5, 2018

Thanks @grondo!

Well that sounds right, or (what I was thinking was) you'd first check err and if non-nil, it's an error otherwise, proceed as before.

@trws
Copy link
Member

trws commented Apr 5, 2018

Well, the timing results are pretty conclusive. With this, over ssh, attach on a 400-line output job with 400 ranks takes 0.7 seconds. The current master version has so far been running for five minutes, and I just got another line out of it, so it might eventually finish? The last two lines were a minute each in coming.

For one that actually finished reasonably, the current master took about 1.5 seconds on average, this took about 0.5 (again with this version at the disadvantage of connecting over ssh).

In a possibly related note, do either of you know if it's possible we leave watchers around for cancelled or killed wreck attach commands? I've noticed long-running instances start to get slow, spending time processing kvs events even when the kvs isn't being actively poked from outside. Made me wonder if we're somehow leaking watchers when a client dies an untimely death or something (but then again, I'm way past loopy at this point so... hey, another line!

@garlick
Copy link
Member Author

garlick commented Apr 5, 2018

do either of you know if it's possible we leave watchers around for cancelled or killed wreck attach commands?

In that case where the reactor has gone away and the broker connection dropped, the KVS should get a disconnect message and take care of cleaning out any state for that connection. To get a count of service side watcher state for the local rank, run:

flux module stats --parse "namespace.primary.#watchers" kvs

FWIW there is a regression test for this: t/t1103-apidisconnect.t

garlick added 13 commits April 5, 2018 08:22
Problem: kz_gopen() is not used but its implementation
contributes to complexity.

Drop this interface and associated logic.
Problem: code to kvs_lookup next key with continuation
is duplicated in continuation and watch callback,
and soon in another function.

Factor out common code to lookup_next()

Also, drop some LOG_DEBUG messages and add sequence
number to some LOG_ERR messages to make them more
informative.
Problem: KVS lookups continue without incrementing kz->seq
if callback is unregistered with with kz->seq < kz->last_dir_size
resulting in infinite loop.

Terminate lookups if callback is NULL in the lookup continuation.
Prepare for use of kz->eof flag by ensure it is set
when end of stream is reached, even if data is being
consumed in "raw" mode with kz_get_json.
If internal kz->eof flag is set, terminate KVS watcher.
Since KVS watches are costly, avoid setting one up
until any existing data in the stream has been
read and consumed.  If EOF is reached, then the
KVS watch is avoided entirely.
Problem: zio decode errors were ignored in getnext_blocking().

Handle that error.
Problem: errors that occur in context of a KVS continuation
in the read path were not saved so they could be returned to
the user on the next kz_get(), kz_get_json(), or kz_close() call.

Check for saved error in kz_get(), kz_get_json(), and kz_close().
When a continuation encounters an error, save it, and call the
user's callback.

A couple of inappropriate log messages were removed also
and some error paths further cleaned up.
Problem: many of the public kz functions will crash
if passed NULL arguments.

Add checks for absurd argument values.
Problem: if an iowatcher internally gets a kz_get()
error, it has no way to inform the user.

Add an err argument to the iowatcher callback that is nil
when there is no error, and set to an error string when
there is one.
Problem: lua iowatcher test 5 'iowatcher returns error
correctly' fails after kvs_watch deferral change.

This test attempts to create an iowatcher on a non-directory
and expects iowatcher creation to fail.  Now that the
the lookup is no longer performed in the context of
iowatcher creation, creation succeeds.

Change the test's iowatcher callback to stop the reactor
if 'err' is non-nil.  Then change he test to run the reactor
and expect an ENOTDIR failure.
Problem: kzutil --attach is not used by any tests.

Drop the attach code which is hard wired for wreck's
stdio naming convention, and is not used by any test.
This leaves kzutil --copy as the main mode, so drop
the --copy option and rename the utility to kzcopy.

Update users.
Problem: sharness test driver for kzcopy explicitly sets
blocksize for I/O to 4096 which is the default.

Drop the -b 4096 option from kzcopy calls.
garlick added 2 commits April 5, 2018 08:41
Problem: test coverage is thin for KZ_FLAGS_NONBLOCK.

Add a kzcopy --non-blocking option that uses the kz
ready_cb_f callbacks to read a kz stream.

Add a couple more sharness tests to exercise this option.
Problem: kz EINVAL checks have no test coverage.

Add a TAP test to cover invalid arguments.

(Unfortunately not a lot can be covered here because
kz requires a broker connection and working KVS to
get very far).
@garlick
Copy link
Member Author

garlick commented Apr 5, 2018

Rebased, improved commit messages somewhat, and squashed some incremental development.
I'd vote that this goes in now and further improvements (such as a flag to avoid "following" the stream when EOF was not written) be deferred to another PR.

@trws
Copy link
Member

trws commented Apr 5, 2018

We are now up to 9 hours and 46 minutes. The old version has printed 366 lines.

@trws
Copy link
Member

trws commented Apr 5, 2018

Agreed, I can even hack something in for splash short term.

The stat command is telling me there are 740 active kvs watches. The big flux instance is currently almost unusable, and I'm trying to figure out why. It's running, it's responding, but its spending all of its time in the kvs module processing transactions. The reason this surprises me is that long-running attach and a wreck ls are the only things running, and it's pegged at 100% of one CPU continually.

@garlick
Copy link
Member Author

garlick commented Apr 5, 2018

@chu11, would you mind having a quick look at this and press the button if you find nothing amiss?

The test coverage is somewhat improved from where it was, but commensurate with how much effort I think we want to put into this old stuff (in other words, not quite par). I think I'm OK with that.

@garlick
Copy link
Member Author

garlick commented Apr 6, 2018

Ping @chu11?

@chu11 chu11 merged commit 95f8d8f into flux-framework:master Apr 6, 2018
@garlick garlick deleted the libkz_nowatch branch April 6, 2018 21:25
@grondo grondo mentioned this pull request May 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[splash] wreck loses stdio output on large jobs
6 participants