libkz: defer kvs_watch until necessary #1424
Conversation
Oops, here's a test failure in […] |
I made […]. It's not clear to me, looking at the iowatcher implementation, whether […]. Then the test will have to start the reactor and check for error in the reactor callback. |
Codecov Report
@@ Coverage Diff @@
## master #1424 +/- ##
=========================================
+ Coverage 78.59% 78.6% +<.01%
=========================================
Files 163 163
Lines 30082 30105 +23
=========================================
+ Hits 23643 23663 +20
- Misses 6439 6442 +3
|
Sorry you had to deal with that @garlick. Your approach seems good to me |
NP! |
This looks really good! How hard would it be to have an option that says "even if it isn't done, don't establish the watchers, just stop" with this change in? |
Probably not hard. I'll look at that. |
@trws if you have the output of one of these completed jobs in a KVS somewhere, it would be interesting to know before and after numbers on `flux wreck attach`. |
I have an instance up that’s run 18 jobs, up to 4000 tasks each.
Several have non-trivial output, would it be useful to pull something
out of there?
…On 4 Apr 2018, at 16:14, Jim Garlick wrote:
@trws if you have the output of one of these completed jobs in a KVS
somewhere, it would be interesting to know before and after numbers on
`flux wreck attach`.
|
Sure - if it's not too much trouble, just time flux-wreck attach on master and this branch on the same (completed) job. |
Just gave the changes to the kz iowatcher a quick look and it seems good to me. Just to be clear, the new iowatcher callback denotes EOF when both […]. Just another warning that the fd iowatcher has completely different semantics (e.g. […]). Sorry, that is all I have time for tonight. |
Thanks @grondo! Well that sounds right, or (what I was thinking was) you'd first check […]. |
Well, the timing results are pretty conclusive. With this, over ssh, attach on a 400-line output job with 400 ranks takes 0.7 seconds. The current master version has so far been running for five minutes, and I just got another line out of it, so it might eventually finish? The last two lines were a minute each in coming. For one that actually finished reasonably, the current master took about 1.5 seconds on average, while this took about 0.5 (again with this version at the disadvantage of connecting over ssh). On a possibly related note, do either of you know if it's possible we leave watchers around for cancelled or killed […]? |
In that case where the reactor has gone away and the broker connection dropped, the KVS should get a disconnect message and take care of cleaning out any state for that connection. To get a count of service side watcher state for the local rank, run:
FWIW there is a regression test for this: |
Problem: kz_gopen() is not used but its implementation contributes to complexity. Drop this interface and associated logic.
Problem: the code to kvs_lookup the next key with a continuation is duplicated in the continuation and the watch callback, and soon in another function. Factor out the common code into lookup_next(). Also, drop some LOG_DEBUG messages and add a sequence number to some LOG_ERR messages to make them more informative.
Problem: KVS lookups continue without incrementing kz->seq if the callback is unregistered with kz->seq < kz->last_dir_size, resulting in an infinite loop. Terminate lookups in the lookup continuation if the callback is NULL.
Prepare for use of the kz->eof flag by ensuring it is set when the end of stream is reached, even if data is being consumed in "raw" mode with kz_get_json().
If internal kz->eof flag is set, terminate KVS watcher.
Since KVS watches are costly, avoid setting one up until any existing data in the stream has been read and consumed. If EOF is reached, then the KVS watch is avoided entirely.
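A minimal sketch of the chained-lookup pattern these commits describe, written against the current flux_kvs_lookup()/flux_future_then() signatures. The struct fields (seq, eof) mirror those named above, but the key format, callback handling, and the helpers marked as placeholders are illustrative assumptions, not the actual libkz code.

```c
/* Sketch only: consume existing stream blocks with chained KVS lookups,
 * and fall back to a watch only if the stream has not yet reached EOF. */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <flux/core.h>

struct kz {
    flux_t *h;
    const char *name;   /* stream key prefix (illustrative) */
    int seq;            /* next block to look up */
    bool eof;           /* set once the EOF block has been consumed */
    void (*ready_cb) (struct kz *kz, void *arg);
    void *ready_arg;
};

/* Placeholder: real code decodes the zio json block to detect EOF. */
static bool block_is_eof (const char *json_str)
{
    (void)json_str;
    return false;
}

/* Placeholder: real code would install the kvs_watch() here, as before. */
static void install_watch (struct kz *kz)
{
    (void)kz;
}

static void lookup_continuation (flux_future_t *f, void *arg);

/* Look up block kz->seq and arrange for the continuation to run when
 * the result arrives; nothing blocks the reactor here. */
static int lookup_next (struct kz *kz)
{
    char key[256];
    flux_future_t *f;

    snprintf (key, sizeof (key), "%s.%06d", kz->name, kz->seq);
    if (!(f = flux_kvs_lookup (kz->h, NULL, 0, key)))
        return -1;
    if (flux_future_then (f, -1., lookup_continuation, kz) < 0) {
        flux_future_destroy (f);
        return -1;
    }
    return 0;
}

static void lookup_continuation (flux_future_t *f, void *arg)
{
    struct kz *kz = arg;
    const char *json_str;

    if (flux_kvs_lookup_get (f, &json_str) < 0) {
        if (errno == ENOENT && !kz->eof)
            install_watch (kz);   /* more data may still arrive: watch now */
        /* other errors: real code saves them (see the error-handling
         * commit below) rather than dropping them */
        flux_future_destroy (f);
        return;
    }
    kz->seq++;
    if (block_is_eof (json_str))
        kz->eof = true;           /* EOF consumed: no watch ever needed */
    if (kz->ready_cb)
        kz->ready_cb (kz, kz->ready_arg);
    flux_future_destroy (f);
    if (!kz->eof)
        lookup_next (kz);         /* chain to the next block */
}
```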
Problem: zio decode errors were ignored in getnext_blocking(). Handle that error.
Problem: errors that occur in the context of a KVS continuation in the read path were not saved so they could be returned to the user on the next kz_get(), kz_get_json(), or kz_close() call. Check for a saved error in kz_get(), kz_get_json(), and kz_close(). When a continuation encounters an error, save it and call the user's callback. A couple of inappropriate log messages were also removed, and some error paths were further cleaned up.
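The "save it, surface it on the next call" behavior described here is commonly implemented with a per-handle errnum field. A generic sketch of that pattern follows; field and function names are illustrative, not the actual libkz code.

```c
#include <errno.h>

/* Sketch: remember an error raised inside an asynchronous KVS
 * continuation and report it on the next synchronous call. */
struct kz {
    int saved_errnum;   /* 0 = no deferred error */
    /* ... other state ... */
};

/* Called from a continuation when something goes wrong. */
static void kz_save_error (struct kz *kz, int errnum)
{
    if (kz->saved_errnum == 0)
        kz->saved_errnum = errnum;   /* keep the first error seen */
}

/* Called at the top of kz_get(), kz_get_json(), and kz_close(). */
static int kz_check_saved_error (struct kz *kz)
{
    if (kz->saved_errnum != 0) {
        errno = kz->saved_errnum;
        return -1;
    }
    return 0;
}
```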
Problem: many of the public kz functions will crash if passed NULL arguments. Add checks for absurd argument values.
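The checks added here are presumably of the usual EINVAL-guard form below. This is a generic illustration assuming a kz_get(kz_t *kz, char **datap) prototype, not the exact libkz code.

```c
#include <errno.h>

typedef struct kz kz_t;

/* Reject absurd arguments up front with EINVAL instead of crashing. */
int kz_get (kz_t *kz, char **datap)
{
    if (!kz || !datap) {
        errno = EINVAL;
        return -1;
    }
    /* ... normal read path ... */
    return 0;
}
```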
Problem: if an iowatcher internally gets a kz_get() error, it has no way to inform the user. Add an err argument to the iowatcher callback that is nil when there is no error, and set to an error string when there is one.
Problem: lua iowatcher test 5 'iowatcher returns error correctly' fails after the kvs_watch deferral change. This test attempts to create an iowatcher on a non-directory and expects iowatcher creation to fail. Now that the lookup is no longer performed in the context of iowatcher creation, creation succeeds. Change the test's iowatcher callback to stop the reactor if 'err' is non-nil, then change the test to run the reactor and expect an ENOTDIR failure.
Problem: kzutil --attach is not used by any tests. Drop the attach code, which is hard-wired for wreck's stdio naming convention. This leaves kzutil --copy as the main mode, so drop the --copy option and rename the utility to kzcopy. Update users.
Problem: sharness test driver for kzcopy explicitly sets blocksize for I/O to 4096 which is the default. Drop the -b 4096 option from kzcopy calls.
Problem: test coverage is thin for KZ_FLAGS_NONBLOCK. Add a kzcopy --non-blocking option that uses the kz ready_cb_f callbacks to read a kz stream. Add a couple more sharness tests to exercise this option.
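A rough sketch of what the --non-blocking read path might look like. The kz_open()/kz_set_ready_cb()/kz_get() prototypes, flags, and return conventions shown here are assumptions (the real ones live in kz.h); the flux reactor calls are the standard libflux API.

```c
/* Sketch: drain a kz stream from a ready callback instead of using
 * blocking reads. kz prototypes and return conventions are assumed. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <flux/core.h>

#include "kz.h"   /* flux-core internal header (not installed) */

static void ready_cb (kz_t *kz, void *arg)
{
    flux_t *h = arg;
    char *data;
    int len;

    /* Read everything currently available. */
    while ((len = kz_get (kz, &data)) > 0) {
        fwrite (data, 1, len, stdout);
        free (data);
    }
    if (len == 0) {                        /* assumed: 0 means EOF */
        kz_close (kz);
        flux_reactor_stop (flux_get_reactor (h));
    }
    else if (errno != EAGAIN) {            /* assumed: EAGAIN = call again later */
        perror ("kz_get");
        flux_reactor_stop_error (flux_get_reactor (h));
    }
    /* on EAGAIN: simply return and wait for the next ready callback */
}

int main (int argc, char **argv)
{
    flux_t *h;
    kz_t *kz;

    if (argc != 2 || !(h = flux_open (NULL, 0)))
        exit (1);
    if (!(kz = kz_open (h, argv[1], KZ_FLAGS_READ | KZ_FLAGS_NONBLOCK)))
        exit (1);
    kz_set_ready_cb (kz, ready_cb, h);
    if (flux_reactor_run (flux_get_reactor (h), 0) < 0)
        exit (1);
    flux_close (h);
    return 0;
}
```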
Problem: kz EINVAL checks have no test coverage. Add a TAP test to cover invalid arguments. (Unfortunately not a lot can be covered here because kz requires a broker connection and working KVS to get very far).
Rebased, improved commit messages somewhat, and squashed some incremental development. |
We are now up to 9 hours and 46 minutes. The old version has printed 366 lines. |
Agreed, I can even hack something in for splash short term. The stat command is telling me there are 740 active kvs watches. The big flux instance is currently almost unusable, and I'm trying to figure out why. It's running, it's responding, but it's spending all of its time in the kvs module processing transactions. The reason this surprises me is that the long-running attach and a wreck ls are the only things running, and it's pegged at 100% of one CPU continually. |
@chu11, would you mind having a quick look at this and press the button if you find nothing amiss? The test coverage is somewhat improved from where it was, but commensurate with how much effort I think we want to put into this old stuff (in other words, not quite par). I think I'm OK with that. |
Ping @chu11? |
As discussed in #1420, this PR changes libkz so that the `kz_ready_f` registration does not immediately trigger a `kvs_watch()` on the stream directory. Instead, it uses chained KVS lookups and continuations to iterate over the stream until the available "blocks" are exhausted. At that point, if EOF is not reached, a `kvs_watch()` is installed as before. However, if EOF is reached, the `kvs_watch()` is entirely avoided.

This should allow `flux wreck attach` to process the output of a completed job without ever using `kvs_watch()`, provided that each stream has reached EOF (which might not be the case until we fix #1419). Hopefully this is more scalable.

It might also change the timing for the "redirected" case, since the `kvs_watch()` calls (with embedded synchronous RPC) will only occur during I/O processing, not during initialization, and possibly more staggered in time.

I haven't tested this yet at scale, but it seems to be working in my small tests. Will try to do some scale testing tomorrow.
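To make the before/after difference concrete, here is a sketch of what the callback registration amounts to under this change. The type, field, and function names are illustrative (not necessarily the real libkz prototypes), and lookup_next() refers to the chained-lookup sketch shown after the commit messages above.

```c
/* Sketch: under this PR, registering the ready callback no longer
 * installs a kvs_watch(); it only kicks off the first chained lookup.
 * A watch is installed later, and only if the stream is not already
 * at EOF. All names here are illustrative. */

typedef struct kz kz_t;
typedef void (*kz_ready_f) (kz_t *kz, void *arg);

struct kz {
    kz_ready_f ready_cb;
    void *ready_arg;
    /* ... */
};

extern int lookup_next (kz_t *kz);   /* chained lookup, see earlier sketch */

int kz_set_ready_cb (kz_t *kz, kz_ready_f ready_cb, void *arg)
{
    kz->ready_cb = ready_cb;
    kz->ready_arg = arg;
    /* Previously: a synchronous kvs_watch() on the stream directory was
     * set up right here, one RPC per stream, even for completed jobs.
     * Now that happens only if lookup_next() runs out of existing
     * blocks before seeing EOF. */
    return lookup_next (kz);
}
```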