-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implement KVS transaction object #1107
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1107 +/- ##
==========================================
- Coverage 78.34% 78.03% -0.32%
==========================================
Files 157 159 +2
Lines 26132 26232 +100
==========================================
- Hits 20474 20469 -5
- Misses 5658 5763 +105
|
339bc17
to
10d3961
Compare
10d3961
to
9b389d7
Compare
9b389d7
to
1096b57
Compare
At a high level, things look good to me. I think it was a good idea to document the json "null" vs C NULL, which can be confusing. |
Doing a bit of tidying up on this PR. Hopefully should be ready for another look by mid morning, then if no objection's I'll squash down the incremental changes and ask for merge consideration. |
OK, done with cleanup. This is ready for review IMHO. |
This test appears to be wrong - putting a null key is equivalent to an unlink, but the test fails when that actually happens. The test may have been relying on an earlier bug where the value of a key became the JSON string "null" instead of the json value null. Drop the test for now.
t1000-kvs-basic.t calls t/kvs/watch simulwatch <key> 8192 This is a stress test of sorts for the kvs_watch client code. Drop the count from 8192 to 256 to avoid a future problem at exactly 500 (not 499) concurrent watches of the same key where the test hangs. The problem begins occurring after the jansson rework of kvs_watch - modify the test here to avoid breaking git bisect. This may be an actual bug, but since kvs_watch() has significant other changes pending, and many simulateous watches is not a real use case anyway, leave it for another time.
Create jansson dirent, a duplicate of json_dirent but using jansson. This set of functions is used for conversion of json-c to jansson throughout flux.
1914b4e
to
33042b8
Compare
rebased and added one last test. |
Had one travis builder stall here:
Hmm. Restarting. |
Maybe it should be obvious, but perhaps the documentation for
Perhaps two suggestions to make this clearer:
(or similar) Also, how does this function compare to the "classic" call |
Those changes sound good, patch coming.
Identical except that |
Thanks. In a way the name of |
The only other question I'll pose is whether the |
Other than those minor questions, I did test out this PR on some of our TOSS2 and 3 systems and no gotchas so I'm willing to merge for you once you're ready. |
There wasn't any discussion that I recall. I sort of made that change on my own initiative, and I'm fine with changing it back. I find my thinking on interface design is often a bit warped when I've had my head in the implementation details. I'll go ahead and change that, and squash the incremental changes. Thanks for the testing and review! |
Actually, maybe for symmetry with |
That works for me if it is OK with you! |
Even with the above fix (which I think is valid, good catch), there are some other issues being caught by running the lua test
So sorry about this mess! |
Hm, the above might not be related.. it looks like some possible jansson refcounting issue though. |
Was rebasing on your branch to prepare my PR #1108. Noticed you axed |
9b2c239
to
d1c16c4
Compare
@grondo found the bug - tried to decref a json_t obtained with Just pushed the fix and dropped the proposed lua fix (opened #1112 for future consideration, in case it has merit on its own). I'll run the PMI test that was failing in a loop during our group meeting and if it doesn't fail in an hour, we're probably past that one. Whew! Thanks @grondo! |
@chu11: ok I'll drop the commit that removes that function if it's more convenient to leave it in for now. |
d1c16c4
to
8f77614
Compare
Rework the "write" side of the KVS API in terms of an explicit transaction object as discussed in flux-framework#1094. The interface is essentially - create txn - append ops to txn: put, pack, mkdir, unlink, symlink - commit/fence txn - destroy txn Also: rename the KVS_NOMERGE flag to FLUX_KVS_NOMERGE. Provide a private interface so that unit tests can examine internal contents of a commit.
Add unit test for kvs_txn. Also: ensure that unit tests link against the static libkvs.la rather than ../libflux.so so they pick up changes during development, and the unit test hook in kvs_txn.c can be kept private.
Drop the functions kvs_fence_set_context() kvs_fence_clear_context() These were added so that a program could build up a fence transaction while simultaneously performing other KVS commits. That use case is now handled through explicit transactions. Drop those functions and convert wrexecd and tests.
Reimplement kvs_put() and kvsdir_put() and associated functions in terms of txn's. Create a "default transaction" that these functions add to, and have kvs_commit() and kvs_fence() commit the default transaction. Update tests and wrexecd. Also: drop some unnecessary includes from kvs.c.
Replace calls to xstrdup() and xzmalloc() that exit on out of memory error with strdup() and calloc(), and return ENOMEM errors to the caller if they occur. Drop some internal #includes that were there for no apparent reason.
Convert kvs_watch() internals from json-c to jansson. This conversion may have introduced a bug that causes t/kvs/watch simulwatch with ntimes > 499 to hang. For code simplification, one new thing here is that each watch is assigned a future. After the first response, the future is fulfilled, but is kept around to prevent the matchtag from being freed. This simplified the code for performing the initial RPC, but does increase the amount of state kept per watch. We really need a more well thought out way to handle the "one request - multiple response" use case. After the first response that fulfills the future, a generic response message handler is used for subsequent responses.
A flux_kvs_txn_put (key, "null") should result in an EINVAL. The t/kvs/basic put subcommand retries an EINVAL with the value converted to a string. Add a test that ensures the value is set to a string rather than a JSON null, which is an invalid directory entry.
Avoid using proto.c::kp_twatch_enc(), and instead use flux_mrpcf() to encode the mrpc request. This prepares for removal of proto.[ch] in a future commit. Also, drop remaining json-c use in this test, decoding mrpc stats response using flux_mrpc_getf().
I dropped the commit @chu11 wanted dropped, and verified the PMI test has been running in a loop for about 45m with no failures (where before it usually failed within a few minutes). I think this one's perhaps ready? |
Yeah, looks good to me! |
This PR implements the proposed
flux_kvs_txn_t
objects for the KVS discussed in #1094:As discussed in #1094 we'll want some convenience functions since having to create/put/commit/destroy to change a single value is a bit cumbersome. For now the old interfaces are still there, rewritten in terms of the above.
This PR also internally converts the KVS client API to use jansson.