-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvs: rework internal metadata format to conform to RFC 11 design #1154
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1154 +/- ##
==========================================
+ Coverage 77.56% 77.73% +0.16%
==========================================
Files 158 158
Lines 28758 29074 +316
==========================================
+ Hits 22307 22601 +294
- Misses 6451 6473 +22
|
Oh yeah, it's probably worth mentioning, soak hasn't been run yet and should. |
Got a valgrind error on one run. Not entirely sure what happened.
|
Excellent. I'll have a detailed look tomorrow! Thanks!
…On Aug 18, 2017 4:49 PM, "Al Chu" ***@***.***> wrote:
Got a valgrind error on one run. Not entirely sure what happened.
==26802== Memcheck, a memory error detector
==26802== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==26802== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==26802== Command: /home/travis/build/flux-framework/flux-core/flux-core-dfff6c3/_build/sub/src/broker/.libs/lt-flux-broker --shutdown-grace=4 /home/travis/build/flux-framework/flux-core/flux-core-dfff6c3/t/valgrind/valgrind-workload.sh 10
==26802==
2017-08-18T23:13:53.073763Z broker.err[0]: rc1: flux-module: flux_open: No such file or directory
2017-08-18T23:13:53.119203Z broker.err[0]: Run level 1 Exited with non-zero status (rc=1) 0.9s
2017-08-18T23:13:53.384930Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] userdb: No such file or directory
2017-08-18T23:13:53.485824Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] cron: No such file or directory
2017-08-18T23:13:53.559402Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] job: No such file or directory
2017-08-18T23:13:53.670238Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] resource-hwloc: No such file or directory
2017-08-18T23:13:53.739372Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] aggregator: No such file or directory
2017-08-18T23:13:53.822058Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] kvs: No such file or directory
2017-08-18T23:13:53.899850Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] barrier: No such file or directory
2017-08-18T23:13:53.985997Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] content-sqlite: No such file or directory
==26802==
==26802== HEAP SUMMARY:
==26802== in use at exit: 81,357 bytes in 61 blocks
==26802== total heap usage: 7,400 allocs, 7,339 frees, 2,239,782 bytes allocated
==26802==
==26802== LEAK SUMMARY:
==26802== definitely lost: 0 bytes in 0 blocks
==26802== indirectly lost: 0 bytes in 0 blocks
==26802== possibly lost: 0 bytes in 0 blocks
==26802== still reachable: 81,357 bytes in 61 blocks
==26802== suppressed: 0 bytes in 0 blocks
==26802== Reachable blocks (those to which a pointer was found) are not shown.
==26802== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==26802==
==26802== For counts of detected and suppressed errors, rerun with: -v
==26802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
not ok 1 - valgrind reports no new errors on single broker run
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1154 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKX2z8hJevrkjUzTt7P7NUTWarVWV3Eks5sZiMBgaJpZM4O8GXA>
.
|
src/common/libkvs/treeobj.c
Outdated
goto inval; | ||
} | ||
else if (!strcmp (type, "val")) { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that there can be a json_is_string()
check here? B/c we are base64 encoding, the data should always be a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reference to "test_validate" in the commit message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
crud, I meant treeobj_validate
. That commit message will be eliminated when I squash later on.
src/common/libkvs/treeobj.c
Outdated
char *data; | ||
|
||
if (json_unpack (obj, "{s:i s:s s:s}", "ver", &version, | ||
"type", &type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use treeobj_unpack()
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed.
src/common/libkvs/treeobj.c
Outdated
count = json_object_size (data); | ||
} | ||
else if (!strcmp (type, "symlink") || !strcmp (type, "value")) { | ||
count = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be "val"
, not "value"
. A unit test should have caught this, so perhaps a unit test is missing.
src/common/libkvs/treeobj.h
Outdated
/* get type-specific count. | ||
* For dirref/valref, this is the number of blobrefs. | ||
* For directory, this is number of entries | ||
* For symlink or value, this is 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val
instead of value
for consistency.
src/common/libkvs/kvs_txn.c
Outdated
return 0; | ||
error: | ||
json_decref (dirent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to do saved_errno
here?
src/common/libkvs/kvs_txn.c
Outdated
goto done; | ||
rc = 0; | ||
done: | ||
json_decref (dirent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to do saved_errno
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be clearer to have a dedicated error
label that explicitly does a
error:
saved_errno = errno;
// ...cleanup...
errno = saved_errno
return -1
The downside is that cleanup has to be duplicated in both success and failure exit paths. However the other way seems a little bit brittle to somebody forgetting to set saved_errno
in an exit path.
src/common/libkvs/kvs_txn.c
Outdated
goto done; | ||
rc = 0; | ||
done: | ||
json_decref (dirent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to do saved_errno
here?
src/common/libkvs/kvs_txn.c
Outdated
goto done; | ||
rc = 0; | ||
done: | ||
json_decref (dirent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to do saved_errno
here?
src/common/libkvs/kvs_lookup.c
Outdated
if (flux_future_aux_set (f, auxkey, ctx, (flux_free_f)free_ctx) < 0) { | ||
free_ctx (ctx); | ||
flux_future_destroy (f); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a return NULL here?
src/common/libkvs/kvs_lookup.c
Outdated
} | ||
if (flux_future_aux_set (f, auxkey, ctx, (flux_free_f)free_ctx) < 0) { | ||
free_ctx (ctx); | ||
flux_future_destroy (f); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there should be a return NULL here?
Couple general responses to your review comments above, in addition to the ones I made inline:
If you already have these changes made please go ahead and tack them on; o/w let me know and I can do them Apologies for taking longer than I thought to get back on this. |
@garlick I can't remember if we came to a conclusion if |
@garlick Oh, for what its worth you decided to do it here. So we should probably do
|
works for me. |
Seem to have a low coverage diff. There's a few bad-input and missed code paths that I think we can improve on. Can add some unit tests. Noticed |
Yeah that sounds like a good idea to me, |
code cov diff is now 76.61%, with a target of 77.49%. Once rebase + squash is done and patch for |
Yeah, nice work. Sometimes it is difficult to get patch coverage up dependent on what kind of error handling you have. Admirable job with the coverage here! |
Yep, still doing some review/poking but I've not found anything worthy of a comment yet. I'll try to wrap that up this morning so we can move forward. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great on the whole. I've made a few minor comments inline.
src/common/libkvs/test/kvs_txn.c
Outdated
errno = 0; | ||
rc = flux_kvs_txn_pack (txn, 0xFFFF, "foo.bar.blorp", "s", "foo"); | ||
ok (rc < 0 && errno == EINVAL, | ||
"3: flux_kvs_txn_pack(bad flags) fails with EINVAL"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good addition, however I was misguidedly numbering the operations that are added to the transaction and referencing them later in tests, sort of a way to x-reference. This expected failure doesn't add an operation, so it shouldn't get a number. But maybe it would be a good idea to change the numbers to some unordered mnemonic lie "op-unlink"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh, i thought you were just plainly numbering from 1-9. So yeah, I'll adjust accordingly.
src/common/libkvs/kvs_txn.c
Outdated
goto done; | ||
rc = 0; | ||
done: | ||
json_decref (dirent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be clearer to have a dedicated error
label that explicitly does a
error:
saved_errno = errno;
// ...cleanup...
errno = saved_errno
return -1
The downside is that cleanup has to be duplicated in both success and failure exit paths. However the other way seems a little bit brittle to somebody forgetting to set saved_errno
in an exit path.
src/common/libkvs/treeobj.c
Outdated
goto inval; | ||
} | ||
else if (!strcmp (type, "val")) { | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the reference to "test_validate" in the commit message?
src/modules/kvs/kvs.c
Outdated
if (j_dirent_validate (root_dirent) < 0 | ||
|| !(root_ref = json_object_get (root_dirent, "DIRREF"))) { | ||
if (treeobj_validate (root_dirent) < 0 | ||
|| treeobj_is_dirref (root_dirent) < 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
treeobj_is_dirref() returns a boolean not -1 or 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was fixed in a prior commit, wondering why you didn't see it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was reviewing each commit and going back and forth to the final source. I guess I missed that this one was fix later - sorry about that.
@@ -42,7 +42,7 @@ | |||
#include "src/common/libutil/monotime.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra to/for in commit message.
Also maybe "main source module" would be better than "kvs"?
(sorry for placement of this comment - couldn't comment on the commit title directly)
@@ -2,29 +2,185 @@ | |||
#include "config.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
improve commit message, like
test/kvs_dir: increase test coverage
the words in the body don't add anything, just drop.
@@ -38,7 +38,7 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps say "internal lookup API" since there is an external one too
src/modules/kvs/lookup.c
Outdated
@@ -606,16 +629,30 @@ bool lookup (lookup_t *lh) | |||
lh->errnum = ENOTDIR; | |||
goto done; | |||
} | |||
reftmp = json_string_value (vp); | |||
if (!(reftmp = treeobj_get_blobref (lh->wdirent, 0))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it useful to check that the number of blobrefs is always one? Might be handy later when we make changes to allow more than one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think that's good check to add.
@@ -348,6 +348,88 @@ void test_dir (void) | |||
json_decref (dir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe replace comment about json_copy()
with
json_copy(), which makes a shallow copy of a JSON object, cannot be used directly on a RFC 11 dir object, since the directory itself is one level down under the "data" key.
@@ -37,7 +37,7 @@ | |||
#include <jansson.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(commit messge): how about
update internal commit API for RFC 11
just pushed a branch with a lot of the fixes listed above. Didn't fix the commit messages, as I'll do that when I squash all of these minor fixes into the earlier commits. |
Add new unit tests, detecting if a valref points to an invalid type, dirref points to an invalid type, caller passes in illegal root reference, or if dirref/valref have multiple references.
Add treeobj_copy_dir() to perform a shallow copy of a treeobj dir. json_copy() is not safe for copying. Add unit tests appropriately.
Refactor the logic of store_cache(), so that it no longer steals references, but lets the caller keep ownership. Simplifies logic in several places and makes code easier to understand.
Add test for invalid treeobj object as root and dirrefs that have multiple references.
Remove unnecessary code, in particular with error type tests.
In original test of setting the string "null" within the KVS there was an error that worked out by chance but was incorrect. When doing a kvs put with the string "null", such as with: flux kvs put key=null this should create a json null object. On the other hand: flux kvs put key=\"null\" should create a string "null". Correct for this and create tests for both scenarios.
With change to treeobj implementation, readlink code should be adjusted, calling flux_kvs_lookup_get() instead of flux_kvs_lookup_get_unpack().
Update t1002-kvs-extra.t tests for RFC 11, notably adjusting former dirent types (i.e. FILEVAL) for treeobj types (i.e. "val").
Add additional bad treeobj test cases.
Remove jansson_dirent internal library, as it is no longer used. Remove lingering includes of jansson_dirent.h.
rebased |
hit write error and #731 , restarting some builds |
There are some calls to |
@grondo The flux handle isn't available in the internal kvs lookup/commit API, which is why I used the |
I didn't catch that but we almost certainly want to use flux_log here. Can
we pass in the handle some way?
…On Aug 25, 2017 1:46 PM, "Al Chu" ***@***.***> wrote:
@grondo <https://github.com/grondo> The flux handle isn't available in
the internal kvs lookup/commit API, which is why I used the log_msg()
function.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1154 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAKX29QFkbEo10-6UiNHAmQxrY1B9yqRks5sbzK0gaJpZM4O8GXA>
.
|
@garlick We could just pass it in as a parameter to any number of the internal lookup/commit API functions. But not being dependent on the flux handle was part of the initial refactoring goal for these APIs, so that the APIs were independently unit testable. We could pass in the handle optionally, and only log if the handle is available? But I think that's sort of ugly. I'd rather just remove the |
I think the logging, especially of unexpected situations, is pretty valuable - but so is the unit testability. Could we fix by adding a log wrapper of some sort? I would also be OK with leaving this as is and opening an issue to come back to it later. |
Ok, I didn't see any other issues with the PR and tested it on a few different machines (at least with make check). If you guys are ok with pushing logging changes off I'll merge this one. |
I'm good. |
Great! Thanks! |
This is a series of patches by myself and @garlick to integrate Tree Object Format specified in RFC 11 (https://github.com/flux-framework/rfc/blob/master/spec_11.adoc) in the KVS.
This PR is not a full implementation of RFC11, but is a simplified one, as to make the initial implementation in flux-core a bit easier. It doesn't include the
hdir
type and all arrays of blobrefs are implemented with a maximum length of 1. To some extent, it's like the RFC11 equivalent of the older implementation.A lot of code here. After this weekend I want to review this all again with fresh eyes. But we'll start with this and hopefully this it's pretty close to what we want.