Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: rework internal metadata format to conform to RFC 11 design #1154

Merged
merged 27 commits into from
Aug 25, 2017

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Aug 18, 2017

This is a series of patches by myself and @garlick to integrate Tree Object Format specified in RFC 11 (https://github.com/flux-framework/rfc/blob/master/spec_11.adoc) in the KVS.

This PR is not a full implementation of RFC11, but is a simplified one, as to make the initial implementation in flux-core a bit easier. It doesn't include the hdir type and all arrays of blobrefs are implemented with a maximum length of 1. To some extent, it's like the RFC11 equivalent of the older implementation.

A lot of code here. After this weekend I want to review this all again with fresh eyes. But we'll start with this and hopefully this it's pretty close to what we want.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.08%) to 77.927% when pulling 56d8166 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@codecov-io
Copy link

codecov-io commented Aug 18, 2017

Codecov Report

Merging #1154 into master will increase coverage by 0.16%.
The diff coverage is 78.7%.

@@            Coverage Diff             @@
##           master    #1154      +/-   ##
==========================================
+ Coverage   77.56%   77.73%   +0.16%     
==========================================
  Files         158      158              
  Lines       28758    29074     +316     
==========================================
+ Hits        22307    22601     +294     
- Misses       6451     6473      +22
Impacted Files Coverage Δ
src/common/libkvs/kvs_commit.c 86.66% <ø> (ø) ⬆️
src/common/libkvs/kvs_dir.c 92.5% <100%> (+12.77%) ⬆️
src/cmd/flux-kvs.c 76.62% <100%> (ø) ⬆️
src/bindings/lua/flux-lua.c 81.67% <100%> (-0.1%) ⬇️
src/common/libkvs/kvs_lookup.c 68.04% <62.16%> (-16.17%) ⬇️
src/modules/kvs/commit.c 77.77% <73.01%> (+1%) ⬆️
src/modules/kvs/lookup.c 84.69% <74.19%> (-2.4%) ⬇️
src/modules/kvs/kvs.c 63.5% <75%> (+0.3%) ⬆️
src/common/libkvs/kvs_watch.c 87.22% <81.81%> (-0.22%) ⬇️
src/common/libkvs/kvs_txn.c 77.12% <82.43%> (+19.15%) ⬆️
... and 18 more

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 77.955% when pulling 807dacc on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 78.031% when pulling 175ba7a on chu11:rfc11-final into 0c63795 on flux-framework:master.

@chu11
Copy link
Member Author

chu11 commented Aug 18, 2017

Oh yeah, it's probably worth mentioning, soak hasn't been run yet and should.

@chu11
Copy link
Member Author

chu11 commented Aug 18, 2017

Got a valgrind error on one run. Not entirely sure what happened.

==26802== Memcheck, a memory error detector
==26802== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==26802== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==26802== Command: /home/travis/build/flux-framework/flux-core/flux-core-dfff6c3/_build/sub/src/broker/.libs/lt-flux-broker --shutdown-grace=4 /home/travis/build/flux-framework/flux-core/flux-core-dfff6c3/t/valgrind/valgrind-workload.sh 10
==26802== 
2017-08-18T23:13:53.073763Z broker.err[0]: rc1: flux-module: flux_open: No such file or directory
2017-08-18T23:13:53.119203Z broker.err[0]: Run level 1 Exited with non-zero status (rc=1) 0.9s
2017-08-18T23:13:53.384930Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] userdb: No such file or directory
2017-08-18T23:13:53.485824Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] cron: No such file or directory
2017-08-18T23:13:53.559402Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] job: No such file or directory
2017-08-18T23:13:53.670238Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] resource-hwloc: No such file or directory
2017-08-18T23:13:53.739372Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] aggregator: No such file or directory
2017-08-18T23:13:53.822058Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] kvs: No such file or directory
2017-08-18T23:13:53.899850Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] barrier: No such file or directory
2017-08-18T23:13:53.985997Z broker.err[0]: rc3: flux-module: cmb.rmmod[0] content-sqlite: No such file or directory
==26802== 
==26802== HEAP SUMMARY:
==26802==     in use at exit: 81,357 bytes in 61 blocks
==26802==   total heap usage: 7,400 allocs, 7,339 frees, 2,239,782 bytes allocated
==26802== 
==26802== LEAK SUMMARY:
==26802==    definitely lost: 0 bytes in 0 blocks
==26802==    indirectly lost: 0 bytes in 0 blocks
==26802==      possibly lost: 0 bytes in 0 blocks
==26802==    still reachable: 81,357 bytes in 61 blocks
==26802==         suppressed: 0 bytes in 0 blocks
==26802== Reachable blocks (those to which a pointer was found) are not shown.
==26802== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==26802== 
==26802== For counts of detected and suppressed errors, rerun with: -v
==26802== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
not ok 1 - valgrind reports no new errors on single broker run

@garlick
Copy link
Member

garlick commented Aug 19, 2017 via email

goto inval;
}
else if (!strcmp (type, "val")) {
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that there can be a json_is_string() check here? B/c we are base64 encoding, the data should always be a string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reference to "test_validate" in the commit message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crud, I meant treeobj_validate. That commit message will be eliminated when I squash later on.

char *data;

if (json_unpack (obj, "{s:i s:s s:s}", "ver", &version,
"type", &type,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use treeobj_unpack() here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed.

count = json_object_size (data);
}
else if (!strcmp (type, "symlink") || !strcmp (type, "value")) {
count = 1;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be "val", not "value". A unit test should have caught this, so perhaps a unit test is missing.

/* get type-specific count.
* For dirref/valref, this is the number of blobrefs.
* For directory, this is number of entries
* For symlink or value, this is 1.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val instead of value for consistency.

return 0;
error:
json_decref (dirent);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to do saved_errno here?

goto done;
rc = 0;
done:
json_decref (dirent);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to do saved_errno here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be clearer to have a dedicated error label that explicitly does a

error:
    saved_errno = errno;
    // ...cleanup...
    errno = saved_errno
    return -1

The downside is that cleanup has to be duplicated in both success and failure exit paths. However the other way seems a little bit brittle to somebody forgetting to set saved_errno in an exit path.

goto done;
rc = 0;
done:
json_decref (dirent);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to do saved_errno here?

goto done;
rc = 0;
done:
json_decref (dirent);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to do saved_errno here?

if (flux_future_aux_set (f, auxkey, ctx, (flux_free_f)free_ctx) < 0) {
free_ctx (ctx);
flux_future_destroy (f);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a return NULL here?

}
if (flux_future_aux_set (f, auxkey, ctx, (flux_free_f)free_ctx) < 0) {
free_ctx (ctx);
flux_future_destroy (f);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there should be a return NULL here?

@garlick
Copy link
Member

garlick commented Aug 21, 2017

Couple general responses to your review comments above, in addition to the ones I made inline:

  • On saved_errno, I thought we determined free() and json_decref() didn't set errno, but it won't hurt to add that in
  • Missing return in aux_set error paths - grievous error! Good catch!
  • use of val vs value - yes cleaning that up is good.

If you already have these changes made please go ahead and tack them on; o/w let me know and I can do them

Apologies for taking longer than I thought to get back on this.

@chu11
Copy link
Member Author

chu11 commented Aug 21, 2017

@garlick I can't remember if we came to a conclusion if json_decref was safe or not, as it may be safe now, but would it in the future? I'll go ahead and do the remaining fix ups.

@chu11
Copy link
Member Author

chu11 commented Aug 21, 2017

@garlick Oh, for what its worth you decided to do it here. So we should probably do saved_errno around json_decref atleast for consistency.

void flux_kvs_txn_destroy (flux_kvs_txn_t *txn)
{
    if (txn) {
        int saved_errno = errno;
        json_decref (txn->ops);
        free (txn);
        errno = saved_errno;
    }
}

@garlick
Copy link
Member

garlick commented Aug 21, 2017

works for me.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.008%) to 77.854% when pulling 8259d70 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 77.88% when pulling a3cacf1 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@chu11
Copy link
Member Author

chu11 commented Aug 22, 2017

Seem to have a low coverage diff. There's a few bad-input and missed code paths that I think we can improve on. Can add some unit tests.

Noticed flux_kvs_lookup_get_raw() was completely untested. Was going to leave flux-kvs comand raw input/output options for another PR, as its sort of independent of this PR (see chu11@c5b36e5), so perhaps the flux_kvs_lookup_get_raw() patches should be spliced out and put in there as well for another day?

@garlick
Copy link
Member

garlick commented Aug 22, 2017

Yeah that sounds like a good idea to me,

@coveralls
Copy link

Coverage Status

Coverage increased (+0.09%) to 77.934% when pulling 7581258 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 78.039% when pulling 229cf18 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 77.952% when pulling 229cf18 on chu11:rfc11-final into 0c63795 on flux-framework:master.

@garlick
Copy link
Member

garlick commented Aug 22, 2017

BTW the valgrind error above is a recurrence of the one @grondo reported in #677 and added some code to address in PR #1148. If this continues to be a problem, we might want to increase the FLUX_LOCAL_CONNECTOR_RETRY_COUNT in the valgrind test script a bit more.

@chu11
Copy link
Member Author

chu11 commented Aug 22, 2017

code cov diff is now 76.61%, with a target of 77.49%. Once rebase + squash is done and patch for flux_kvs_lookup_get_raw() is spliced out we should be well above target (or close enough we don't care). So I think we're good here now.

@grondo
Copy link
Contributor

grondo commented Aug 22, 2017

Yeah, nice work. Sometimes it is difficult to get patch coverage up dependent on what kind of error handling you have. Admirable job with the coverage here!

@garlick
Copy link
Member

garlick commented Aug 22, 2017

Yep, still doing some review/poking but I've not found anything worthy of a comment yet. I'll try to wrap that up this morning so we can move forward.

@garlick garlick mentioned this pull request Aug 22, 2017
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great on the whole. I've made a few minor comments inline.

errno = 0;
rc = flux_kvs_txn_pack (txn, 0xFFFF, "foo.bar.blorp", "s", "foo");
ok (rc < 0 && errno == EINVAL,
"3: flux_kvs_txn_pack(bad flags) fails with EINVAL");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good addition, however I was misguidedly numbering the operations that are added to the transaction and referencing them later in tests, sort of a way to x-reference. This expected failure doesn't add an operation, so it shouldn't get a number. But maybe it would be a good idea to change the numbers to some unordered mnemonic lie "op-unlink"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh, i thought you were just plainly numbering from 1-9. So yeah, I'll adjust accordingly.

goto done;
rc = 0;
done:
json_decref (dirent);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be clearer to have a dedicated error label that explicitly does a

error:
    saved_errno = errno;
    // ...cleanup...
    errno = saved_errno
    return -1

The downside is that cleanup has to be duplicated in both success and failure exit paths. However the other way seems a little bit brittle to somebody forgetting to set saved_errno in an exit path.

goto inval;
}
else if (!strcmp (type, "val")) {
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the reference to "test_validate" in the commit message?

if (j_dirent_validate (root_dirent) < 0
|| !(root_ref = json_object_get (root_dirent, "DIRREF"))) {
if (treeobj_validate (root_dirent) < 0
|| treeobj_is_dirref (root_dirent) < 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treeobj_is_dirref() returns a boolean not -1 or 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fixed in a prior commit, wondering why you didn't see it?

175ba7a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reviewing each commit and going back and forth to the final source. I guess I missed that this one was fix later - sorry about that.

@@ -42,7 +42,7 @@
#include "src/common/libutil/monotime.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra to/for in commit message.
Also maybe "main source module" would be better than "kvs"?

(sorry for placement of this comment - couldn't comment on the commit title directly)

@@ -2,29 +2,185 @@
#include "config.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

improve commit message, like

test/kvs_dir: increase test coverage

the words in the body don't add anything, just drop.

@@ -38,7 +38,7 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps say "internal lookup API" since there is an external one too

@@ -606,16 +629,30 @@ bool lookup (lookup_t *lh)
lh->errnum = ENOTDIR;
goto done;
}
reftmp = json_string_value (vp);
if (!(reftmp = treeobj_get_blobref (lh->wdirent, 0))) {
Copy link
Member

@garlick garlick Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it useful to check that the number of blobrefs is always one? Might be handy later when we make changes to allow more than one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think that's good check to add.

@@ -348,6 +348,88 @@ void test_dir (void)
json_decref (dir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe replace comment about json_copy() with

json_copy(), which makes a shallow copy of a JSON object, cannot be used directly on a RFC 11 dir object, since the directory itself is one level down under the "data" key.

@@ -37,7 +37,7 @@
#include <jansson.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(commit messge): how about

update internal commit API for RFC 11

@chu11
Copy link
Member Author

chu11 commented Aug 23, 2017

just pushed a branch with a lot of the fixes listed above. Didn't fix the commit messages, as I'll do that when I squash all of these minor fixes into the earlier commits.

chu11 added 13 commits August 25, 2017 11:22
Add new unit tests, detecting if a valref points to an invalid
type, dirref points to an invalid type, caller passes in
illegal root reference, or if dirref/valref have multiple references.
Add treeobj_copy_dir() to perform a shallow copy of a treeobj
dir.  json_copy() is not safe for copying.

Add unit tests appropriately.
Refactor the logic of store_cache(), so that it no longer steals
references, but lets the caller keep ownership.  Simplifies logic
in several places and makes code easier to understand.
Add test for invalid treeobj object as root and dirrefs that
have multiple references.
Remove unnecessary code, in particular with error type tests.
In original test of setting the string "null" within the KVS there
was an error that worked out by chance but was incorrect.

When doing a kvs put with the string "null", such as with:

flux kvs put key=null

this should create a json null object.  On the other hand:

flux kvs put key=\"null\"

should create a string "null".

Correct for this and create tests for both scenarios.
With change to treeobj implementation, readlink code should be
adjusted, calling flux_kvs_lookup_get() instead of
flux_kvs_lookup_get_unpack().
Update t1002-kvs-extra.t tests for RFC 11, notably adjusting
former dirent types (i.e. FILEVAL) for treeobj types (i.e. "val").
Add additional bad treeobj test cases.
Remove jansson_dirent internal library, as it is no longer used.

Remove lingering includes of jansson_dirent.h.
@chu11
Copy link
Member Author

chu11 commented Aug 25, 2017

rebased

@chu11
Copy link
Member Author

chu11 commented Aug 25, 2017

hit write error and #731 , restarting some builds

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 78.079% when pulling de398d8 on chu11:rfc11-final into 53e3f12 on flux-framework:master.

@grondo
Copy link
Contributor

grondo commented Aug 25, 2017

There are some calls to log_msg() added in this PR. Just double checking that the kvs module wants to log to stderr and not to the flux broker ring buffer?

@chu11
Copy link
Member Author

chu11 commented Aug 25, 2017

@grondo The flux handle isn't available in the internal kvs lookup/commit API, which is why I used the log_msg() function.

@garlick
Copy link
Member

garlick commented Aug 25, 2017 via email

@chu11
Copy link
Member Author

chu11 commented Aug 25, 2017

@garlick We could just pass it in as a parameter to any number of the internal lookup/commit API functions. But not being dependent on the flux handle was part of the initial refactoring goal for these APIs, so that the APIs were independently unit testable.

We could pass in the handle optionally, and only log if the handle is available? But I think that's sort of ugly. I'd rather just remove the log_msg() calls entirely.

@garlick
Copy link
Member

garlick commented Aug 25, 2017

I think the logging, especially of unexpected situations, is pretty valuable - but so is the unit testability. Could we fix by adding a log wrapper of some sort?

I would also be OK with leaving this as is and opening an issue to come back to it later.

@grondo
Copy link
Contributor

grondo commented Aug 25, 2017

Ok, I didn't see any other issues with the PR and tested it on a few different machines (at least with make check). If you guys are ok with pushing logging changes off I'll merge this one.

@garlick
Copy link
Member

garlick commented Aug 25, 2017

I'm good.

@grondo grondo merged commit 1c99b02 into flux-framework:master Aug 25, 2017
@grondo
Copy link
Contributor

grondo commented Aug 25, 2017

Great! Thanks!

@garlick
Copy link
Member

garlick commented Aug 25, 2017

Nice work @chu11, and thanks @grondo for the feedback as well!

@grondo grondo mentioned this pull request May 10, 2018
@chu11 chu11 deleted the rfc11-final branch June 5, 2021 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants