KVS: Support reading valref objects with multiple blobrefs #1227

chu11 · 2017-10-09T23:38:46Z

To support #1193 and #1202, this PR adds support for reading valref treeobjs that contain multiple blobrefs. Note that outside of manually creating multi-blobref valref objects, there is currently no easy mechanism to write them. The "write" half of these features is left for later subtasks in #1193 and #1202.

There are two major parts to this PR. One was to refactor the internal KVS lookup API to support multiple missing references being returned to callers. The second major part was to support loading multiple blobrefs in the internal KVS lookup API and creating a single val object to return the value to the caller.

coveralls · 2017-10-09T23:59:27Z

Coverage decreased (-0.06%) to 78.59% when pulling 11dd00e on chu11:issue1202-part1 into 9d3b4b1 on flux-framework:master.

codecov-io · 2017-10-10T00:00:27Z

Codecov Report

Merging #1227 into master will decrease coverage by <.01%.
The diff coverage is 79.69%.

@@            Coverage Diff            @@
##           master   #1227      +/-   ##
=========================================
- Coverage   78.11%   78.1%   -0.01%     
=========================================
  Files         154     154              
  Lines       28702   28812     +110     
=========================================
+ Hits        22420   22504      +84     
- Misses       6282    6308      +26

Impacted Files	Coverage Δ
src/modules/kvs/kvs.c	`63.52% <54.54%> (-0.63%)`	⬇️
src/modules/kvs/lookup.c	`84.18% <88%> (+0.3%)`	⬆️
src/common/libutil/blobref.c	`97.22% <0%> (-1.39%)`	⬇️
src/common/libflux/request.c	`87.17% <0%> (-1.29%)`	⬇️
src/common/libflux/mrpc.c	`85.49% <0%> (-1.18%)`	⬇️
src/common/libkvs/kvs_watch.c	`86.34% <0%> (-0.89%)`	⬇️
src/common/libkvs/kvs.c	`64.87% <0%> (-0.63%)`	⬇️
src/broker/module.c	`83.79% <0%> (-0.28%)`	⬇️
src/common/libflux/message.c	`81.48% <0%> (+0.23%)`	⬆️
src/common/libflux/future.c	`89.25% <0%> (+0.46%)`	⬆️
... and 4 more

chu11 · 2017-10-10T00:01:20Z

struct lookup_ref_data ld = { 0 };

I guess I need to explicitly set each field to zero to avoid a warning error. Added a fix, will squash later.
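
A minimal sketch of the "set each field explicitly" approach. The thread does not show the exact warning or the real fields of `struct lookup_ref_data`, so `struct ref_data_example`, its fields, and `ref_data_example_init()` are hypothetical stand-ins:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in: the real struct lookup_ref_data fields are
 * not shown in this thread. */
struct ref_data_example {
    const char *ref;
    bool raw;
    int errnum;
};

/* Assign every field by name instead of relying on "= { 0 }", so an
 * initializer-related warning has nothing to complain about. */
static void ref_data_example_init (struct ref_data_example *d)
{
    assert (d != NULL);
    d->ref = NULL;
    d->raw = false;
    d->errnum = 0;
}
```

The cost is that a newly added field has to be remembered here, which is the trade-off against the shorter `{ 0 }` form.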

coveralls · 2017-10-10T00:22:13Z

Coverage decreased (-0.05%) to 78.597% when pulling 2532e1b on chu11:issue1202-part1 into 9d3b4b1 on flux-framework:master.

chu11 · 2017-10-10T00:23:57Z

boo, "write error" fails, restarting

garlick · 2017-10-10T00:47:28Z

Great!

I'll try running and poking at this tomorrow. Couple quick comments: function pointers should end in _f (per RFC 7) and the word "appropriate" is comically overused in the commit message for 08910cf :-)

I was vaguely wondering if the creation of a single val object from multiple blobs was something that could be cleverly abstracted in treeobj.c. I'm not sure what interface would work best here though, may not be worth it.

chu11 · 2017-10-10T14:38:07Z

just pushed a bunch of tiny things: minor bug fixes, a bunch of extra coverage tests, and renaming the callback function type to end in _f instead of _cb. Hopefully I can get coverage to around 78%, but it may be hard given the many "impossible" to reach error paths.

coveralls · 2017-10-10T15:01:04Z

Coverage decreased (-0.006%) to 78.64% when pulling a1fbc51 on chu11:issue1202-part1 into 9d3b4b1 on flux-framework:master.

chu11 · 2017-10-10T15:13:08Z

hmmm, on two builds my EOVERFLOW test fails as it returns ENOMEM. My assumption is this check:

        total += len;
        if (total < len) {
            lh->errnum = EOVERFLOW;
            goto done;
        }

is not working and a malloc below it is subsequently failing. From what I can find online, this kind of after-the-fact check is not safe for signed ints, since signed overflow is undefined behavior in C. Will have to research.
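
One common way to avoid the problem is to test against the limit before adding, so the overflowing addition never happens. This is only a sketch, assuming `total` and `len` are non-negative `int`s; `add_len_checked()` is a hypothetical helper, not a function from the PR:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Add len to *total without relying on signed wraparound.
 * Returns 0 on success, or -1 with errno set to EOVERFLOW if the sum
 * would exceed INT_MAX (*total is left unchanged in that case). */
static int add_len_checked (int *total, int len)
{
    assert (total != NULL);
    assert (*total >= 0 && len >= 0);
    if (len > INT_MAX - *total) {   /* check first, then add */
        errno = EOVERFLOW;
        return -1;
    }
    *total += len;
    return 0;
}
```

Because `INT_MAX - *total` cannot overflow when `*total` is non-negative, the compiler has no undefined behavior to exploit and cannot drop the check.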

coveralls · 2017-10-10T16:01:11Z

Coverage decreased (-0.06%) to 78.59% when pulling 019b7f4 on chu11:issue1202-part1 into 9d3b4b1 on flux-framework:master.

coveralls · 2017-10-10T16:20:42Z

Coverage decreased (-0.06%) to 78.583% when pulling 14c1b51 on chu11:issue1202-part1 into 9d3b4b1 on flux-framework:master.

chu11 · 2017-10-10T17:03:01Z

more tiny fixes pushed to increase coverage, fix a minor issue. It's a ton of tiny things. If you haven't started reviewing yet, I can squash and get things back to a nice point.

coveralls · 2017-10-10T17:22:15Z

Coverage increased (+0.03%) to 78.671% when pulling e42c744 on chu11:issue1202-part1 into 9d3b4b1 on flux-framework:master.

garlick · 2017-10-10T17:42:22Z

I was holding off until you settled down :-) Squashing would be great.

chu11 · 2017-10-10T18:23:06Z

ok, just squashed. Last go-around I was at 76.8% diff coverage. Let's see if my last tweak can get me past 78%. If it doesn't, this might be as close as I can get. There are a lot of "impossible" paths in the main kvs.c file that can't be reached.

coveralls · 2017-10-10T18:43:19Z

Coverage increased (+0.01%) to 78.655% when pulling 85732bf on chu11:issue1202-part1 into 9d3b4b1 on flux-framework:master.

garlick · 2017-10-10T20:17:48Z

I tried modifying your sharness test to make the first blob empty and got a hang and:

content_load_completion: cache_entry_set_raw: Invalid argument

Since we allow an empty raw value to be stored in the KVS, presumably we should allow it to be appended to?

garlick

I added some inline comments which I hope aren't too off base, but please feel free to tell me if so.

I would like to see more sharness tests, variants on the one you already added, that include things like zero length blobs in different positions, and blobrefs that aren't in the store to make sure those corner cases are handled right.

Also, I think we spoke about this before but I am not sure how we resolved it - EPERM seems like the wrong error code to return for things like dangling blobrefs. Do you remember where we left that?

garlick · 2017-10-10T20:32:54Z

src/modules/kvs/lookup.c

+
+/* return 0 on success, -1 on failure.  On success, stall should be
+ * check */
+static int get_multi_blobref_valref_value (lookup_t *lh, int refcount,


Just a suggestion and may just be a style thing: would it be clearer if determining the aggregate size and copying to the aggregate buffer were split into two functions?

yeah, that could probably help.
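
A rough sketch of what that split could look like: one pass that only sums the lengths, one pass that only copies. It is not the code from the PR. `struct blob`, `blobs_total_len()`, `blobs_copy()` and `blobs_concat()` are hypothetical names, and a plain array stands in for the cached blob data:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one cached blob's raw data. */
struct blob {
    const void *data;
    int len;
};

/* Pass 1: sum the blob lengths.  Returns 0 on success, or -1 with
 * errno set to EOVERFLOW if the sum would exceed INT_MAX. */
static int blobs_total_len (const struct blob *blobs, int count, int *total)
{
    int sum = 0;

    for (int i = 0; i < count; i++) {
        assert (blobs[i].len >= 0);
        if (blobs[i].len > INT_MAX - sum) {
            errno = EOVERFLOW;
            return -1;
        }
        sum += blobs[i].len;
    }
    *total = sum;
    return 0;
}

/* Pass 2: copy each blob, in order, into buf, which must hold at least
 * the total from pass 1.  Zero-length blobs contribute nothing. */
static void blobs_copy (const struct blob *blobs, int count, char *buf)
{
    int pos = 0;

    for (int i = 0; i < count; i++) {
        if (blobs[i].len > 0) {
            memcpy (buf + pos, blobs[i].data, blobs[i].len);
            pos += blobs[i].len;
        }
    }
}

/* Concatenate all blobs into a newly allocated buffer (caller frees).
 * Returns NULL with errno set on failure. */
static char *blobs_concat (const struct blob *blobs, int count, int *lenp)
{
    int total;
    char *buf;

    if (blobs_total_len (blobs, count, &total) < 0)
        return NULL;
    /* malloc (0) may return NULL, so always allocate at least 1 byte */
    if (!(buf = malloc (total > 0 ? total : 1))) {
        errno = ENOMEM;
        return NULL;
    }
    blobs_copy (blobs, count, buf);
    *lenp = total;
    return buf;
}
```

Keeping the size pass separate also gives the overflow check a single obvious home.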

garlick · 2017-10-10T20:34:41Z

src/modules/kvs/kvs.c

@@ -804,6 +812,13 @@ static void get_request_cb (flux_t *h, flux_msg_handler_t *w,

        if (lookup_iter_missing_refs (lh, lookup_load_cb, &cbd) < 0) {
            errno = cbd.errnum;
+


sharness test for this? (one bad blobref in valref array?)

garlick · 2017-10-10T20:36:32Z

src/modules/kvs/lookup.h

@@ -36,6 +36,12 @@ bool lookup_validate (lookup_t *lh);
 * an error occurred or not */
 int lookup_get_errnum (lookup_t *lh);

+/* if user wishes to stall, but needs future knowledge to fail and


is that a real use case? is there ever a reason to stall the request if you've hit an error? Seems like you would want to immediately abort and get the response to the user.

So the way I've handled it in the past (with the commit API) was that if multiple RPCs are sent successfully then one fails, I wait for the in-flight rpcs to complete then return the error to the user. So these get/set aux errnums are the "flag" for the callback function to return an error to the user. See commit_apply() function in kvs.c and you'll see what I do.

This could perhaps be dealt with in an alternate way. An error could be returned to the user immediately, and the get/set aux errnum could be a flag informing the callback function "this rpc already errored, don't send the user a response". It'd simply be a logic change. Perhaps for another issue though?

Ah, sorry, I think that's fine for now. It's an error case, so trading a little latency for simplicity is probably a wise move. If you feel that change could actually simplify the code here and in the commit API, then I'd suggest opening a bug for later.

I think code wise it's not too much different each way. Of course returning the error to the user earlier shortens the latency a bit. I'll create an issue so we don't forget about this.

Actually, thinking about it now, the code logic would probably be more confusing, because I'd have to keep some auxiliary data structures around longer than they are currently used.

For example in commit_apply(), after we reply to the user we call commit_mgr_remove_commit() to remove/destroy a commit context. If I were to change the logic to return an error to the user immediately, I can't destroy this. It still needs to exist for the rpcs that haven't finished.

Think it'd be better to keep it the way it is right now.
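
For readers following along, the "wait for in-flight rpcs, then report the error" pattern kept here can be sketched roughly as below. This is not the kvs.c code: `struct request_ctx`, `request_set_error()` and `request_rpc_done()` are hypothetical, with `aux_errnum` playing the role of the get/set aux errnum flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-request context. */
struct request_ctx {
    int pending;          /* RPCs still in flight */
    int aux_errnum;       /* first error recorded, 0 if none */
    bool responded;       /* true once the user has been answered */
    int response_errnum;  /* errnum sent in that answer */
};

/* Record an error without responding yet; the first error wins. */
static void request_set_error (struct request_ctx *ctx, int errnum)
{
    assert (ctx != NULL);
    if (ctx->aux_errnum == 0)
        ctx->aux_errnum = errnum;
}

/* Called once per completed RPC.  The response is sent only when the
 * last in-flight RPC finishes, so the context can be destroyed right
 * after responding.  Returns true if the response was sent. */
static bool request_rpc_done (struct request_ctx *ctx, int errnum)
{
    assert (ctx != NULL && ctx->pending > 0);
    if (errnum != 0)
        request_set_error (ctx, errnum);
    if (--ctx->pending > 0)
        return false;
    ctx->responded = true;
    ctx->response_errnum = ctx->aux_errnum;
    return true;
}
```

Responding early would mean the context has to outlive the response until `pending` reaches zero, which is the extra bookkeeping described above.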

garlick · 2017-10-10T20:41:23Z

src/modules/kvs/lookup.c

@@ -57,6 +57,11 @@ typedef struct {
    zlist_t *pathcomps;
 } walk_level_t;

+typedef struct {


I am probably not understanding the big picture, but why do we need the zlist of these structs when we have the json_t ref itself, which contains a JSON array of hash keys into the cache?

I think you're right, we don't. I was sort of mimicking what was done in the internal commit API.

doh! changing up the code right now, I realize there was a reason. We can't assume every blobref in the valref treeobj is missing from the cache. Some could be missing while others aren't.

It perhaps wouldn't have much performance impact to pass all references back to the caller, b/c the main kvs module would already recognize that some of those blobrefs are already in the cache. It'd simply be an API style change to say "I'm passing back a list of references, at least one of which is missing" instead of "all of them are missing".

I'll have to ponder this, as the change from the list to just using the valref object is much cleaner.

duh, thought of an obvious way to fix
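
The thread doesn't say what the fix was. One possible shape, and only an assumption, is to walk the valref's blobrefs directly and invoke the callback just for those not in the cache. In this sketch plain string arrays stand in for the JSON array and the cache, and `missing_ref_f`, `cache_contains()`, `iter_missing_refs()` and `count_ref_cb()` are all hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type (function pointer types end in _f). */
typedef int (*missing_ref_f) (const char *ref, void *arg);

/* Stand-in cache lookup: linear search over an array of blobrefs. */
static bool cache_contains (const char **cache, int cache_count,
                            const char *ref)
{
    for (int i = 0; i < cache_count; i++) {
        if (strcmp (cache[i], ref) == 0)
            return true;
    }
    return false;
}

/* Walk every blobref in the valref and call cb only for those not in
 * the cache.  Returns the number of missing refs reported, or -1 if
 * cb fails. */
static int iter_missing_refs (const char **refs, int ref_count,
                              const char **cache, int cache_count,
                              missing_ref_f cb, void *arg)
{
    int missing = 0;

    assert (refs != NULL && cb != NULL);
    for (int i = 0; i < ref_count; i++) {
        if (cache_contains (cache, cache_count, refs[i]))
            continue;
        if (cb (refs[i], arg) < 0)
            return -1;
        missing++;
    }
    return missing;
}

/* Example callback: count the missing refs it is handed. */
static int count_ref_cb (const char *ref, void *arg)
{
    (void)ref;
    (*(int *)arg)++;
    return 0;
}
```

With this shape no separate list of missing refs has to be built or kept in sync with the valref.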

chu11 · 2017-10-10T22:08:09Z

hey jim, you're right. I had not thought of the 0 length blobs, so some tests should definitely be added and code fixed accordingly.

chu11 · 2017-10-10T22:11:33Z

As for EPERM, I think we just sort of left it the last time it was discussed, as EPERM is the catch all "internally not consistent/bad" kind of errno. I'll create an issue for this so we don't forget.

garlick · 2017-10-12T16:37:38Z

@chu11 - are you waiting on me for anything here? Your comments above all make sense to me. Let me know when you're ready for me to make another review pass.

chu11 · 2017-10-12T17:48:19Z

I'm working on #1232 right now. It sort of needs to be done first, b/c without that I can't really do the multi-blob support w/ empty data.

chu11 · 2017-10-17T00:39:52Z

re-pushed rebased with master and changes from #1232.

lookup no longer uses a list for returning missing refs to the user, just uses the valref object itself, this is way cleaner now. Thanks Jim.

some code cleanup in lookup API

some multi-blobref tests added that have zero-content length blobs

need to add a test with a bad blobref amongst good blobrefs, forgot about that one. Forthcoming soon.

chu11 · 2017-10-17T00:51:27Z

oh yeah, I forgot, the test for 1 illegal blobref in a valref array hangs b/c of #792. So that test will probably have to be for another day. I noted in #792 that this test should be added.

But we can add a blobref that points to the wrong type of data (i.e. a directory or something). Will add that test.

coveralls · 2017-10-17T01:00:09Z

Coverage increased (+0.005%) to 78.689% when pulling 1df93cd on chu11:issue1202-part1 into 9f03dda on flux-framework:master.

chu11 · 2017-10-17T01:33:01Z

re-pushed, added a valref w/ a single-blobref test in which the blobref points to the wrong type of data.

then added a multi-blobref valref test, in which one of the blobrefs points to the wrong type of data.

But when I wrote the tests, I realized a problem. Imagine ...

a directory has blobref to another dir, say that blob is sha1-XXX
valref has sha1-XXX as a reference to a raw piece of data, which is invalid/bad in this case

How sha1-XXX is stored in the KVS cache (stored as json or raw) will depend on how the reference is first loaded into the KVS cache. If the valref is loaded first, it'll be loaded as raw data. If it's loaded as a directory object first, it'll be loaded as json. This is b/c the data in the content store is not "typed" in any way. An error would occur on whatever tries to read sha1-XXX second.
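
To make the ordering dependence concrete, here is a toy model, not the real KVS cache: an entry takes on whatever type its first loader asks for, and a later load under the other type fails. `struct cache_entry_example`, `entry_load_as()` and the choice of EINVAL are assumptions for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum entry_type { ENTRY_EMPTY, ENTRY_RAW, ENTRY_JSON };

/* Hypothetical cache entry that remembers how it was first loaded. */
struct cache_entry_example {
    enum entry_type type;
};

/* Load the entry as the wanted type.  The first loader fixes the type;
 * a later load under a different type fails with EINVAL (assumed). */
static int entry_load_as (struct cache_entry_example *e, enum entry_type want)
{
    assert (e != NULL && want != ENTRY_EMPTY);
    if (e->type == ENTRY_EMPTY)
        e->type = want;
    if (e->type != want) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

Swapping the order of the two loads swaps which reader sees the error, which is the race described above.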

Not sure if this needs to be solved in some way. The only way this occurs is if the user manually creates tree objects and modifies them with bad data.

coveralls · 2017-10-17T01:40:08Z

Coverage decreased (-0.02%) to 78.667% when pulling edfcac2 on chu11:issue1202-part1 into 9f03dda on flux-framework:master.

chu11 · 2017-10-17T15:11:35Z

hmmm, lots of failures. It appears to be my new test in 116f6b8: tests after it don't run, suggesting a segfault or assert. Hmmm.

chu11 · 2017-10-17T15:57:31Z

ugh, dumb rebase error, re-pushed (per twitter, github + travis are currently having a problem, so I guess CI will run whenever that is fixed)

coveralls · 2017-10-17T16:38:38Z

Coverage increased (+0.06%) to 78.739% when pulling f2e405e on chu11:issue1202-part1 into 9f03dda on flux-framework:master.

chu11 · 2017-10-17T17:10:04Z

re-push, fixing chain-lint

chu11 · 2017-10-17T18:12:32Z

hit write errors and #731, restarting builds

coveralls · 2017-10-17T18:24:30Z

Coverage decreased (-0.009%) to 78.675% when pulling b66ad57 on chu11:issue1202-part1 into 9f03dda on flux-framework:master.

Refactor lookup_get_missing_ref() into lookup_iter_missing_refs(), in which user passes a callback function to retrieve the missing reference and raw boolean instead of via the function itself. This is to prepare for the KVS lookup API returning multiple missing references back to the user. Update code appropriately in main KVS module. Update unit tests and add additional tests as necessary.

Refactor internal lookup API to be able to handle returning multiple missing references from a valref to the caller. This is predominantly infrastructure support for future multiple missing references support and nothing in the internal KVS lookup API currently uses this. As a fallout of this refactoring, the `missing_ref_raw` flag is no longer necessary and has been removed.

Add lookup_get_aux_errnum() and lookup_set_aux_errnum() in internal KVS lookup API. This is convenience for future error handling needs. Add unit tests appropriately.

In main KVS module, handle errors on multiple reference lookup loads if an error occurs after several rpcs have already been sent.

In internal lookup API, return val appropriately if valref has multiple blobrefs within it. This is done by loading each blobref appropriately and constructing a concatenated result. Update unit tests appropriately. Add more tests for coverage.

chu11 · 2017-10-17T18:55:07Z

re-pushed, to eke out a couple extra lines of code coverage.

coveralls · 2017-10-17T19:31:50Z

Coverage increased (+0.02%) to 78.701% when pulling a1233f6 on chu11:issue1202-part1 into 9f03dda on flux-framework:master.

garlick · 2017-10-17T21:51:19Z

I'm for merging this if you're ready @chu11.

chu11 · 2017-10-17T21:54:03Z

it's good to go

chu11 requested a review from garlick October 9, 2017 23:38

chu11 force-pushed the issue1202-part1 branch from e42c744 to 85732bf Compare October 10, 2017 18:21

garlick requested changes Oct 10, 2017

chu11 mentioned this pull request Oct 11, 2017

kvs: support valref pointing to empty blobs #1232

Closed

chu11 mentioned this pull request Oct 12, 2017

KVS: Misc cleanup, add test coverage, minor bug fix #1235

Merged

chu11 force-pushed the issue1202-part1 branch from 85732bf to 1df93cd Compare October 17, 2017 00:37

chu11 mentioned this pull request Oct 17, 2017

kvs commit handling should propagate content cache errors back to user #792

Closed

t/kvs: Add valref with invalid blobref test

116f6b8

chu11 force-pushed the issue1202-part1 branch from 1df93cd to edfcac2 Compare October 17, 2017 01:19

chu11 mentioned this pull request Oct 17, 2017

kvs: refactor internal kvs cache to solve storage type race #1239

Closed

chu11 force-pushed the issue1202-part1 branch 2 times, most recently from 7102717 to f2e405e Compare October 17, 2017 16:17

chu11 force-pushed the issue1202-part1 branch from f2e405e to b66ad57 Compare October 17, 2017 17:09

chu11 added 6 commits October 17, 2017 11:49

modules/kvs: Add get/set aux_errnum in lookup API

5cd0570

Add lookup_get_aux_errnum() and lookup_set_aux_errnum() in internal KVS lookup API. This is convenience for future error handling needs. Add unit tests appropriately.

modules/kvs: Handle lookup multi-load error

96397af

In main KVS module, handle errors on multiple reference lookup loads if an error occurs after several rpcs have already been sent.

t/kvs: Add multi-blobref read tests

a1233f6

chu11 force-pushed the issue1202-part1 branch from b66ad57 to a1233f6 Compare October 17, 2017 18:54

garlick merged commit eebb5e7 into flux-framework:master Oct 17, 2017

grondo mentioned this pull request May 10, 2018

0.9.0 Release #1479

Closed

chu11 deleted the issue1202-part1 branch June 5, 2021 17:01
