-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KVS: Support reading valref objects with multiple blobrefs #1227
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1227 +/- ##
=========================================
- Coverage 78.11% 78.1% -0.01%
=========================================
Files 154 154
Lines 28702 28812 +110
=========================================
+ Hits 22420 22504 +84
- Misses 6282 6308 +26
|
I guess i need to specifically set each field to zero to avoid warning error, added fix, will squash later. |
boo, "write error" fails, restarting |
Great! I'll try running and poking at this tomorrow. Couple quick comments: function pointers should end in _f (per RFC 7) and the word "appropriate" is comically overused in the commit message for 08910cf :-) I was vaguely wondering if the creation of a single val object from multiple blobs was something that could be cleverly abstracted in treeobj.c. I'm not sure what interface would work best here though, may not be worth it. |
just pushed bunch of tiny things, minor bug fixes, bunch of extra coverage tests, and renaming the callback function to have |
hmmm, on two builds my EOVERFLOW test fails as it returns ENOMEM. My assumption is this check:
is not working and a malloc below it is subsequently failing. Looking online it appears this is not safe behavior for signed ints. Will have to research. |
more tiny fixes pushed to increase coverage, fix a minor issue. It's a ton of tiny things. If you haven't started reviewing yet, I can squash and get things back to a nice point. |
I was holding off until you settled down :-) Squashing would be great. |
e42c744
to
85732bf
Compare
ok, just squashed. Last go-around I was at 76.8% diff coverage. Lets see if my last tweak can get me past 78%. If it doesn't, this might be as close as I can get. There's a lot of "impossible" paths in the main |
I tried modifying your sharness test to make the first blob empty and got a hang and:
Since we allow an empty raw value to be stored in the KVS, presumably we should allow it to be appended to? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some inline comments which I hope aren't too off base, but please feel free to tell me if so.
I would like to see more sharness tests, variants on the one you already added, that include things like zero length blobs in different positions, and blobrefs that aren't in the store to make sure those corner cases are handled right.
Also, I think we spoke about this before but I am not sure how we resolved it - EPERM seems like the wrong error code to return for things like dangling blobrefs. Do you remember where we left that?
|
||
/* return 0 on success, -1 on failure. On success, stall should be | ||
* check */ | ||
static int get_multi_blobref_valref_value (lookup_t *lh, int refcount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion and may just be a style thing: would it be clearer if determining the aggregate size and copying to the aggregate buffer were split into two functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that could probably help.
src/modules/kvs/kvs.c
Outdated
@@ -804,6 +812,13 @@ static void get_request_cb (flux_t *h, flux_msg_handler_t *w, | |||
|
|||
if (lookup_iter_missing_refs (lh, lookup_load_cb, &cbd) < 0) { | |||
errno = cbd.errnum; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sharness test for this? (one bad blobref in valref array?)
@@ -36,6 +36,12 @@ bool lookup_validate (lookup_t *lh); | |||
* an error occurred or not */ | |||
int lookup_get_errnum (lookup_t *lh); | |||
|
|||
/* if user wishes to stall, but needs future knowledge to fail and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that a real use case? is there ever a reason to stall the request if you've hit an error? Seems like you would want to immediately abort and get the response to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the way I've handled it in the past (with the commit API) was that if multiple RPCs are sent successfully then one fails, I wait for the in-flight rpcs to complete then return the error to the user. So these get/set aux errnums are the "flag" for the callback function to return an error to the user. See commit_apply()
function in kvs.c and you'll see what I do.
This perhaps could be dealt with an alternate way. Perhaps an error could be returned to the user immediately and get/set aux errnum could be a flag informing the callback function "this rpc already errored, don't send user a response". It'd simply be a logic change. Perhaps for another issue though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry, I think that's fine for now. It's an error case, so trading a little latency for simplicity is probably a wise move. If you feel that change could actually simplify the code here and in the commit API, then I'd suggest opening a bug for later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think code wise it's not too much different each way. Of course returning the error to the user earlier shortens the latency a bit. I'll create an issue so we don't forget about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about it now, the code logic would probably be more confusing. Because I'd have to keep some auxiliary-data structures around longer than they are currently used.
For example in commit_apply()
, after we reply to the user we call commit_mgr_remove_commit()
to remove/destroy a commit context. If I were to change the logic to return an error to the user immediately, I can't destroy this. It still needs to exist for the rpcs that haven't finished.
Think it'd be better to keep it the way it is right now.
src/modules/kvs/lookup.c
Outdated
@@ -57,6 +57,11 @@ typedef struct { | |||
zlist_t *pathcomps; | |||
} walk_level_t; | |||
|
|||
typedef struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am probably not understanding the big picture, but why do we need the zlist of these structs when we have the json_t ref itself, which contains a JSON array of hash keys into the cache?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right, we don't. I was sort of mimicking what was done in the internal commit API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doh! changing up the code right now, I realize there was a reason. We can't assume every blobref in the valref treeobj is missing from the cache. Some could be missing while others aren't.
It perhaps wouldn't have much performance impact to pass all references back to the caller, b/c the main kvs module would already recognize that some of those missing blobrefs are already in the cache. It'd simply be an API style change to say I'm passing back a list of references, atleast 1 of which is missing. Instead of all of them are missing.
I'll have to ponder this. As the change from the list to just using the valref object is much cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duh, thought of an obvious way to fix
hey jim, you're right. I had not thought of the 0 length blobs, so some tests should definitely be added and code fixed accordingly. |
As for EPERM, I think we just sort of left it the last time it was discussed, as EPERM is the catch all "internally not consistent/bad" kind of errno. I'll create an issue for this so we don't forget. |
@chu11 - are you waiting on me for anything here? Your comments above all make sense to me. Let me know when you're ready for me to make another review pass. |
I'm working on #1232 right now. It sort of needs to be done first, b/c without that I can't really do the multi-blob support w/ empty data. |
85732bf
to
1df93cd
Compare
re-pushed rebased with master and changes from #1232. lookup no longer uses a list for returning missing refs to the user, just uses the valref object itself, this is way cleaner now. Thanks Jim. some code cleanup in lookup API some multi-blobref tests added that have zero-content length blobs need to add bad-blobref amongst the good blob-ref tests, forgot about that one. Forth coming soon. |
oh yeah, I forgot, the test for 1 illegal blobref in a valref array hangs b/c of #792. So that test will probably have to be for another day. I noted in #792 that this test should be added. But we can add a blobref that points to the wrong type of data (i.e. a directory or something). Will add that test. |
1df93cd
to
edfcac2
Compare
re-pushed, added a valref w/ a single-blobref test in which the blobref points to the wrong type of data. then added a multi-blobref valref test, in which one of the blobrefs points to the wrong type of data. But when I wrote the tests, I realized a problem. Imagine ... a directory has blobref to another dir, say that blob is sha1-XXX How sha1-XXX is stored in the KVS cache (stored as json or raw) will depend on how the reference is first loaded into the KVS cache. If the valref is loaded first, it'll be loaded as raw data. If it's loaded as a directory object first, it'll be loaded as json. This is b/c the data in the content store is not "typed" in any way. An error would occur on whatever tries to read Not sure if this needs to be solved in some way. The only way this occurs is if the user manually creates tree objects and modifies them with bad data. |
hmmm, lots of failures. It appears with my new test in 116f6b8. Tests after it don't run, suggesting segfault or assert. Hmmm. |
ugh, dumb re base error, re-pushed (per twitter - github + travis is currently having a problem, so i guess CI will run whenever that is fixed) |
7102717
to
f2e405e
Compare
f2e405e
to
b66ad57
Compare
re-push, fixing chain-lint |
hit write errors and #731, restarting builds |
Refactor lookup_get_missing_ref() into lookup_iter_missing_refs(), in which user passes a callback function to retrieve the missing reference and raw boolean instead of via the function itself. This is to prepare for the KVS lookup API returning multiple missing references back to the user. Update code appropriately in main KVS module. Update unit tests and add additional tests as necessary.
Refactor internal lookup API to be able to handle returning multiple missing references from a valref to the caller. This is predominantly infrastructure support for future multiple missing references support and nothing in the internal KVS lookup API currently uses this. As a fallout of this refactoring, the `missing_ref_raw` flag is no longer necessary and has been removed.
Add lookup_get_aux_errnum() and lookup_set_aux_errnum() in internal KVS lookup API. This is convenience for future error handling needs. Add unit tests appropriately.
In main KVS module, handle errors on multiple reference lookup loads if an error occurs after several rpcs have already been sent.
In internal lookup API, return val appropriately if valref has multiple blobrefs within it. This is done by loading each blobref appropriately and constructing a concatenated result. Update unit tests appropriately. Add more tests for coverage.
b66ad57
to
a1233f6
Compare
re-pushed, eek out a couple extra lines of code coverage. |
I'm for merging this if you're ready @chu11. |
it's good to go |
To support #1193 and #1202, this PR adds support to read valref treeobjs with multiple blobrefs in it. Note that outside of manual creation of multiple valref objects, there is no present easy mechanism to write multi-blobref valref objects. The "write" half of these features is for later subtasks in #1193 and #1202.
There are two major parts to this PR. One was to refactor the internal KVS lookup API to support multiple missing references being returned to callers. The second major part was to support loading multiple blobrefs in the internal KVS lookup API and creating a single val object to return the value to the caller.