Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KVS: Support valref pointing to zero length blob objects #1237

Merged
merged 8 commits into from
Oct 16, 2017

Conversation

chu11
Copy link
Member

@chu11 chu11 commented Oct 13, 2017

This series of patches is to support valref treeobjects pointing to zero-length blobref data.

This seems like a lot of code for a pretty dumb corner case, but it came down to updating the KVS cache API to not assume non-NULL data pointer means valid and NULL data pointer is invalid. A flag was placed into the KVS cache entry to handle this, but it beget a number of API style changes.

For starters, cache_entry_get_raw () had to be modified so that a NULL data buffer returned wasn't automatically assumed to be error. This is an obvious refactoring that was needed. The rest are more subtle.

Under the old API:

cache_entry_create_json (NULL) - creates an empty/invalid cache entry
cache_entry_create_raw (NULL, 0) - creates a valid cache entry???

cache_entry_set_json (hp, NULL) - clears json from the cache entry
cache_entry_set_raw (hp, NULL, 0) - should do ...???

So for consistency, cache_entry_create_json() and cache_entry_set_json() now only take valid input (i.e. non-NULL json objects). The raw equivalents can take NULL & 0, b/c that's just
a zero length blobref.

New functions cache_entry_create_json_empty() and cache_entry_create_raw_empty() were created to create "empty" entries.

In order to clear data from an entry, cache_entry_clear_data() was added. The cache_entry_set_json() and cache_entry_set_raw() functions can no longer clear data from an entry by passing in NULL.

As a consequence, it's worth noting that cache_entry_set_raw() will also call wait_runqueue() when a zero-byte length buffer is set into the entry. This was not done before.

Debate on implementation:

Tiny part of me dislikes the function name "empty". Can discuss alternates. I considered using a flag type variable to say "I have data", but that seemed even uglier and more confusing.

I also considered removing the "empty" functions and supporting a cache_entry_set_type() function, to say these cache entries only support json or raw (i.e. user would call cache_entry_create(), then cache_entry_set_type()). But that's two function calls instead of one. I nixed that idea.

Also, for one error case I choose errno EBADE. Right one??

@codecov-io
Copy link

codecov-io commented Oct 13, 2017

Codecov Report

Merging #1237 into master will increase coverage by 0.02%.
The diff coverage is 89.23%.

@@            Coverage Diff             @@
##           master    #1237      +/-   ##
==========================================
+ Coverage   78.13%   78.15%   +0.02%     
==========================================
  Files         154      154              
  Lines       28680    28702      +22     
==========================================
+ Hits        22408    22431      +23     
+ Misses       6272     6271       -1
Impacted Files Coverage Δ
src/modules/kvs/commit.c 76.68% <100%> (ø) ⬆️
src/modules/kvs/lookup.c 83.88% <100%> (ø) ⬆️
src/modules/kvs/kvs.c 64.4% <64.28%> (+0.66%) ⬆️
src/modules/kvs/cache.c 92.36% <95.91%> (+0.52%) ⬆️
src/modules/connector-local/local.c 77.38% <0%> (-1.31%) ⬇️
src/common/libflux/mrpc.c 85.49% <0%> (-1.18%) ⬇️
src/common/libflux/response.c 84.55% <0%> (-0.82%) ⬇️
src/common/libutil/base64.c 95.07% <0%> (-0.71%) ⬇️
src/common/libflux/message.c 81.6% <0%> (+0.11%) ⬆️
... and 4 more

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 78.689% when pulling 954ff9d on chu11:issue1232 into 1893b92 on flux-framework:master.

@garlick
Copy link
Member

garlick commented Oct 13, 2017

I'm going to be travelling tomorrow so I may not get to review this for a bit. One quick question though, do the cache_create() functions really need to be typed? I.e. does an invalid cache entry need a type?

@chu11
Copy link
Member Author

chu11 commented Oct 13, 2017

Are you referring to the fact I have a "NONE" data type? I suppose this is mostly for internal reasons, to signify the user has not yet assigned "JSON" or "RAW" data, so the cache entry doesn't know. I suppose that could be masked/hidden internally.

Thought, is "type" perhaps the wrong word to use?

@chu11
Copy link
Member Author

chu11 commented Oct 13, 2017

It occurred to me last night that instead of creating cache_entry_create_json_empty() and cache_entry_create_raw_empty(), I could have modified cache_entry_create() to always create an "empty" entry, but the user can now pass in a data type to it. So

cache_entry_create(NONE) - means I don't know what this entry will hold
cache_entry_create(JSON) - says I want this cache entry for json, but I don't have data yet
cache_entry_create(RAW) - says I want this cache entry for raw, but I don't have data yet

probably better than adding two new functions.

Well, I'll let the review happen on what's there for now. I can change this later.

Also, "NONE" maybe should be changed to "UNKNOWN" ... TBD

@garlick
Copy link
Member

garlick commented Oct 13, 2017

I was just thinking that one creation function could set the type to none/unknown, and it could get its type assigned when it becomes valid. Maybe that doesn't fit - I didn't look too closely. Sorry, getting packed up today so I'll be offline most of today, but I'll try to check in later.

@chu11
Copy link
Member Author

chu11 commented Oct 13, 2017

I like this idea enough I"ll go ahead and change it.

@chu11
Copy link
Member Author

chu11 commented Oct 13, 2017

As for your question, "it could get its type assigned when it becomes valid". Basically assigning a type (or none) is an option for the caller. In some cases, assigning the type ahead of time is effectively a flag saying, "Hey, this is what I'm expecting". I use this "flag" so that content-loading is aware what to expect.

@chu11
Copy link
Member Author

chu11 commented Oct 13, 2017

Red-did the cache_entry_create() function to take a data type parameter instead and re-pushed entire branch. Definitely feels better this way.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.003%) to 78.709% when pulling 0797b8b on chu11:issue1232 into 2bddc2e on flux-framework:master.

@chu11
Copy link
Member Author

chu11 commented Oct 13, 2017

Looks like I hit a #1169 and a write error. re-run builds.

@garlick
Copy link
Member

garlick commented Oct 16, 2017

This looks fine, just needs a rebase.

It does seem like the cache interface could be simplified, but maybe not a topic for this PR. I'll open an issue to discuss it.

chu11 added 8 commits October 16, 2017 13:53
cache_entry_create() now takes a cache_data_type_t parameter,
allowing to user to create a cache entry without any data
attached to it, but at the same time indicate the type of data
expected for the cache entry to store.  This is an alternate
to passing NULL to the function cache_entry_create_json ().
Cache entry creation API has slightly different logic now.  Instead
of creating an "empty" entry by setting inputs to NULL
(i.e. cache_entry_create_json(NULL) or cache_entry_create_raw(NULL, 0))
these create functions now expect correct input.

Callers wishing to create empty entries without data shall call
cache_entry_create().

As a side effect, this now means that cache_entry_create_raw() now
is not ambiguous about setting zero length data.  Setting
cache_entry_create_raw(NULL, 0) creates a cache entry with a zero
length of data.  Where as before, it was ambiguous if it was
an empty entry or an entry with zero length of data.

Update callers of these functions.

Update/add unit tests.
Refactor cache_entry_get_raw to return data buffer and length
as in/out parameters, returning integer to indicate success/failure.

This is so the data buffer, if NULL, can be returned to the user.
Add unit tests appropriately.
Alter usage of cache_entry_set_json().  If user wishes to clear
data, the user should use cache_entry_clear_data().  A NULL input
to cache_entry_set_json() will now result in an error.

Update code accordingly for change.
In the internal cache API, add a data_valid flag which indicates
if the user has set data or not.  This flag is used instead of
checking data == NULL for validity, as checking the data pointer
for NULL would not allow zero length content-blobs to be stored
in the cache.

A major fallout of from this addition is the change to the
cache_entry_set_raw() function.  It was previously unclear
if passing NULL & length 0 to cache_entry_set_raw() indicated
if you were setting a zero length data or trying to clear
prior data.  Now it is made clear that this is setting zero
length data and clearing data is through cache_entry_clear_data().
In addition, calling cache_entry_set_raw() with zero length
data can result in wait_runqueue() being run on hp->waitlist_valid.

Added new unit tests for coverage.
@chu11
Copy link
Member Author

chu11 commented Oct 16, 2017

just re-pushed w/ a rebase

@chu11
Copy link
Member Author

chu11 commented Oct 16, 2017

restarted some builds, hit a write error, #1169, and one that timed out for some reason. travis seems really slow right now

@coveralls
Copy link

Coverage Status

Coverage increased (+0.004%) to 78.716% when pulling 598fd53 on chu11:issue1232 into 2bddc2e on flux-framework:master.

@garlick garlick merged commit 9f03dda into flux-framework:master Oct 16, 2017
@grondo grondo mentioned this pull request May 10, 2018
@chu11 chu11 deleted the issue1232 branch June 5, 2021 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants