Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvs: [cleanup] improve isolation in internal cache #1274

Merged
merged 8 commits into from
Nov 8, 2017

Conversation

garlick
Copy link
Member

@garlick garlick commented Nov 7, 2017

As discussed in #1264, cache_entry_set_raw() and cache_entry_set_treeobj() take ownership of a data/json_t pointer allocated in the caller. The caller must be trusted not to free or modify the data afterward, or the cache will be corrupted. This seems a bit unsafe.

Copy the data, increasing isolation between the cache and its callers.

In the treeobj case, avoid caching the json_t object directly, since it is hard to ensure that the caller won't change it later. We could json_deep_copy() it, but it is perhaps cheaper to skip caching it here, and instead allow the object to be regenerated from the encoded data on first use (if it gets used at all).

Plus one small optimization: avoid an extra call to treeobj_encode in the commit path, where we had to first encode the object to compute its hash key, then again to store it. Just encode it once, then compute the hash key, then store the raw data.

@garlick garlick requested a review from chu11 November 7, 2017 01:08
@coveralls
Copy link

Coverage Status

Coverage increased (+0.003%) to 78.517% when pulling 1695116 on garlick:cache_copy into d7300ec on flux-framework:master.

@codecov-io
Copy link

codecov-io commented Nov 7, 2017

Codecov Report

Merging #1274 into master will increase coverage by 0.06%.
The diff coverage is 61.29%.

@@            Coverage Diff             @@
##           master    #1274      +/-   ##
==========================================
+ Coverage   77.88%   77.94%   +0.06%     
==========================================
  Files         154      154              
  Lines       29124    29097      -27     
==========================================
- Hits        22684    22681       -3     
+ Misses       6440     6416      -24
Impacted Files Coverage Δ
src/common/libkvs/kvs.c 65.15% <ø> (+1.11%) ⬆️
src/modules/kvs/commit.c 75.77% <52.94%> (+0.98%) ⬆️
src/modules/kvs/kvs.c 63.75% <59.09%> (+1.12%) ⬆️
src/modules/kvs/cache.c 90.44% <78.94%> (+1.72%) ⬆️
src/common/libflux/request.c 88.46% <0%> (-1.29%) ⬇️
src/common/libflux/mrpc.c 85.49% <0%> (-1.18%) ⬇️
src/broker/modservice.c 80.58% <0%> (-0.98%) ⬇️
src/common/libflux/response.c 83.73% <0%> (-0.82%) ⬇️
src/modules/kvs/lookup.c 80% <0%> (-0.48%) ⬇️
src/broker/module.c 83.79% <0%> (-0.28%) ⬇️
... and 6 more

if (!(cpy = malloc (len)))
return -1;
memcpy (cpy, data, len);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, lingering empty line

* that alrdady has data stored.
* to set new data in a cache entry will silently succeed.
* A treeobj object passed to cache_entry_set_treeobj() will be
* json_decref()'d for a cache entry that alrdady has data stored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo "already"

* cache_entry_set_treeobj() will be json_decref()'d for a cache entry
* that alrdady has data stored.
* to set new data in a cache entry will silently succeed.
* A treeobj object passed to cache_entry_set_treeobj() will be
Copy link
Member

@chu11 chu11 Nov 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct "A treeobj object passed to cache_entry_set_treeobj() will be json_decref()'d for a cache entry that alrdady has data stored."? Doesn't seem to be the case anymore.

data = NULL;

ok (cache_entry_set_raw (e, NULL, 0) < 0
ok (cache_entry_set_raw (e, data, 4) < 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think we want to keep the cache_entry_set_raw (e, NULL, 0) test, and also add cache_entry_set_raw (e, data, 4) as a new test to detect EBADE. Both are good corner case catches.

@chu11
Copy link
Member

chu11 commented Nov 7, 2017

@garlick overall looks good, lots of good cleanup. Just some nits I found along the way.

@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

Thanks - here come a few commits to address your issues. I'll squash once we're done here.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.04%) to 78.552% when pulling 88043ad on garlick:cache_copy into d7300ec on flux-framework:master.

@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

I went ahead and squashed those fixes. Hope that's OK...

@coveralls
Copy link

coveralls commented Nov 7, 2017

Coverage Status

Coverage increased (+0.04%) to 78.556% when pulling e937cc7 on garlick:cache_copy into d7300ec on flux-framework:master.

@chu11
Copy link
Member

chu11 commented Nov 7, 2017

LGTM, I'm not entirely sure why the merge button isn't green though. Everything seems to have passed. Perhaps something buggy in travis. I'm going to restart the builder.

@chu11
Copy link
Member

chu11 commented Nov 7, 2017

hmmm, travis and github don't seem synced up, it's really slow. Will merge whenever that's resolved.

@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

I restarted one builder that got stuck in the wreck tests (or maybe it was running slow and only got that far -hard to tell).

@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

Uh, it looks like I at least missed a comment in store_initial_rootdir(). Possibly that code should be converted from treeobj_hash() + cache_entry_set_treeobj() to treeobj_encode() + cache_entry_set_raw() also...

@chu11
Copy link
Member

chu11 commented Nov 7, 2017

ahhh, just like you did in store_cache(), yeah that could be enhanced too. Although, store_initial_rootdir() is a one time initialization, so avoiding the extra treeobj_encode() shouldn't matter too much.

@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

We can eliminate cache_entry_set_treeobj() that way though. Hang on, I've got a couple more commits for you to review.

@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

Still need to cull some cache unit tests but wanted your opinion @chu11 on whether dropping set_treeobj() was a step in the right direction? It seemed like it was encouraging callers to encode once to compute the hash key, and again implicitly when setting the cache entry content. It seems better to pick one place to avoid the extra work and potentially getting something wrong in one and corrupting the cache.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.06%) to 78.573% when pulling 8dcaea2 on garlick:cache_copy into d7300ec on flux-framework:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.005%) to 78.52% when pulling e99c171 on garlick:cache_copy into d7300ec on flux-framework:master.

}
}

if (!json_is_null (rootdir))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it's important to leave a comment here indicating that if an error occurs in prime_cache_with_rootdir(), it's no big deal. We still set the setroot and things are generally fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did make the function return void and put a note about failure being OK over the function. More blatant warning needed you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, just a bit more blatant is what I'm thinking.

@chu11
Copy link
Member

chu11 commented Nov 7, 2017

Good job on cleaning up set_initial_rootdir(), obviously didn't have to set dirty bit b/c the storage is synchronous.

Overall, I think trying to get rid of cache_entry_set_treeobj() is a good step to take. It removes a lot of additional complication in the cache.

void *data = NULL;
int len;

if (treeobj_validate (rootdir) < 0 || treeobj_is_dir (rootdir)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!treeobj_is_dir()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ensuring that the "rootdir" really is a dir, not some other object. Unlikely since the master would have already checked that. Drop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it should it be opposite logic. If is not-dir, then error out. Right now it errors if it is a dir.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, reverse logic, got it!

Problem: cache_entry_set_raw() and cache_entry_set_treeobj()
take ownership of a data/json_t pointer allocated in the
caller.  The caller must be trusted not to free or modify
the data afterward, or the cache will be corrupted.

Copy the data, making it easier to guarantee that the result
is not modified or freed by the original caller. In the
case of json_t, cache the encoded data but skip caching
the original object.  The object will be decoded from the
data (and then cached) upon first access.

Update users in kvs.c and commit.c.

Update unit tests.
Problem: treeobj that is to be stored in the cache is encoded
once to calculate its hash, and again to be stored in raw
form.

Just encode the treeobj in raw form, then calculate its
hash and store it raw.
Problem: rootdir that is to be stored in the cache is encoded
once to calculate its hash, and again to be stored in raw form.

Change store_initial_rootdir() to use treeobj_encode(),
then cache_entry_set_raw().
Problem: the name "rootdir" is used to refer to the root
blobref, which is easily confused with the root directory
object.

Rename "rootdir" to "rootref" where appropriate.

Create a 'struct kvsroot' to hold the root blobref and
sequence number, since these are tightly coupled and
there will need to be more than one pair for when
planned support for multiple namespaces is implemented.
Problem: update kvs.sync, kvs.getroot, kvs.setroot (rpc),
and kvs.setroot (event) protocol uses naming inconsistent
with current norms in the code.

Change "rootdir" to "rootref", and "rootdirval" to "rootdir"
where appropriate.
@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

@chu11: this look OK (last 2 commits)? If so I'll squash.

@chu11
Copy link
Member

chu11 commented Nov 7, 2017

yup, squash away

@garlick
Copy link
Member Author

garlick commented Nov 7, 2017

OK, those are squashed now.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.07%) to 78.573% when pulling e4c605f on garlick:cache_copy into 4456339 on flux-framework:master.

Problem: the important kvs.setroot event handler is a
bit complicated by the fact that the directory object is
optionally provided in the event message as an optimization.

Split the optimization off to a separate function,
don't try too hard to handle the case of an existing but
dirty/invalid cache entry, and recompute the hash after
encoding the directory rather than trusting the hash received
in the message to match.

This eliminates the last user of cache_entry_set_treeobj().
This function no longer has any users except in tests.
Replicate it as a convenience function in the tests.
Drop tests that poke at cache_entry_set_treeobj() which
is now gone, or expect cache entries containing raw
versus treeobj data to behave differently in ways
that are no longer possible.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.03%) to 78.541% when pulling ba26ea2 on garlick:cache_copy into 4456339 on flux-framework:master.

@chu11 chu11 merged commit 14943b8 into flux-framework:master Nov 8, 2017
@garlick garlick deleted the cache_copy branch November 8, 2017 19:01
@grondo grondo mentioned this pull request May 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants