kvs-watch: support FLUX_KVS_STREAM flag #6523

Merged: 7 commits into flux-framework:master from kvs_stream_flag, Dec 20, 2024

Conversation

@chu11 (Member) commented Dec 18, 2024

Problem: It would be convenient if there were a way to stream large values from the KVS as individual content blobs, rather than receiving one very large piece of content in a single response. Such a response can be slow and give the appearance that something is hanging. For example, this ability would be useful for a large amount of standard output stored in the KVS.

Support a new FLUX_KVS_STREAM flag. It uses the mechanisms built into the kvs-watch module for the FLUX_KVS_WATCH_APPEND flag to stream each content blob to the user. Once all the content has been streamed, ENODATA is returned to indicate the stream is complete.
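
For reference, a minimal client sketch of the intended usage (illustrative only, not code from this PR): issue the lookup with FLUX_KVS_STREAM, handle one content blob per response, reset the future between responses, and treat ENODATA as the normal end of the stream. The file name, counter prefix, and build line are assumptions for the example.

/* stream_get.c - hypothetical example, not part of this PR.
 * Fetch a KVS value one content blob per response using FLUX_KVS_STREAM.
 * Build (assuming flux-core dev headers): cc stream_get.c -lflux-core -o stream_get
 */
#include <flux/core.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
    flux_t *h;
    flux_future_t *f;
    const char *value;
    int count = 0;

    if (argc != 2) {
        fprintf (stderr, "Usage: stream_get KEY\n");
        return 1;
    }
    if (!(h = flux_open (NULL, 0))) {
        perror ("flux_open");
        return 1;
    }
    /* Each response carries one content blob of the value. */
    if (!(f = flux_kvs_lookup (h, NULL, FLUX_KVS_STREAM, argv[1]))) {
        perror ("flux_kvs_lookup");
        return 1;
    }
    while (flux_kvs_lookup_get (f, &value) == 0) {
        printf ("%d: %s\n", ++count, value);
        flux_future_reset (f);      /* arm the future for the next blob */
    }
    if (errno != ENODATA)           /* ENODATA marks the normal end of the stream */
        fprintf (stderr, "lookup: %s\n", strerror (errno));
    flux_future_destroy (f);
    flux_close (h);
    return 0;
}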

Problem: When a directory is watched with the WATCH_APPEND flag, an
errno of EINVAL is returned.  However, EISDIR is probably the more
expected error when the invalid value is specifically a directory.

Return EISDIR if there is an attempt to WATCH_APPEND specifically
on a directory.

@garlick (Member) commented Dec 18, 2024

Any reason not to use the code written for the test to implement flux kvs get --stream instead?

@chu11 (Member Author) commented Dec 18, 2024

Any reason not to use the code written for the test to implement flux kvs get --stream instead?

The two main reasons were:

A) output-wise, flux kvs get and flux kvs get --stream produce identical output, so it didn't seem like a necessary feature to add to flux kvs get.

B) to ensure that streaming actually works, I put "reply counts" into the output of watch_stream, i.e. instead of

foo
bar

you get

1: foo
2: bar

so it confirms the streaming is actually working (vs. a bug where it effectively did a flux kvs get without streaming).

@garlick (Member) commented Dec 18, 2024

Maybe flux kvs get --stream could be opened as an issue for a "nice to have" feature then?

@chu11 (Member Author) commented Dec 18, 2024

Maybe flux kvs get --stream could be opened as an issue for a "nice to have" feature then?

It wouldn't be hard to add to this PR, and I guess it wouldn't be that bad of a "nice to have". It just wouldn't remove the need for the watch_stream testing tool, because of the "reply count" prefix needed to make sure it really is streaming.

@garlick (Member) commented Dec 18, 2024

I suppose a --stats option could be added to flux kvs get so you get a one-line report at the end on stderr that includes the number of messages? Not a big deal though - the test program isn't much code!

@chu11 (Member Author) commented Dec 18, 2024

I suppose a --stats option could be added to flux kvs get so you get a one line report at the end on stderr that included the number of messages?

Yeah that could be done, but seems a little excessive?

@garlick (Member) commented Dec 18, 2024

Yeah totally up to you on that one :-) I was just throwing it out there.

@chu11 (Member Author) commented Dec 18, 2024

re-pushed w/ flux kvs get --stream support. Doing a quick test

time flux kvs get job.0000.000e.5700.0000.guest.output

vs

time flux kvs get --stream job.0000.000e.5700.0000.guest.output

was about 16 vs 11 seconds on the output from a flux lptest 2000000 80. Woot!

I should probably add --stream to flux kvs eventlog get as well. Will add for this PR later today or tomorrow.

I noticed that flux kvs dir -R . was slow b/c it gets the whole output. May be worthwhile to enhance? That should be a different issue / PR though.

@garlick (Member) commented Dec 19, 2024

Just watching flux module trace kvs-watch while fetching a single large output eventlog, I'm wondering why the loads all complete before the first response is sent:

2024-12-18T16:30:09.542 kvs-watch rx > kvs-watch.lookup [81]
2024-12-18T16:30:09.542 kvs-watch tx > kvs.lookup-plus [81]
2024-12-18T16:30:09.542 kvs-watch rx < kvs.lookup-plus [978] success
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch rx < content.load [165] success
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx < kvs-watch.lookup [261] success
2024-12-18T16:30:09.545 kvs-watch rx < content.load [9.7403173M] success
2024-12-18T16:30:09.545 kvs-watch rx < content.load [238] success
2024-12-18T16:30:09.551 kvs-watch rx < content.load [9.7401266M] success
2024-12-18T16:30:09.558 kvs-watch rx < content.load [9.7401285M] success
2024-12-18T16:30:09.562 kvs-watch rx < content.load [9.7401266M] success
2024-12-18T16:30:09.567 kvs-watch rx < content.load [9.7401266M] success
2024-12-18T16:30:09.571 kvs-watch rx < content.load [9.7401285M] success
2024-12-18T16:30:09.574 kvs-watch rx < content.load [9.7401276M] success
2024-12-18T16:30:09.578 kvs-watch rx < content.load [9.7401285M] success
2024-12-18T16:30:09.584 kvs-watch rx < content.load [9.7401285M] success
2024-12-18T16:30:09.587 kvs-watch rx < content.load [9.7401323M] success
2024-12-18T16:30:09.591 kvs-watch rx < content.load [9.7401323M] success
2024-12-18T16:30:09.594 kvs-watch rx < content.load [9.7401333M] success
2024-12-18T16:30:09.597 kvs-watch rx < content.load [9.7401323M] success
2024-12-18T16:30:09.600 kvs-watch rx < content.load [9.7401266M] success
2024-12-18T16:30:09.603 kvs-watch rx < content.load [9.7401276M] success
2024-12-18T16:30:09.610 kvs-watch rx < content.load [9.7401285M] success
2024-12-18T16:30:09.631 kvs-watch tx < kvs-watch.lookup [12.986875M] success
2024-12-18T16:30:09.694 kvs-watch tx < kvs-watch.lookup [12.986878M] success
2024-12-18T16:30:09.750 kvs-watch tx < kvs-watch.lookup [12.986875M] success
2024-12-18T16:30:09.809 kvs-watch tx < kvs-watch.lookup [12.986875M] success
2024-12-18T16:30:09.884 kvs-watch tx < kvs-watch.lookup [12.986878M] success
2024-12-18T16:30:09.940 kvs-watch tx < kvs-watch.lookup [12.986878M] success
2024-12-18T16:30:09.999 kvs-watch tx < kvs-watch.lookup [12.986878M] success
2024-12-18T16:30:10.061 kvs-watch tx < kvs-watch.lookup [12.986878M] success
2024-12-18T16:30:10.127 kvs-watch tx < kvs-watch.lookup [12.986882M] success
2024-12-18T16:30:10.185 kvs-watch tx < kvs-watch.lookup [12.986882M] success
2024-12-18T16:30:10.241 kvs-watch tx < kvs-watch.lookup [12.986886M] success
2024-12-18T16:30:10.294 kvs-watch tx < kvs-watch.lookup [12.986882M] success
2024-12-18T16:30:10.349 kvs-watch tx < kvs-watch.lookup [12.986875M] success
2024-12-18T16:30:10.405 kvs-watch tx < kvs-watch.lookup [12.986878M] success
2024-12-18T16:30:10.460 kvs-watch tx < kvs-watch.lookup [12.986878M] success
2024-12-18T16:30:10.529 kvs-watch tx < kvs-watch.lookup [12.98713M] success
2024-12-18T16:30:10.529 kvs-watch tx < kvs-watch.lookup [361] success
2024-12-18T16:30:10.529 kvs-watch tx < kvs-watch.lookup [0] ENODATA
2024-12-18T16:30:18.855 kvs-watch rx > kvs-watch.disconnect [0]

@garlick (Member) left a review:

A few preliminary comments/questions.

@@ -35,7 +35,8 @@ enum kvs_op {
    FLUX_KVS_APPEND = 32,

garlick (Member) commented:

Commit message suggestion: restate the problem more directly, e.g.

Responses to KVS lookups must be contained within one message, no matter how large the value is.

if (w->flags & FLUX_KVS_WATCH_APPEND)
    errprintf (&err, "key watched with WATCH_APPEND shortened");
else
    errprintf (&err, "key being streamed shortened");

garlick (Member) commented:

Hmm, seems like in the STREAM case, we should just be working from a static version of the treeobj?

chu11 (Member Author) replied:

oh yeah that's right. I was just trying to avoid the "WATCH_APPEND" in an error message.

garlick (Member) replied:

Maybe just make it

// cannot occur with FLUX_KVS_STREAM
errprintf (&err, "key watched with WATCH_APPEND shortened");

if that makes sense? Otherwise the error message implies that it's an expected case which is a bit confusing.

@@ -46,6 +46,7 @@ struct watcher {
    bool index_valid;       // flag if prev_start_index/prev_end_index set
    int prev_start_index;   // previous start index loaded
    int prev_end_index;     // previous end index loaded
    int append_send_count;  // number of indexes loaded (for FLUX_KVS_STREAM)

garlick (Member) commented:

Probably this shouldn't be named "append" anything since there is no appending going on with FLUX_KVS_STREAM. Should it be a loaded_blob_count instead?

Plural of index is indices.

Comment on lines 628 to 629:

errprintf (&err,
           "value of key streamed was overwritten");

garlick (Member) commented:

Again, why should that matter?

@chu11 (Member Author) commented Dec 19, 2024

Just watching a flux module trace kvs-watch while I fetch a single large output eventlog, I'm wondering why the loads all complete before the first response is sent:

If I'm reading this correctly, one comes back early? Probably the stdout header.

2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch rx < content.load [165] success  <---------- this one
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.542 kvs-watch tx > content.load [20]
<snip>
2024-12-18T16:30:09.543 kvs-watch tx > content.load [20]
2024-12-18T16:30:09.543 kvs-watch tx < kvs-watch.lookup [261] success
2024-12-18T16:30:09.545 kvs-watch rx < content.load [9.7403173M] success
2024-12-18T16:30:09.545 kvs-watch rx < content.load [238] success
2024-12-18T16:30:09.551 kvs-watch rx < content.load [9.7401266M] success
2024-12-18T16:30:09.558 kvs-watch rx < content.load [9.7401285M] success

But yeah, you'd think they should be more spread out. I'm wondering if the sqlite back end batches some lookups together and they all "return" at the same time? I should look into this a bit.

@garlick (Member) commented Dec 19, 2024

That's one content response that comes back early but kvs-watch doesn't send any responses back to the lookup request until all the blobs have been received. It looked like the only time that would happen is if the first blob was received last from content, which is a bit unexpected if true :-)

@chu11 (Member Author) commented Dec 19, 2024

That's one content response that comes back early but kvs-watch doesn't send any responses back to the lookup request until all the blobs have been received. It looked like the only time that would happen is if the first blob was received last from content, which is a bit unexpected if true :-)

So I think I know the problem. Although I didn't profile it, I think treeobj_create_val() is the most costly operation here. I assume it's the costly base64 encoding with multi-megabyte blobs.

So while treeobj_create_val() is being called before sending the first response (via flux_respond_pack()), the other content.load responses are coming in.

The "mix" between content.load responses and responses to kvs-watch.lookup is more apparent when the data blob sizes are smaller.

@garlick (Member) commented Dec 19, 2024

Ah that makes sense! Hmm, maybe we should add a raw option sometime :-)

@chu11 (Member Author) commented Dec 19, 2024

Ah that makes sense! Hmm, maybe we should add a raw option sometime :-)

Yeah, this is definitely something to study later on. I think there are small performance improvements here and there we can make. The treeobj_create_val() call is quite costly.

Problem: Responses to KVS lookups must be contained within one message,
no matter how large the value is.  Such a response can be slow and give the
appearance that something is hanging.  For example, a large amount of standard
output stored in the KVS would not return immediately.

Support a new FLUX_KVS_STREAM flag.  It will use the mechanisms built
into the kvs-watch module and the FLUX_KVS_WATCH_APPEND flag to stream
each content blob of large values to the user.  Once all the content is
streamed, it will return ENODATA to indicate the stream is complete.
Problem: There is no coverage for the new FLUX_KVS_STREAM flag.

Add coverage in t/t1007-kvs-lookup-watch.t and add new testing
tool t/kvs/watch_stream.c.
Problem: The new FLUX_KVS_STREAM flag is not documented.

Add documentation in flux_kvs_lookup(3).
Problem: The newly supported FLUX_KVS_STREAM flag does not have
an equivalent option for support in flux-kvs.

Support the --stream option under flux kvs get and flux kvs eventlog
get.
Problem: The new flux kvs --stream option with flux kvs get
and flux kvs eventlog get is not documented.

Add documentation in flux-kvs(1).
Problem: There is no coverage of the new flux kvs --stream
option with flux kvs get and flux kvs eventlog get.

Add coverage in t1007-kvs-lookup-watch.t.

@chu11 (Member Author) commented Dec 20, 2024

re-pushed w/ fixes for the comments above + added --stream support to flux kvs eventlog get. This was a good set of tests to write; it turned out flux kvs eventlog get --stream --waitcreate did not initially work ;-)

@garlick perhaps wanna take another quick skim just in case?

@garlick (Member) left a review:

LGTM! I'm glad you added the flux kvs get --stream option.

@mergify mergify bot merged commit e10d676 into flux-framework:master Dec 20, 2024
35 checks passed

codecov bot commented Dec 20, 2024

Codecov Report

Attention: Patch coverage is 88.04348% with 11 lines in your changes missing coverage. Please review.

Project coverage is 83.62%. Comparing base (c9eb3a8) to head (05aa903).
Report is 8 commits behind head on master.

Files with missing lines            Patch %   Lines
src/modules/kvs-watch/kvs-watch.c   82.45%    10 Missing ⚠️
src/cmd/flux-kvs.c                  96.29%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6523      +/-   ##
==========================================
+ Coverage   83.61%   83.62%   +0.01%     
==========================================
  Files         522      522              
  Lines       87734    87805      +71     
==========================================
+ Hits        73356    73431      +75     
+ Misses      14378    14374       -4     
Files with missing lines            Coverage Δ
src/common/libkvs/kvs_lookup.c      85.99% <100.00%> (+0.27%) ⬆️
src/cmd/flux-kvs.c                  84.48% <96.29%> (+0.18%) ⬆️
src/modules/kvs-watch/kvs-watch.c   78.30% <82.45%> (+0.03%) ⬆️

... and 5 files with indirect coverage changes

@chu11 chu11 deleted the kvs_stream_flag branch December 20, 2024 18:18