kvs-watch: retrieve content blobs in chunks rather than all at once #6456
Comments
While this issue still exists, it appears that the larger performance issue is in Edit: I'm wondering if we could simply treat the "guest_released" and "guest_started" paths similarly. And we would get the faster lookup from
just documenting prototype failures. The main issue is how can the
Problem: With the FLUX_KVS_WATCH_APPEND and FLUX_KVS_STREAM flags, all content blobs in a valref treeobj will be retrieved from the content store at once. If the valref array is gigantic, this may be a very costly initial transaction. Not to mention, we probably wouldn't want to send an extremely large number (like millions) of content requests all at once.

Solution: Send content requests in more reasonable 32K chunks.

Fixes flux-framework#6456
While #6414 / #6444 solve the important initial problem of not re-fetching older, unnecessary data, kvs-watch still sends requests to the content store to retrieve all of the data at once whenever there is a large amount of data to retrieve.
The difference can be seen in something like
In the first job, when stdout is being "watched", the very first lookup of the stdout will not be that large. And every subsequent append that is watched should be only 1 or a few lines of output.
In the follow-up flux job attach, the job has already completed, so the very first lookup of the tree object for stdout will cover all 2 million lines. kvs-watch will subsequently do a lookup of all 2 million lines, i.e. a for loop doing 2 million iterations and sending 2 million RPCs. That's not good.

Instead, only a subset should be retrieved at a time, in "windows". This allows data to be sent back to the caller much more quickly.
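To illustrate the windowing idea, here is a minimal sketch in Python rather than the C used by kvs-watch. It is not the actual implementation: `load_in_windows`, `load_blob`, and `WINDOW_SIZE` are hypothetical names, the loader stands in for a content store request, and the window size simply reads the "32K" from the commit message as 32 * 1024.

```python
from typing import Callable, Iterator, List, Sequence

# Assumed window size: "32K" from the commit message, read here as 32 * 1024.
WINDOW_SIZE = 32 * 1024


def load_in_windows(
    blobrefs: Sequence[str],
    load_blob: Callable[[str], bytes],
    window_size: int = WINDOW_SIZE,
) -> Iterator[List[bytes]]:
    """Yield blob contents one window at a time, preserving valref order.

    load_blob is a hypothetical stand-in for a content store request.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for start in range(0, len(blobrefs), window_size):
        window = blobrefs[start:start + window_size]
        # Only this window's requests are issued here; the caller can consume
        # the results before the next window is requested.
        yield [load_blob(ref) for ref in window]
```

Because the generator yields after each window, the caller receives the first chunk of output after at most `window_size` requests instead of waiting for every blob. Assuming each of the 2 million lines is its own blobref, a 32768-entry window would split the valref array into 62 windows.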