
kvs-watch: support size/offset for watch to begin #6292

Open
chu11 opened this issue Sep 19, 2024 · 9 comments

@chu11
Member

chu11 commented Sep 19, 2024

Per discussion in #4572

If hypothetically we just stop the watcher, we could restart it but we'd need some kind of "offset" or "seek" mechanism.

chu11 changed the title from "kvs-wach: support size/offset for watch to begin" to "kvs-watch: support size/offset for watch to begin" on Sep 19, 2024
@chu11
Member Author

chu11 commented Nov 4, 2024

Just a brainstorm: a size/offset is probably hard to do. Pondering whether some type of "credits" on the eventlog watch would work, i.e.:

  • tell eventlog watch to send up to X data (bytes? eventlog entries?) and then don't send me more until i say so
  • new RPC tells eventlog watch it can send more

Hmmm, off the top of my head it feels harder to implement, but it would obviously perform better (size/offset might require users to start/stop the watcher multiple times).
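The bulleted idea above could be sketched as a minimal flow-control loop. This is a hypothetical illustration in Python, not the Flux API: `CreditedWatch` and `grant` are invented names, and real credits would arrive via an RPC rather than a method call.

```python
# Hypothetical sketch (invented names, not Flux RPCs): the watcher only
# emits eventlog entries while it holds credits; a separate grant call
# from the consumer releases more.
from collections import deque

class CreditedWatch:
    def __init__(self, events):
        self.pending = deque(events)  # entries waiting to be sent
        self.credits = 0              # start paused: no credit, no data
        self.sent = []                # stand-in for responses on the wire

    def grant(self, n):
        """Consumer-side 'send me more' RPC: allow n more entries."""
        self.credits += n
        self._flush()

    def _flush(self):
        # Send entries only while credit remains.
        while self.credits > 0 and self.pending:
            self.sent.append(self.pending.popleft())
            self.credits -= 1

w = CreditedWatch(["event1", "event2", "event3"])
w.grant(2)   # consumer says: send up to 2 entries
assert w.sent == ["event1", "event2"]
w.grant(1)   # one more credit releases the next entry
assert w.sent == ["event1", "event2", "event3"]
```

Whether credit is counted in bytes or in eventlog entries is the open question from the first bullet; the sketch counts entries for simplicity.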

@chu11
Member Author

chu11 commented Dec 12, 2024

While working on #6456, I realized there are some core design issues that make a size/offset/index very hard to do and likely to perform poorly.

  • The number one issue is that, due to KVS merging of commits, multiple events can be inside a single content blob. That means we cannot use KVS valref "indexes" as an offset of sorts (i.e. a user cannot count the number of events they've seen and use that to tell kvs-watch to begin at event number X).

    • Related to this, we do not know how many events are inside each blobref. Hypothetically, if there were some metadata along with each blobref in the valref, this could be done.
  • There is no a priori knowledge of the size of a content blob; we only know the size after we read it. That makes restarting a watch at a certain offset very costly, since we cannot (spiritually) "seek" to a location.
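To make the first bullet concrete, here is a toy illustration (hypothetical data and function, not Flux internals) of why an event index cannot be mapped to a blobref index up front: the per-blob event counts are only known after each blob's content has been read.

```python
# Toy model: commit merging packs a variable number of eventlog entries
# into each content blob, so finding "event number N" requires scanning
# blob contents -- a valref index alone is not enough.
blobs = [
    ["e0", "e1", "e2"],   # blobref 0 happens to hold three merged events
    ["e3"],               # blobref 1 holds one
    ["e4", "e5"],         # blobref 2 holds two
]

def blob_for_event(blobs, event_index):
    """Return (blob index, offset within blob) for event N.
    Requires reading every earlier blob to count its entries."""
    seen = 0
    for i, blob in enumerate(blobs):
        if event_index < seen + len(blob):
            return i, event_index - seen
        seen += len(blob)
    raise IndexError(event_index)

assert blob_for_event(blobs, 4) == (2, 0)  # event 4 is the first entry of blob 2
assert blob_for_event(blobs, 1) == (0, 1)  # event 1 is inside blob 0
```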

@garlick
Member

garlick commented Dec 12, 2024

Sorry, backing up because I've lost the thread. Is this right?

  • we're trying to resolve "flux-exec: Error: rank 0: cat: Value too large for defined data type" (#4572)
  • the shell input plugin currently watches the guest.input eventlog with job-info.eventlog-watch
  • it may receive a large volume of data that it cannot buffer
  • this issue was intended to provide a mechanism in kvs-watch that job-info would use to stream events in some way that could be regulated by the consumer

I think you're right that it would be very inconvenient for kvs-watch to deal in any sort of offset that isn't a valref index. But doesn't that just mean that whatever is talking to kvs-watch would need to be willing to buffer at least one blob?

What sort of interface did you have in mind to provide to the shell? I assume some variant on flux_job_event_watch()?

@chu11
Member Author

chu11 commented Dec 12, 2024

> Sorry, backing up because I've lost the thread. Is this right?

Yes, this is the "prequel" issue to #6413.

> What sort of interface did you have in mind to provide to the shell? I assume some variant on flux_job_event_watch()?

Before hitting these issues, my thought was that the user could count the number of events or the amount of data it had received. If we would overflow the buffer, stop the watcher, then restart the stdin watcher at some index/offset when we get credits back. We would perhaps have to buffer the "current" data if we've only sent some portion of it.

@garlick
Member

garlick commented Dec 12, 2024

Hmm, ok. Maybe the client/shell could use an interface offered by job-info that accepts an eventlog entry index, and that service would use one provided by kvs-watch that accepts a treeref index? Then both of those services would need to buffer some data.

Or, instead of canceling and restarting a watch at an offset which could be wasteful if a lot of data gets thrown away on every cancel, maybe we just need a generic way for a streaming RPC to be paused and resumed? Or maybe a flag to reverse it so a streaming RPC starts paused and only sends data when it receives credit?

@chu11
Member Author

chu11 commented Dec 12, 2024

> Hmm, ok. Maybe the client/shell could use an interface offered by job-info that accepts an eventlog entry index, and that service would use one provided by kvs-watch that accepts a treeref index? Then both of those services would need to buffer some data.

The main issue, which I did not realize until working on other things, is that multiple eventlog entries could be inside a single blobref. So it sort of makes going off the treeref index very unreliable.

> Or, instead of canceling and restarting a watch at an offset which could be wasteful if a lot of data gets thrown away on every cancel, maybe we just need a generic way for a streaming RPC to be paused and resumed? Or maybe a flag to reverse it so a streaming RPC starts paused and only sends data when it receives credit?

Good ideas. This is along the lines of a brainstorming idea I had a few comments up, where I put a "credit system" into job-info. Although now that you mention it, "pause" and "resume" is far simpler than "credits".

@garlick
Member

garlick commented Dec 12, 2024

> The main issue, which I did not realize until working on other things, is that multiple eventlog entries could be inside a single blobref. So it sort of makes going off the treeref index very unreliable.

Doesn't it just mean when you get the current blob (in job-info), you have to hold onto it and send an event at a time to the consumer, then get the next blob?

> Good ideas. This is along the lines of a brainstorming idea I had a few comments up, where I put a "credit system" into job info. Although now that you mention it, a "pause" and "resume" is far simpler than a "credits" one.

Apologies for not keeping up! Actually, credits is probably better for the general case, and pause/resume is like a degenerate case of that with credit = 1, right?

@chu11
Member Author

chu11 commented Dec 12, 2024

>> The main issue, which I did not realize until working on other things, is that multiple eventlog entries could be inside a single blobref. So it sort of makes going off the treeref index very unreliable.

> Doesn't it just mean when you get the current blob (in job-info), you have to hold onto it and send an event at a time to the consumer, then get the next blob?

I'm not following your point. My original point was more about the "restart the watcher" case: because multiple events can occur inside each blobref, I cannot pass a "start at event number X" to job-info. In order to start at event number X, I'd have to re-read the whole eventlog from the start.

If you're thinking about the "pause/resume" case: kvs-watch will get each blob and stream it to job-info, and job-info will divide that blob up into individual events and stream those to the caller. So job-info would be able to "pause" kvs-watch to say "no more data for a bit" and simply cache the blobs that were in transit at the time. Although that could be a lot of data, perhaps warranting the "credits" idea.
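The pause path described here could be sketched roughly as follows (hypothetical Python with invented names; the real services would communicate via RPC): a middle layer splits incoming blobs into events for the caller and buffers whole blobs that arrive while paused, which is exactly the cache that could grow large.

```python
# Sketch (not job-info's implementation): receive whole blobs, deliver
# one event at a time downstream, and hold in-transit blobs while paused.
from collections import deque

class EventSplitter:
    def __init__(self, deliver):
        self.deliver = deliver      # callback invoked once per event
        self.paused = False
        self.backlog = deque()      # blobs buffered while paused

    def on_blob(self, blob_events):
        if self.paused:
            self.backlog.append(blob_events)  # unbounded -> the credits worry
        else:
            for ev in blob_events:
                self.deliver(ev)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False
        while self.backlog:
            for ev in self.backlog.popleft():
                self.deliver(ev)

out = []
s = EventSplitter(out.append)
s.on_blob(["e0", "e1"])
s.pause()
s.on_blob(["e2", "e3"])      # buffered, not delivered
assert out == ["e0", "e1"]
s.resume()
assert out == ["e0", "e1", "e2", "e3"]
```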

> Actually credits is probably better for the general case and pause resume is like a degenerate case of that with credit = 1 right?

Yeah, although I think you mean credits = infinity w/ resume and credits = 0 w/ pause.

@garlick
Member

garlick commented Dec 12, 2024

> I'm not following your point. My original point was more with the "restart the watcher" case, b/c multiple events can occur inside each blobref, I cannot pass a "start at event number X" to job-info. In order to start at event number X, I'd have to re-read the whole eventlog from the start again.

Sorry, I probably conflated the two thoughts when I typed that!

> Yeah, although I think you mean credits = infinity w/ resume and credits = 0 w/ pause.

Er yeah, I was thinking of the reverse, I guess, where the sender starts out paused (zero credits) and has to get a "resume" (credit++) before sending each response, so the credit count oscillates between 0 and 1.
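Hypothetically, the two readings being compared can be captured in one toy model (invented names, not Flux code): reading A treats resume as unlimited credit and pause as zero, while reading B grants one credit per resume, so the count oscillates between 0 and 1.

```python
# Toy model of the two credit interpretations discussed above.
import math

class Stream:
    def __init__(self):
        self.credits = 0          # sender starts out paused

    def pause(self):
        self.credits = 0          # reading A: pause = zero credit

    def resume_unlimited(self):
        self.credits = math.inf   # reading A: resume = unlimited credit

    def resume_one(self):
        self.credits += 1         # reading B: each resume grants one credit

    def send_one(self):
        assert self.credits > 0   # may only send while credit is held
        self.credits -= 1         # sending a response consumes a credit

s = Stream()
s.resume_one()                    # reading B: credit goes 0 -> 1
s.send_one()                      # ...and back to 0: the oscillation above
assert s.credits == 0
```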
