
kvs-watch: support size/offset for watch to begin #6292

Open
chu11 opened this issue Sep 19, 2024 · 9 comments

@chu11
Member

chu11 commented Sep 19, 2024

Per discussion in #4572

If hypothetically we just stop the watcher, we could restart it but we'd need some kind of "offset" or "seek" mechanism.

chu11 changed the title from "kvs-wach: support size/offset for watch to begin" to "kvs-watch: support size/offset for watch to begin" on Sep 19, 2024
@chu11
Member Author

chu11 commented Nov 4, 2024

Just a brainstorm: a size/offset is probably hard to do. Pondering whether some type of "credits" on the eventlog watch would work, i.e.:

  • tell eventlog watch to send up to X data (bytes? eventlog entries?) and then don't send me more until i say so
  • new RPC tells eventlog watch it can send more

Hmmm, off the top of my head it feels harder to implement, but it would obviously perform better (size/offset might require users to start/stop the watcher multiple times).
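The bulleted idea above could be sketched as a minimal flow-control loop. This is a hypothetical illustration in Python, not the Flux API: `CreditedWatch` and `grant` are invented names, and real credits would arrive via an RPC rather than a method call.

```python
# Hypothetical sketch (invented names, not Flux RPCs): the watcher only
# emits eventlog entries while it holds credits; a separate grant call
# from the consumer releases more.
from collections import deque

class CreditedWatch:
    def __init__(self, events):
        self.pending = deque(events)  # entries waiting to be sent
        self.credits = 0              # start paused: no credit, no data
        self.sent = []                # stand-in for responses on the wire

    def grant(self, n):
        """Consumer-side 'send me more' RPC: allow n more entries."""
        self.credits += n
        self._flush()

    def _flush(self):
        # Send entries only while credit remains.
        while self.credits > 0 and self.pending:
            self.sent.append(self.pending.popleft())
            self.credits -= 1

w = CreditedWatch(["event1", "event2", "event3"])
w.grant(2)   # consumer says: send up to 2 entries
assert w.sent == ["event1", "event2"]
w.grant(1)   # one more credit releases the next entry
assert w.sent == ["event1", "event2", "event3"]
```

Whether credit is counted in bytes or in eventlog entries is the open question from the first bullet; the sketch counts entries for simplicity.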

@chu11
Member Author

chu11 commented Dec 12, 2024

While working on #6456, I realized there are some core design issues that make a size/offset/index very hard to do and likely to perform poorly.

  • The number one issue is that, due to KVS merging of commits, multiple events can be inside a single content blob. That means we cannot use KVS valref "indexes" as an offset of sorts (i.e. a user cannot count the number of events they've seen and use that to tell kvs-watch to begin at event number X).

    • Related to this, we do not know how many events are inside each blobref. Hypothetically, if there were some metadata along with each blobref in the valref, this could be done.
  • There is no a priori knowledge of the size of a content blob; we only know the size after we read it. That makes restarting a watch at a certain offset very costly, since we cannot (spiritually) "seek" to a location.
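To make the first bullet concrete, here is a toy illustration (hypothetical data and function, not Flux internals) of why an event index cannot be mapped to a blobref index up front: the per-blob event counts are only known after each blob's content has been read.

```python
# Toy model: commit merging packs a variable number of eventlog entries
# into each content blob, so finding "event number N" requires scanning
# blob contents -- a valref index alone is not enough.
blobs = [
    ["e0", "e1", "e2"],   # blobref 0 happens to hold three merged events
    ["e3"],               # blobref 1 holds one
    ["e4", "e5"],         # blobref 2 holds two
]

def blob_for_event(blobs, event_index):
    """Return (blob index, offset within blob) for event N.
    Requires reading every earlier blob to count its entries."""
    seen = 0
    for i, blob in enumerate(blobs):
        if event_index < seen + len(blob):
            return i, event_index - seen
        seen += len(blob)
    raise IndexError(event_index)

assert blob_for_event(blobs, 4) == (2, 0)  # event 4 is the first entry of blob 2
assert blob_for_event(blobs, 1) == (0, 1)  # event 1 is inside blob 0
```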

@garlick
Member

garlick commented Dec 12, 2024

Sorry, backing up because I've lost the thread. Is this right?

  • we're trying to resolve "flux-exec: Error: rank 0: cat: Value too large for defined data type" (#4572)
  • the shell input plugin currently watches the guest.input eventlog with job-info.eventlog-watch
  • it may receive a large volume of data that it cannot buffer
  • this issue was intended to provide a mechanism in kvs-watch that job-info would use to stream events in some way that could be regulated by the consumer

I think you're right that it would be very inconvenient for kvs-watch to deal in any sort of offset that isn't a valref index. But doesn't that just mean that whatever is talking to kvs-watch would need to be willing to buffer at least one blob?

What sort of interface did you have in mind to provide to the shell? I assume some variant on flux_job_event_watch()?

@chu11
Member Author

chu11 commented Dec 12, 2024

> Sorry, backing up because I've lost the thread. Is this right?

Yes, this is the "prequel" issue to #6413.

> What sort of interface did you have in mind to provide to the shell? I assume some variant on flux_job_event_watch()?

Before hitting these issues, my thought was that the user could count the number of events or the amount of data it had received. If we would overflow the buffer, stop the watcher, then restart the stdin watcher at some index/offset when we get credits back. We would perhaps have to buffer the "current" data if we've only sent some portion of it.

@garlick
Member

garlick commented Dec 12, 2024

Hmm, ok. Maybe the client/shell could use an interface offered by job-info that accepts an eventlog entry index, and that service would use one provided by kvs-watch that accepts a treeref index? Then both of those services would need to buffer some data.

Or, instead of canceling and restarting a watch at an offset which could be wasteful if a lot of data gets thrown away on every cancel, maybe we just need a generic way for a streaming RPC to be paused and resumed? Or maybe a flag to reverse it so a streaming RPC starts paused and only sends data when it receives credit?

@chu11
Member Author

chu11 commented Dec 12, 2024

> Hmm, ok. Maybe the client/shell could use an interface offered by job-info that accepts an eventlog entry index, and that service would use one provided by kvs-watch that accepts a treeref index? Then both of those services would need to buffer some data.

The main issue, which I did not realize until working on other things, is that multiple eventlog entries could be inside a single blobref. So it sort of makes going off the treeref index very unreliable.

> Or, instead of canceling and restarting a watch at an offset which could be wasteful if a lot of data gets thrown away on every cancel, maybe we just need a generic way for a streaming RPC to be paused and resumed? Or maybe a flag to reverse it so a streaming RPC starts paused and only sends data when it receives credit?

Good ideas. This is along the lines of a brainstorming idea I had a few comments up, where I put a "credit system" into job-info. Although now that you mention it, "pause" and "resume" is far simpler than "credits".

@garlick
Member

garlick commented Dec 12, 2024

> The main issue, which I did not realize until working on other things, is that multiple eventlog entries could be inside a single blobref. So it sort of makes going off the treeref index very unreliable.

Doesn't it just mean when you get the current blob (in job-info), you have to hold onto it and send an event at a time to the consumer, then get the next blob?

> Good ideas. This is along the lines of a brainstorming idea I had a few comments up, where I put a "credit system" into job info. Although now that you mention it, a "pause" and "resume" is far simpler than a "credits" one.

Apologies for not keeping up! Actually, credits is probably better for the general case, and pause/resume is like a degenerate case of that with credit = 1, right?

@chu11
Member Author

chu11 commented Dec 12, 2024

>> The main issue, which I did not realize until working on other things, is that multiple eventlog entries could be inside a single blobref. So it sort of makes going off the treeref index very unreliable.

> Doesn't it just mean when you get the current blob (in job-info), you have to hold onto it and send an event at a time to the consumer, then get the next blob?

I'm not following your point. My original point was more about the "restart the watcher" case: because multiple events can occur inside each blobref, I cannot pass a "start at event number X" to job-info. In order to start at event number X, I'd have to re-read the whole eventlog from the start.

If you're thinking about the "pause/resume" case: kvs-watch will get each blob and stream it to job-info, and job-info will divide that blob up into individual events and stream those to the caller. So job-info would be able to "pause" kvs-watch to say "no more data for a bit" and simply cache the blobs that were in transit at the time. Although that could be a lot of data, perhaps warranting the "credits" idea.
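The pause path described here could be sketched roughly as follows (hypothetical Python with invented names; the real services would communicate via RPC): a middle layer splits incoming blobs into events for the caller and buffers whole blobs that arrive while paused, which is exactly the cache that could grow large.

```python
# Sketch (not job-info's implementation): receive whole blobs, deliver
# one event at a time downstream, and hold in-transit blobs while paused.
from collections import deque

class EventSplitter:
    def __init__(self, deliver):
        self.deliver = deliver      # callback invoked once per event
        self.paused = False
        self.backlog = deque()      # blobs buffered while paused

    def on_blob(self, blob_events):
        if self.paused:
            self.backlog.append(blob_events)  # unbounded -> the credits worry
        else:
            for ev in blob_events:
                self.deliver(ev)

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False
        while self.backlog:
            for ev in self.backlog.popleft():
                self.deliver(ev)

out = []
s = EventSplitter(out.append)
s.on_blob(["e0", "e1"])
s.pause()
s.on_blob(["e2", "e3"])      # buffered, not delivered
assert out == ["e0", "e1"]
s.resume()
assert out == ["e0", "e1", "e2", "e3"]
```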

> Actually credits is probably better for the general case and pause resume is like a degenerate case of that with credit = 1 right?

Yeah, although I think you mean credits = infinity w/ resume and credits = 0 w/ pause.

@garlick
Member

garlick commented Dec 12, 2024

> I'm not following your point. My original point was more with the "restart the watcher" case, b/c multiple events can occur inside each blobref, I cannot pass a "start at event number X" to job-info. In order to start at event number X, I'd have to re-read the whole eventlog from the start again.

Sorry, I probably conflated the two thoughts when I typed that!

> Yeah, although I think you mean credits = infinity w/ resume and credits = 0 w/ pause.

Er yeah, I was thinking of the reverse, I guess, where the sender starts out paused (zero credits) and has to get a "resume" (credit++) before sending each response, so the credit count oscillates between 0 and 1.
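Hypothetically, the two readings being compared can be captured in one toy model (invented names, not Flux code): reading A treats resume as unlimited credit and pause as zero, while reading B grants one credit per resume, so the count oscillates between 0 and 1.

```python
# Toy model of the two credit interpretations discussed above.
import math

class Stream:
    def __init__(self):
        self.credits = 0          # sender starts out paused

    def pause(self):
        self.credits = 0          # reading A: pause = zero credit

    def resume_unlimited(self):
        self.credits = math.inf   # reading A: resume = unlimited credit

    def resume_one(self):
        self.credits += 1         # reading B: each resume grants one credit

    def send_one(self):
        assert self.credits > 0   # may only send while credit is held
        self.credits -= 1         # sending a response consumes a credit

s = Stream()
s.resume_one()                    # reading B: credit goes 0 -> 1
s.send_one()                      # ...and back to 0: the oscillation above
assert s.credits == 0
```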
