-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(specs): Add specification for partial-write errors #16034
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just have one question and a small fix
To do so, the error must contain a list of successfully | ||
written metrics, which must be marked as __accepted__ and must be removed from | ||
the buffer. The error must contain a list of metrics fatally failed to be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should accepted metrics be guaranteed to be in the error? Or could they be inferred based on any errored metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would appreciate your help with the formulation as mine is ambiguous and overly complex I feel. What I want to say is that we get the list of metrics that are accepted (aka. can be dropped from the buffer) as well as the list of metrics rejected e.g. due to serialization errors or similar (aka. can be dropped from the buffer). Now all metrics in the batch not belonging to one of the mentioned lists should be kept in the buffer and re-issued for writing with the next batch!
So either the error explicitly provides them (which we might want in the future) or we need to infer the metrics to keep from not being in one of the other lists...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think relying on the error explicitly providing accepted metrics specifically is that its not something we can rely on since if all are accepted there would be no error. I think it makes the most sense to have the error only give information about which metrics had an error (retryable or otherwise) and if a metric isn't mentioned in the error it can be assumed as accepted. Does that sound reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest I would take another view: All metrics that should be dropped from the buffer should be mentioned in the error so we do have an implicit failback of "what's not mentioned should be kept" which is the safe spot IMO. That's what currently is done, all metrics "accepted" and all metrics "rejected" are in there, so everything else should be kept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes a lot of sense in a code flow perspective, I think my only hangup is with this just being called an "error," when the error describes metrics that are not errored but accepted properly (and in many cases, may only contain accepted metrics). Maybe it makes sense for this not to be an error but some other explicit return type that could contain an error field? But if you don't have an issue with this then I'm fine with it
Co-authored-by: Dane Strandboge <[email protected]>
Summary
Add specification for handling partial-write errors on outputs, defining the behavior and error content
Checklist
Related issues
related to #11942
related to #14802
related to #15908
related to #15742