docs(specs): Add specification for partial-write errors #16034

srebhan · 2024-10-16T16:20:49Z

Summary

Add a specification for handling partial-write errors on outputs, defining the expected behavior and the content of the error.

Checklist

No AI generated code was used in this PR

Related issues

related to #11942
related to #14802
related to #15908
related to #15742

DStrand1

Looks great! Just have one question and a small fix

DStrand1 · 2024-10-16T22:07:05Z

docs/specs/tsd-008-partial-write-error-handling.md

+To do so, the error must contain a list of successfully
+written metrics, which must be marked as __accepted__ and must be removed from
+the buffer. The error must contain a list of metrics fatally failed to be


Should accepted metrics be guaranteed to be in the error? Or could they be inferred based on any errored metrics?

I would appreciate your help with the wording, as I feel mine is ambiguous and overly complex. What I want to say is that we get the list of metrics that were accepted (i.e. can be dropped from the buffer) as well as the list of metrics that were rejected, e.g. due to serialization errors or similar (i.e. can also be dropped from the buffer). All metrics in the batch that belong to neither of these lists should be kept in the buffer and re-issued for writing with the next batch!

So either the error explicitly provides them (which we might want in the future), or we need to infer the metrics to keep from their absence in both of the other lists...

I think the problem with relying on the error to explicitly provide the accepted metrics is that it's not something we can count on: if all metrics are accepted, there would be no error. I think it makes the most sense for the error to only give information about which metrics had an error (retryable or otherwise); if a metric isn't mentioned in the error, it can be assumed to be accepted. Does that sound reasonable?

To be honest, I would take another view: all metrics that should be dropped from the buffer should be mentioned in the error, so we have an implicit fallback of "what's not mentioned should be kept", which is the safe spot IMO. That's what is currently done: all "accepted" and all "rejected" metrics are in there, so everything else should be kept.

I think that makes a lot of sense from a code-flow perspective. My only hang-up is with this just being called an "error" when it describes metrics that did not fail but were accepted properly (and in many cases may contain only accepted metrics). Maybe it makes sense for this not to be an error but some other explicit return type that could contain an error field? But if you don't have an issue with this, then I'm fine with it.
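As a rough illustration of the alternative raised here, an explicit return type with an error field might look something like the sketch below. The names `WriteResult`, `ErrPartial` and `Remaining` are hypothetical and appear neither in the spec nor in Telegraf; the sketch keeps the "what's not mentioned should be kept" fallback from the previous comment.

```go
package main

import "errors"

// ErrPartial is a hypothetical sentinel for writes that only partly succeeded.
var ErrPartial = errors.New("some metrics were not written")

// WriteResult is a hypothetical explicit return type for an output's write.
// It reports accepted and rejected metrics without being an error itself.
type WriteResult struct {
	Accepted []int // batch indices written successfully
	Rejected []int // batch indices that failed fatally
	Err      error // nil if nothing went wrong
}

// Remaining returns the batch indices mentioned in neither list,
// i.e. the metrics that should stay in the buffer.
func (r WriteResult) Remaining(batchSize int) []int {
	seen := make(map[int]bool, len(r.Accepted)+len(r.Rejected))
	for _, i := range r.Accepted {
		seen[i] = true
	}
	for _, i := range r.Rejected {
		seen[i] = true
	}

	rest := make([]int, 0, batchSize)
	for i := 0; i < batchSize; i++ {
		if !seen[i] {
			rest = append(rest, i)
		}
	}
	return rest
}
```

The trade-off is that a fully successful write has to list every metric as accepted, whereas the error-based approach simply returns `nil` in that case.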

Co-authored-by: Dane Strandboge <[email protected]>

docs(specs): Add specification for partial-write errors

b1625db

telegraf-tiger bot added the docs Issues related to Telegraf documentation and configuration descriptions label Oct 16, 2024

srebhan added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Oct 16, 2024

srebhan assigned DStrand1 Oct 16, 2024

srebhan mentioned this pull request Oct 16, 2024

Prevent indefinite unsent metrics loop in outputs.azure_monitor #15908

Open

srebhan marked this pull request as draft October 16, 2024 18:35

DStrand1 reviewed Oct 16, 2024

Fix typo

c8adedd

Co-authored-by: Dane Strandboge <[email protected]>

srebhan commented Oct 16, 2024

DStrand1 left a comment

DStrand1 Oct 16, 2024

srebhan Oct 17, 2024

DStrand1 Oct 17, 2024

srebhan Oct 18, 2024

DStrand1 Oct 18, 2024
