OTLP Exporter Retry and Persist Feature #4115
Comments
For retry, we need to consider the following cases:
|
Confirming the end result here:
All of the work planned for this issue stays as opt-in, correct? |
That is correct. This will be an opt-in feature until we have a spec. |
Is there related spec work for these options in-progress already?
A retry policy is useful independently from persistent storage. That is, the gRPC client (or Polly if using HTTP) could be configured with a retry policy which can handle transient network errors. This handling would be opaque to the OTLP exporter. Do you plan to implement retry w/o also requiring the use of persistent storage? |
We should do this, and it is required by the spec (https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/exporter.md#retry). Persistent storage is an optional, opt-in feature. |
Just wanted to share a thought. @alanwest kind of scratched at this on the SIG yesterday, I think. What does "persistent storage" mean? In the scope of this work, it seems to me we persist the data for retries. I'm working on an exporter for some client teams, some (maybe all) of which seem to want "persistent storage", but of the always & up-front variety, meaning the exporter just writes to disk and then some other thread tries to ship the on-disk data on its own cadence. Only sharing so we can be clear what kind of "persistent storage" we aim to support in OTLP and make sure the documentation is also clear 😄 |
Are there any plans still in motion regarding some kind of persistence support? Once https://opentelemetry.io/docs/specs/otel/library-guidelines/#protocol-exporters
This is more or less the scenario I would like to cater for, similar to what @CodeBlanch mentioned and to how the collector supports it: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md#persistent-queue . Right now the issue that I experience is if I create my own |
Am I correct that there is no retry by default? Would we need to inject a retry policy ourselves into the HttpClientFactory (if using HTTP), e.g. with Polly? If so, would we need to catch the error codes described in the spec? tks |
True. See #1779. Some PRs are in-flight now to make this happen automatically, so you don't have to manually deal with it. |
@cijothomas thanks. |
Yes. |
This issue was marked stale due to lack of activity and will be closed in 7 days. Commenting will instruct the bot to automatically remove the label. This bot runs once per day. |
not stale |
Feature Request
Is your feature request related to a problem?
When a transient server issue prevents an export request from being processed correctly, the data is lost. Likewise, when a program shuts down, any data not yet exported before the process terminates is lost.
Describe the solution you'd like:
Exports should be retried when the error is considered retryable. On program shutdown, data that has not yet been exported should first be saved to disk and then sent, in case the transmission fails or does not have enough time to complete. On the next program execution, any telemetry saved to disk should be exported. This will reduce the amount of lost telemetry.
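To make the persist-then-resend flow concrete, here is a minimal sketch using only the .NET base class library. It is not the OpenTelemetry PersistentStorage API: `DiskBackedExportQueue`, `Persist`, `Drain`, and the `.blob` file extension are hypothetical names chosen for illustration, and the `trySend` callback stands in for whatever actually performs the export.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of OpenTelemetry: stores unsent payloads as files
// so that a later run of the program can pick them up and try to export them again.
public sealed class DiskBackedExportQueue
{
    private readonly string directory;

    public DiskBackedExportQueue(string directory)
    {
        this.directory = directory;
        Directory.CreateDirectory(directory); // no-op if the folder already exists
    }

    // Number of payloads currently waiting on disk.
    public int PendingCount => Directory.GetFiles(this.directory, "*.blob").Length;

    // Save a serialized export request that could not be sent. Returns the file path.
    public string Persist(byte[] payload)
    {
        var path = Path.Combine(this.directory, Guid.NewGuid().ToString("N") + ".blob");
        File.WriteAllBytes(path, payload);
        return path;
    }

    // Try to send everything left on disk. A file is deleted only after trySend
    // reports success, so failed payloads stay available for the next attempt.
    // Returns the number of payloads that were sent.
    public int Drain(Func<byte[], bool> trySend)
    {
        var sent = 0;
        foreach (var path in Directory.GetFiles(this.directory, "*.blob"))
        {
            var payload = File.ReadAllBytes(path);
            if (trySend(payload))
            {
                File.Delete(path);
                sent++;
            }
        }

        return sent;
    }
}
```

In this sketch, `Persist` would be called on shutdown or after a failed export, and `Drain` on the next startup. It deliberately leaves out leasing, so two processes sharing a folder could send the same payload twice; the lease calls listed in the PR roadmap are the kind of mechanism that would guard against that.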
Additional Context
Add the ability for OTLP exporters to retry exports that fail in a defined way. This includes retrying across program restarts by persisting the data to disk upon failure. This will help improve the reliability of OTel from the client's end.
Original GitHub Issue: #1278
The first set of PRs will focus on a single, to-be-decided section of the following matrix; follow-up PRs will enable the others, reusing as much code as is reasonable.
src/OpenTelemetry.Exporter.OpenTelemetryProtocol
PR Roadmap
FileBlobProvider: Storage folder section of Add persistent storage to exporter #1278
persistentBlobProvider.TryCreateBlob(data, RetryAfter.Milliseconds, out var blob);
foreach (var blobItem in persistentBlobProvider.GetBlobs()) { ... }
blob.TryLease(1000); blob.TryRead(out var data);
blob.TryDelete();
FileBlobProvider and the above Retry scenario.
Guards to FileBlobProvider interface. Ref
Retryable Errors
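As a rough illustration of what classifying retryable errors and retrying could look like for OTLP/HTTP, here is a sketch that treats 429, 502, 503, and 504 as retryable (the codes the OTLP specification lists for HTTP, to the best of my reading) and applies capped exponential backoff. `OtlpRetrySketch` and its members are hypothetical names, not exporter APIs. The sketch ignores jitter, any server-provided Retry-After value, and gRPC status codes, all of which a real implementation would need to handle.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only; not part of the OTLP exporter.
public static class OtlpRetrySketch
{
    // Assumption: these are the HTTP status codes treated as retryable for OTLP/HTTP.
    public static bool IsRetryable(int httpStatusCode) =>
        httpStatusCode == 429 || httpStatusCode == 502 ||
        httpStatusCode == 503 || httpStatusCode == 504;

    // Exponential backoff: initial * 2^(attempt - 1), capped at max. attempt is 1-based.
    public static TimeSpan BackoffFor(int attempt, TimeSpan initial, TimeSpan max)
    {
        var factor = Math.Pow(2, attempt - 1);
        var delayMs = Math.Min(initial.TotalMilliseconds * factor, max.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(delayMs);
    }

    // Calls send until it returns a non-retryable status or the attempts run out.
    // Returns the last status code observed (0 if maxAttempts is less than 1).
    public static int SendWithRetry(Func<int> send, int maxAttempts, TimeSpan initial, TimeSpan max)
    {
        var status = 0;
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            status = send();
            if (!IsRetryable(status) || attempt == maxAttempts)
            {
                break;
            }

            // Wait before the next attempt; the delay doubles each time up to max.
            Thread.Sleep(BackoffFor(attempt, initial, max));
        }

        return status;
    }
}
```

If the final status is still retryable after the last attempt, that is the point at which an opt-in persistent storage feature could write the payload to disk instead of dropping it.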
Testing Strategy
Make use of the test/OpenTelemetry.Exporter.OpenTelemetryProtocol.Tests/MockCollectorIntegrationTests.cs class. Some examples can be seen in this closed PR; the changes made to that file in that PR should be reusable.
References
PersistentStorage API
Retry section