Describe the bug
The persistent queue removes items from storage after they're successfully exported. This removal happens in a transaction that also updates the list of currently dispatched items. Depending on the implementation details of the underlying storage, this transaction may fail if the storage device is full.
As a result, items can still be taken out of the queue, but they're never actually removed from storage, so no space is freed and no new items can be put in.
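To make the failure mode concrete, here is a minimal sketch of a toy model, not the collector's code or the filestorage API. `ToyStore`, `ToyQueue`, `JOURNAL_BYTES`, and `external` are all hypothetical names. The sketch assumes every transaction, including a delete-only one, must reserve a small amount of journal space before it can commit. It also leaves out the dispatched-items list for brevity.

```python
class StorageFullError(Exception):
    """Raised when a transaction cannot reserve its journal space."""


class ToyStore:
    """Hypothetical transactional key-value store (illustration only)."""

    JOURNAL_BYTES = 8  # made-up per-transaction overhead

    def __init__(self, capacity):
        self.capacity = capacity
        self.external = 0  # bytes used by other writers on the same volume
        self.data = {}

    def batch(self, sets=None, deletes=()):
        staged = dict(self.data)
        staged.update(sets or {})
        # While the transaction is in flight, the existing data, any new
        # values, and the journal record must all fit on the volume.
        in_flight = (
            self.external
            + sum(len(v) for v in staged.values())
            + self.JOURNAL_BYTES
        )
        if in_flight > self.capacity:
            raise StorageFullError("no space left for the transaction journal")
        for key in deletes:
            staged.pop(key, None)
        self.data = staged  # commit


class ToyQueue:
    """Hypothetical queue that tracks its read position in memory."""

    def __init__(self, store):
        self.store = store
        self.read_index = 0
        self.write_index = 0

    def put(self, payload):
        self.store.batch(sets={f"item-{self.write_index}": payload})
        self.write_index += 1

    def get(self):
        key = f"item-{self.read_index}"
        payload = self.store.data[key]
        self.read_index += 1  # advances even if the later delete fails
        return key, payload

    def on_export_done(self, key):
        # The delete is itself a transaction, so it can fail on a full volume.
        try:
            self.store.batch(deletes=[key])
        except StorageFullError:
            return False
        return True


def run_scenario():
    store = ToyStore(capacity=32)
    queue = ToyQueue(store)
    queue.put(b"a" * 12)
    queue.put(b"b" * 12)
    store.external = 8  # another queue on the volume takes the remaining space

    key, _ = queue.get()
    removed = queue.on_export_done(key)
    still_stored = key in store.data
    try:
        queue.put(b"c")
        put_ok = True
    except StorageFullError:
        put_ok = False
    return removed, still_stored, put_ok


if __name__ == "__main__":
    print(run_scenario())
```

In this model the queue hands out the item and moves on, but the delete never commits, so the item stays on disk and later writes keep failing until something else frees space.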
Steps to reproduce
See the unit test in the linked PR.
Additional context
I've confirmed that filestorage can behave this way via the following test: open-telemetry/opentelemetry-collector-contrib@dbe3105. I suspect that this will be true of any transactional storage engine, as some amount of transaction data needs to be persisted to disk before it can be committed.
How often this can happen in practice is difficult to estimate. It depends heavily on how the size of queue items aligns with available disk space. Anecdotally, I've seen it happen during an incident, on a volume with multiple queues sharing space.