Messages getting dropped unintentionally on kafka sink #22026
Comments
Thanks @heshanperera-alert. It does seem like this error should be retried. I think the change would need to be somewhere in here, if anyone feels motivated: vector/src/sinks/kafka/service.rs, lines 147 to 199 at f73fb10.
So would a simple fix be to shove the extra error code into the match, like the following diff?

```diff
diff --git a/src/sinks/kafka/service.rs b/src/sinks/kafka/service.rs
index 087795864..4772833e9 100644
--- a/src/sinks/kafka/service.rs
+++ b/src/sinks/kafka/service.rs
@@ -161,7 +161,7 @@ impl Service<KafkaRequest> for KafkaService {
             }
             // Producer queue is full.
             Err((
-                KafkaError::MessageProduction(RDKafkaErrorCode::QueueFull),
+                KafkaError::MessageProduction(RDKafkaErrorCode::QueueFull | RDKafkaErrorCode::PolicyViolation),
                 original_record,
             )) => {
                 if blocked_state.is_none() {
```

I'm not entirely certain how that chunk of code works; it appears to just delay by a bit and retry the request after 100ms, if I'm understanding it right.
Yeah, I think that would work. And yes, it just waits 100ms and retries.
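For reference, here is a minimal sketch of that wait-and-retry pattern, assuming the rdkafka crate and a tokio runtime; `Record` and `try_send` are hypothetical stand-ins for illustration, not the actual types in service.rs:

```rust
use std::time::Duration;

use rdkafka::error::KafkaError;
use rdkafka::types::RDKafkaErrorCode;

/// Hypothetical stand-in for the sink's in-flight record type.
struct Record;

/// Hypothetical stand-in for the real producer call, which would hand the
/// record to librdkafka and surface any production error.
async fn try_send(_record: &Record) -> Result<(), KafkaError> {
    Ok(())
}

/// On retriable producer errors, sleep briefly and resend the same record;
/// bail out on anything else. Approximates the behavior described above.
async fn send_with_retry(record: Record) -> Result<(), KafkaError> {
    loop {
        match try_send(&record).await {
            Ok(()) => return Ok(()),
            // Retriable: the local producer queue is full, or the broker
            // reported a policy violation (e.g. Event Hubs throttling).
            Err(KafkaError::MessageProduction(
                RDKafkaErrorCode::QueueFull | RDKafkaErrorCode::PolicyViolation,
            )) => tokio::time::sleep(Duration::from_millis(100)).await,
            Err(e) => return Err(e),
        }
    }
}
```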
Problem: Some messages were getting dropped by Vector because Kafka was throwing `PolicyViolation` errors. These should be retried, as a policy can be as simple as a more aggressive rate limit.

Solution: Retry any messages that failed with the `RDKafkaErrorCode::PolicyViolation` error.

Note: A dynamic back off may be better, as there may be a rate limit out there that needs more than 100ms between retries.

See the original issue at vectordotdev#22026. Closes vectordotdev#22026.
Merged in fix(kafka sink): retry messages that result in kafka policy violations (#22041), which also updated changelog.d/22026-retry-kafka-policy-violations.fix.md. Co-authored-by: Jesse Szwedko <[email protected]>
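On the dynamic back off note above: a minimal sketch of what a capped exponential backoff could look like in place of the fixed 100ms delay. The function name, base, and cap here are illustrative, not from the merged PR:

```rust
use std::time::Duration;

/// Capped exponential backoff: base, 2*base, 4*base, ... up to `max`.
/// `attempt` starts at 0 for the first retry.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}

fn main() {
    let base = Duration::from_millis(100);
    let max = Duration::from_secs(5);
    for attempt in 0..8 {
        // Prints 100ms, 200ms, 400ms, ..., capped at 5s.
        println!("retry {attempt}: wait {:?}", backoff_delay(attempt, base, max));
    }
}
```

In practice a production implementation would likely also add jitter, so that many blocked producers don't all retry at the same instant.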
Problem
I am using the Kafka sink to send messages to Azure Event Hubs. I have noticed that messages get dropped when Azure Event Hubs hits its throttling limits. When I checked Vector's internal metrics, I can see that these messages get dropped unintentionally. Is there a way to determine whether these events are retried automatically by Vector?

I am going through the documentation on the Kafka sink, but there doesn't seem to be a way or a flag to retry these events upon failure.
Version
0.37.1
Debug Output
```
vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=Some(KafkaError (Message production error: PolicyViolation (Broker: Policy violation)))
```