[BUG] Visibility duplication protection fails when using S3 source for large files and receiving 10 messages from SQS queue #4812

Closed
danhli opened this issue Aug 7, 2024 · 0 comments · Fixed by #4831

danhli (Contributor) commented Aug 7, 2024

Describe the bug
When ingesting large files with S3-SQS processing, the OpenSearch Ingestion (OSI) pipeline failed to prevent duplicate processing even though visibility_duplication_protection was set to true. The SQS queue's ApproximateNumberOfMessagesVisible metric kept growing, and duplicate documents were confirmed in the OpenSearch index. The pipeline logs contained many errors such as:
ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Failed to set visibility timeout for message [foo] to 60
software.amazon.awssdk.services.sqs.model.SqsException: Value [bar] for parameter ReceiptHandle is invalid. Reason: Message does not exist or is not available for visibility timeout change. (Service: Sqs, Status Code: 400, Request ID: [baz])

The issue occurred when OSI received 10 (or close to 10) messages from the SQS queue in a single request; it did not occur when OSI received only one message per request.
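
For background, the error above is SQS rejecting a ChangeMessageVisibility call whose receipt handle is no longer valid, typically because the message was already deleted, or because its visibility timeout had already lapsed and the handle was superseded by a later receive. The sketch below only illustrates that failure mode with the AWS SDK for Java v2; it is not the actual Data Prepper SqsWorker code, and the queue URL is a placeholder.

    import java.util.List;
    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityRequest;
    import software.amazon.awssdk.services.sqs.model.Message;
    import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;
    import software.amazon.awssdk.services.sqs.model.SqsException;

    public class VisibilityExtensionSketch {
        public static void main(String[] args) {
            String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"; // placeholder
            try (SqsClient sqs = SqsClient.create()) {
                // Receive up to 10 messages in one request, mirroring maximum_messages: 10.
                List<Message> messages = sqs.receiveMessage(ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .maxNumberOfMessages(10)
                        .build()).messages();

                for (Message message : messages) {
                    try {
                        // While a large file is still being parsed, the worker periodically
                        // extends the visibility timeout. If the receipt handle is stale
                        // (the message was already deleted, or its timeout lapsed and the
                        // message was redelivered), SQS rejects the call with the 400 error
                        // quoted above.
                        sqs.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
                                .queueUrl(queueUrl)
                                .receiptHandle(message.receiptHandle())
                                .visibilityTimeout(60)
                                .build());
                    } catch (SqsException e) {
                        System.err.println("Failed to extend visibility: "
                                + e.awsErrorDetails().errorMessage());
                    }
                }
            }
        }
    }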

To Reproduce
Steps to reproduce the behavior:

  1. Configure an S3 bucket and an SQS queue according to the documentation
  2. Set the following parameters in the pipeline configuration
     source:
       s3:
         acknowledgments: true
         notification_type: "sqs"
         compression: "gzip"
         codec:
           ndjson:
         workers: 5
         sqs:
           queue_url:
           maximum_messages: 10
           visibility_timeout: "60s"
           visibility_duplication_protection: true
  3. Provision the right amount of OCUs for the pipeline
  4. Start the pipeline
  5. Send large gzipped files, e.g. 80MB, to the S3 bucket continuously
  6. Check the pipeline's logs using CloudWatch Log Insights with the following query
    fields @timestamp, @message
    | filter @message like /ReceiptHandle is invalid/
    | sort @timestamp desc
  7. Check the pipeline's logs using CloudWatch Log Insights with the following query
    fields @timestamp, @message
    | filter @message like /10 messages from SQS. Processing/
    | sort @timestamp desc
  8. Check the queue's ApproximateNumberOfMessagesVisible metric
  9. Check OpenSearch index for duplicated documents

Expected behavior
Step 6 returns no log messages, while step 7 does return log messages (confirming that batches of 10 messages were received and processed). The ApproximateNumberOfMessagesVisible metric does not keep growing, and there are no duplicate documents in the index.
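
For clarity on the behavior the protection is meant to provide, here is a minimal, assumption-laden sketch of the general pattern: extend visibility only for messages that are still in flight, and stop as soon as a message is acknowledged and deleted. This is not the change made in #4831; the InFlightTracker class and its method names are hypothetical.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.ChangeMessageVisibilityRequest;
    import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
    import software.amazon.awssdk.services.sqs.model.Message;

    // Hypothetical helper: tracks which receipt handles are still being processed so a
    // background task never extends visibility for a message that was already deleted.
    class InFlightTracker {
        private final Map<String, Boolean> inFlight = new ConcurrentHashMap<>();
        private final SqsClient sqs;
        private final String queueUrl;

        InFlightTracker(SqsClient sqs, String queueUrl) {
            this.sqs = sqs;
            this.queueUrl = queueUrl;
        }

        void startProcessing(Message message) {
            inFlight.put(message.receiptHandle(), Boolean.TRUE);
        }

        // Called on a timer while a large file is still being parsed.
        void extendVisibility(Message message, int seconds) {
            if (!inFlight.containsKey(message.receiptHandle())) {
                return; // already acknowledged and deleted; extending would yield the 400 error
            }
            sqs.changeMessageVisibility(ChangeMessageVisibilityRequest.builder()
                    .queueUrl(queueUrl)
                    .receiptHandle(message.receiptHandle())
                    .visibilityTimeout(seconds)
                    .build());
        }

        // Called when the end-to-end acknowledgment arrives.
        void acknowledge(Message message) {
            inFlight.remove(message.receiptHandle());
            sqs.deleteMessage(DeleteMessageRequest.builder()
                    .queueUrl(queueUrl)
                    .receiptHandle(message.receiptHandle())
                    .build());
        }
    }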

danhli added the bug (Something isn't working) and untriaged labels Aug 7, 2024
danhli added a commit to danhli/data-prepper that referenced this issue Aug 13, 2024
dlvenable added this to the v2.9 milestone Aug 19, 2024
danhli added a commit to danhli/data-prepper that referenced this issue Aug 19, 2024
dlvenable pushed a commit that referenced this issue Aug 23, 2024
Fix visibility timeout errors (#4812)

Signed-off-by: Daniel Li <[email protected]>
opensearch-trigger-bot bot pushed a commit that referenced this issue Aug 23, 2024
Fix visibility timeout errors (#4812)

Signed-off-by: Daniel Li <[email protected]>
(cherry picked from commit 910533a)
dlvenable pushed a commit that referenced this issue Aug 23, 2024
Fix visibility timeout errors (#4812)

Signed-off-by: Daniel Li <[email protected]>
(cherry picked from commit 910533a)

Co-authored-by: Daniel Li <[email protected]>