Support dynamically changing the visibility timeout for S3 Source with SQS queue #2485

Closed · graytaylor0 opened this issue Apr 13, 2023 · 2 comments
Labels: enhancement (New feature or request)

@graytaylor0
Member

Is your feature request related to a problem? Please describe.
As a user of the S3 source with SQS, I have messages/objects that differ widely in size. This means there is no single optimal visibility timeout for the SQS queue. Too small a timeout causes problems with large objects: the message becomes visible again before processing finishes and can be picked up and processed a second time. Too large a timeout delays reprocessing of files if Data Prepper crashes, because the message stays hidden until the timeout expires.

Describe the solution you'd like
Make periodic calls to the SQS ChangeMessageVisibility API from the S3 source. This could be an optional setting on the SQS queue configuration:

```yaml
source:
  s3:
    sqs:
      visibility_timeout: "dynamic"
```

The S3 source would keep track of how long it has been processing a message and, if it cannot finish before the visibility timeout expires, would call ChangeMessageVisibility to keep the message hidden. For example, if the queue's visibility timeout is 2 minutes and the S3 source pulls a message that it won't be able to process in time, it would call ChangeMessageVisibility to hide the message for another 2 minutes. This would continue until the message is fully processed or until the Data Prepper instance crashes. After a crash the visibility timeout is no longer extended, so the message becomes visible again and another instance of Data Prepper can pick it up, as intended.
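A minimal Python sketch of the extend-until-done behavior described above, assuming a boto3-style SQS client that exposes `change_message_visibility(QueueUrl=..., ReceiptHandle=..., VisibilityTimeout=...)`. The helper name `process_with_visibility_extension` and the `extend_margin` parameter are hypothetical, and this is not how Data Prepper itself implements the feature: a background thread re-hides the message every `visibility_timeout * extend_margin` seconds until processing finishes.

```python
import threading
from typing import Any, Callable


def process_with_visibility_extension(
    sqs_client: Any,
    queue_url: str,
    receipt_handle: str,
    visibility_timeout: int,
    process: Callable[[], object],
    extend_margin: float = 0.5,
) -> int:
    """Run `process` while keeping an SQS message hidden from other consumers.

    Hypothetical helper (a sketch, not Data Prepper's implementation).
    Returns the number of ChangeMessageVisibility calls that were made.
    """
    done = threading.Event()
    extensions = 0

    def heartbeat() -> None:
        nonlocal extensions
        # Extend well before the current timeout would expire.
        interval = visibility_timeout * extend_margin
        # Event.wait returns False on timeout, meaning processing is still running.
        while not done.wait(interval):
            # SQS measures the new timeout from the time of this call,
            # so each call hides the message for another visibility_timeout seconds.
            sqs_client.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=visibility_timeout,
            )
            extensions += 1

    # Daemon thread: if the process crashes, the extensions stop with it.
    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        process()
    finally:
        done.set()
        worker.join()
    return extensions
```

Because the heartbeat runs in the same process, a crash stops the extensions along with the processing, so the message becomes visible again at most one `visibility_timeout` after the last extension.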

Describe alternatives you've considered (Optional)
Default the visibility timeout to a much larger value (perhaps even the maximum of 12 hours), and then, if Data Prepper is shutting down, call ChangeMessageVisibility with a value of 0 so that another instance of Data Prepper can pick up the message immediately.
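The shutdown half of this alternative could look roughly like the following Python sketch. It assumes the same boto3-style `change_message_visibility` call and that the caller tracks the receipt handles of messages still in flight; `release_in_flight_messages` is a hypothetical name.

```python
from typing import Any, Iterable


def release_in_flight_messages(
    sqs_client: Any, queue_url: str, receipt_handles: Iterable[str]
) -> int:
    """Make in-flight messages visible again immediately (hypothetical helper).

    Intended to run during a graceful shutdown when the queue's default
    visibility timeout is very large. Returns how many messages were released.
    """
    released = 0
    for handle in receipt_handles:
        # A VisibilityTimeout of 0 makes the message receivable right away.
        sqs_client.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=handle,
            VisibilityTimeout=0,
        )
        released += 1
    return released
```

This only helps on a graceful shutdown: after a crash, in-flight messages stay hidden for the full large default timeout, which is the delay described in the original problem.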

@dlvenable
Member

Resolved by #3565

@dlvenable
Member

dlvenable commented Nov 8, 2023

The configuration we implemented is as follows:

```yaml
source:
  s3:
    sqs:
      visibility_duplication_protection: true
      visibility_duplicate_protection_timeout: 1h
```
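One way to read these two settings, assumed here rather than taken from Data Prepper's source, is that visibility extensions continue only until a message has been protected for a total of `visibility_duplicate_protection_timeout` (1h above), after which extending stops and the message may become visible again. The hypothetical helper below sketches that cap in Python.

```python
from datetime import timedelta
from typing import Optional


def next_visibility_extension(
    elapsed: timedelta,
    extension: timedelta,
    protection_timeout: timedelta,
) -> Optional[timedelta]:
    """Decide how long to hide a message for, or None to stop extending.

    Hypothetical helper, assuming `protection_timeout` caps the total time
    a message is kept hidden (e.g. visibility_duplicate_protection_timeout: 1h).
    `elapsed` is how long the message has been in processing so far.
    """
    remaining = protection_timeout - elapsed
    if remaining <= timedelta(0):
        # Cap reached: let the message become visible again.
        return None
    # Never extend past the cap.
    return min(extension, remaining)
```

For example, with 2-minute extensions and a 1-hour cap, a message that has been processing for 59 minutes gets one final 1-minute extension, and at 60 minutes the helper returns `None`.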
