
[BUG] Data Prepper is losing connections from S3 pool #3809

Closed
dlvenable opened this issue Dec 5, 2023 · 1 comment · Fixed by #3836
Labels: bug (Something isn't working)

@dlvenable (Member)

Describe the bug

Data Prepper's S3 source can lose connections from the AWS SDK connection pool. See the following error:

2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. Unable to execute HTTP request: Timeout waiting for connection from pool
2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: Unable to execute HTTP request: Timeout waiting for connection from pool. Retrying with exponential backoff.

After this happens, Data Prepper's s3 source is unable to reclaim the connection from the connection pool. Data Prepper can run for hours or days without recovering the lost connection.
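For illustration, here is a minimal sketch (assuming the AWS SDK for Java v2; the class and method names are hypothetical and this is not Data Prepper's actual code) of how an unclosed GetObject response stream can keep a pooled connection leased:

import java.util.zip.GZIPInputStream;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

class LeakSketch {
    // Reads an S3 object as gzip. The response stream is backed by a pooled
    // HTTP connection; if GZIPInputStream throws on a bad magic header, the
    // stream is never closed and the connection is never returned to the pool.
    static void readGzipObject(S3Client s3Client, String bucket, String key) throws Exception {
        ResponseInputStream<GetObjectResponse> object = s3Client.getObject(
                GetObjectRequest.builder().bucket(bucket).key(key).build());
        GZIPInputStream gzip = new GZIPInputStream(object); // throws ZipException for non-gzip data
        gzip.transferTo(System.out);
        gzip.close(); // never reached on the error path above
    }
}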

To Reproduce
Steps to reproduce the behavior:

  1. Configure an S3 bucket with an SQS queue.
  2. Create a pipeline with an s3 source and configure it to use automatic compression (gzip would probably work too).
  3. Upload a file which is uncompressed but has the .gz extension, e.g. not-valid-gzip.gz (see the upload sketch after this list).
  4. Wait a while, maybe 30 minutes.
  5. Observe the error above.
  6. Upload a valid file. You will see the same error because the AWS SDK S3 client has run out of connections in its pool.
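For step 3, an upload along these lines would create the problematic object (a hypothetical standalone sketch using the AWS SDK for Java v2; the bucket name and content are placeholders):

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

class UploadInvalidGzip {
    public static void main(String[] args) {
        // Store an uncompressed body under a .gz key so the s3 source's
        // automatic compression detection attempts (and fails) to gunzip it.
        try (S3Client s3 = S3Client.create()) {
            s3.putObject(
                    PutObjectRequest.builder()
                            .bucket("my-bucket")
                            .key("not-valid-gzip.gz")
                            .build(),
                    RequestBody.fromString("this is plain text, not gzip"));
        }
    }
}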

Expected behavior

Data Prepper should not run out of S3 connections in this scenario.

Environment (please complete the following information):

Data Prepper 2.6.0

Additional context

Here are the logs from when the transition occurred. The "GZIP encoding specified but data did contain gzip magic header" log line is expected since this is not a gzip file.

2023-11-30T22:34:21.343 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:34:21.387 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: GZIP encoding specified but data did contain gzip magic header. Retrying with exponential backoff.
2023-11-30T22:34:21.387 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. GZIP encoding specified but data did contain gzip magic header
2023-11-30T22:34:21.387 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Pausing SQS processing for 21.3 seconds due to an error in processing.
2023-11-30T22:34:51.343 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Read S3 object: [bucketName=my-bucket, key=not-valid-gzip.gz]
2023-11-30T22:34:51.343 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. Unable to execute HTTP request: Timeout waiting for connection from pool
2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: Unable to execute HTTP request: Timeout waiting for connection from pool. Retrying with exponential backoff.
2023-11-30T22:35:53.601 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Pausing SQS processing for 18.477 seconds due to an error in processing.
2023-11-30T22:36:12.134 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Read S3 object: [bucketName=my-bucket, key=not-valid-gzip.gz]
2023-11-30T22:36:12.134 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:37:13.656 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: Unable to execute HTTP request: Timeout waiting for connection from pool. Retrying with exponential backoff.

Also, here are the logs from the first gzip failure. These show the timestamps for the expected errors.

2023-11-30T22:09:51.178 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:09:51.179 [Thread-11] INFO  org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Read S3 object: [bucketName=my-bucket, key=not-valid-gzip.gz]
2023-11-30T22:09:51.234 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. GZIP encoding specified but data did contain gzip magic header
2023-11-30T22:09:51.234 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: GZIP encoding specified but data did contain gzip magic header. Retrying with exponential backoff.
@dlvenable (Member, Author)

I found a similar issue in the AWS Java SDK (v1): aws/aws-sdk-java#1405 (comment). One poster noted that there were S3Object instances which were never closed.

Reviewing the code, I do not see any error handling that closes the stream. This currently seems to me the most likely cause.
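A rough sketch of the kind of fix this suggests (not the actual change in #3836; it assumes the AWS SDK for Java v2 and a hypothetical helper method): close the response stream on every path, including when the gzip check throws, so the connection is returned to the pool.

import java.util.zip.GZIPInputStream;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

class CloseOnErrorSketch {
    static void readGzipObject(S3Client s3Client, String bucket, String key) throws Exception {
        GetObjectRequest request = GetObjectRequest.builder().bucket(bucket).key(key).build();
        // try-with-resources closes the response stream even when the
        // GZIPInputStream constructor throws, releasing the pooled connection.
        try (ResponseInputStream<GetObjectResponse> object = s3Client.getObject(request);
             GZIPInputStream gzip = new GZIPInputStream(object)) {
            gzip.transferTo(System.out);
        }
    }
}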
