Describe the bug
Data Prepper's S3 source can lose connections from the AWS SDK connection pool. See the following errors:
2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. Unable to execute HTTP request: Timeout waiting for connection from pool
2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: Unable to execute HTTP request: Timeout waiting for connection from pool. Retrying with exponential backoff.
After this happens, Data Prepper's s3 source is unable to reclaim the connections from the connection pool. Data Prepper can run for hours or days without recovering them.
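A likely mechanism for the leak (a sketch, not Data Prepper's actual code; `TrackedStream` is a hypothetical stand-in for the SDK's pooled `ResponseInputStream`): `GZIPInputStream`'s constructor reads the gzip magic header and throws for non-gzip data, and if the caller only ever closes the wrapper it successfully built, the underlying stream is never closed and its HTTP connection is never returned to the pool.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;

public class LeakDemo {
    // Records whether close() was ever called -- a stand-in for the
    // AWS SDK's pooled response stream (hypothetical).
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackedStream(byte[] b) { super(b); }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) {
        TrackedStream raw = new TrackedStream("plain text, not gzip".getBytes());
        try {
            // The constructor throws ZipException ("Not in GZIP format")
            // before a closeable wrapper ever exists, so the typical
            // "close the wrapper" cleanup never runs for `raw`.
            InputStream in = new GZIPInputStream(raw);
            in.close();
        } catch (IOException e) {
            System.out.println("decompression failed: " + e.getMessage());
        }
        // The underlying (pooled) stream is still open here.
        System.out.println("underlying stream closed: " + raw.closed);
    }
}
```

With each failed object leaking one connection this way, the pool would eventually be exhausted, matching the `Timeout waiting for connection from pool` errors above.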
To Reproduce
Steps to reproduce the behavior:
1. Configure an S3 bucket with an SQS queue for event notifications.
2. Create a pipeline with an s3 source. Configure it to use automatic compression (using gzip would probably work too).
3. Upload a file which is uncompressed but has the .gz extension: not-valid-gzip.gz.
4. Wait a while - maybe 30 minutes.
5. You see the error.
6. Upload a valid file. You will see the same error because the S3 AWS SDK's connection pool is exhausted.
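A pipeline along these lines reproduces the setup in the steps above (a sketch based on the s3 source options as I understand them; the queue URL, region, and role ARN are placeholders):

```yaml
s3-leak-repro-pipeline:
  source:
    s3:
      notification_type: "sqs"
      compression: "automatic"
      codec:
        newline:
      sqs:
        queue_url: "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"
      aws:
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/my-pipeline-role"
  sink:
    - stdout:
```

With `compression: "automatic"`, the source decides how to decompress from the object's extension, so the mislabeled .gz file is routed through the gzip path.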
Expected behavior
Data Prepper should not reach the point of running out of connections; a failure to decompress one object should not permanently consume a connection from the pool.
Environment (please complete the following information):
Data Prepper 2.6.0
Additional context
Here are the logs from when the transition occurred. The GZIP encoding specified but data did contain gzip magic header log is expected since this is not a Gzip file.
2023-11-30T22:34:21.343 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:34:21.387 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: GZIP encoding specified but data did contain gzip magic header. Retrying with exponential backoff.
2023-11-30T22:34:21.387 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. GZIP encoding specified but data did contain gzip magic header
2023-11-30T22:34:21.387 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Pausing SQS processing for 21.3 seconds due to an error in processing.
2023-11-30T22:34:51.343 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Read S3 object: [bucketName=my-bucket, key=not-valid-gzip.gz]
2023-11-30T22:34:51.343 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. Unable to execute HTTP request: Timeout waiting for connection from pool
2023-11-30T22:35:53.601 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: Unable to execute HTTP request: Timeout waiting for connection from pool. Retrying with exponential backoff.
2023-11-30T22:35:53.601 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Pausing SQS processing for 18.477 seconds due to an error in processing.
2023-11-30T22:36:12.134 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Read S3 object: [bucketName=my-bucket, key=not-valid-gzip.gz]
2023-11-30T22:36:12.134 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:37:13.656 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: Unable to execute HTTP request: Timeout waiting for connection from pool. Retrying with exponential backoff.
Also, here are the logs from the first gzip failure, showing the timestamps for the expected errors.
2023-11-30T22:09:51.178 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Received 1 messages from SQS. Processing 1 messages.
2023-11-30T22:09:51.179 [Thread-11] INFO org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Read S3 object: [bucketName=my-bucket, key=not-valid-gzip.gz]
2023-11-30T22:09:51.234 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.S3ObjectWorker - Error reading from S3 object: s3ObjectReference=[bucketName=my-bucket, key=not-valid-gzip.gz]. GZIP encoding specified but data did contain gzip magic header
2023-11-30T22:09:51.234 [Thread-11] ERROR org.opensearch.dataprepper.plugins.source.s3.SqsWorker - Error processing from S3: GZIP encoding specified but data did contain gzip magic header. Retrying with exponential backoff.
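One defensive pattern that would avoid the leak (again a sketch, not a claim about Data Prepper's code; `TrackedStream` is a hypothetical stand-in for the pooled response stream): declare the raw stream as its own resource in try-with-resources, so it is closed and its connection released even when the `GZIPInputStream` constructor throws on the missing magic header.

```java
import java.io.*;
import java.util.zip.GZIPInputStream;

public class SafeDecompress {
    // Records whether close() was ever called (hypothetical stand-in
    // for the SDK's pooled response stream).
    static class TrackedStream extends ByteArrayInputStream {
        boolean closed = false;
        TrackedStream(byte[] b) { super(b); }
        @Override public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // The raw stream is its own resource, so try-with-resources closes
    // it even when a later resource initializer (the GZIPInputStream
    // constructor) throws.
    static void decompress(InputStream raw) throws IOException {
        try (InputStream outer = raw;
             GZIPInputStream gzip = new GZIPInputStream(outer)) {
            gzip.transferTo(OutputStream.nullOutputStream());
        }
    }

    public static void main(String[] args) {
        TrackedStream raw = new TrackedStream("plain text, not gzip".getBytes());
        try {
            decompress(raw);
        } catch (IOException e) {
            System.out.println("decompression failed: " + e.getMessage());
        }
        // Unlike the leaking pattern, the stream is now closed.
        System.out.println("underlying stream closed: " + raw.closed);
    }
}
```

Try-with-resources guarantees that resources already initialized are closed when a later initializer throws, which is exactly the failure mode the mislabeled .gz file triggers.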