
S3 sink does not honour batch config #18692

Closed
sinzui opened this issue Sep 27, 2023 · 2 comments
Labels
type: bug A code related bug.

Comments


sinzui commented Sep 27, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We see Vector (running as an agent) uploading hundreds of tiny files to S3,
but we expect the files to be much larger. We want fewer files so we can make
better use of SQS's 10-message receive limit.

Looking at one of the files, we see it contains 16 messages. The uncompressed
file size is 29K. The events in the file happened over 0.242 seconds.
The count, size, and time do not conform to the configured settings or to
https://vector.dev/docs/reference/configuration/sinks/aws_s3/#buffers-and-batches

In a single hour, an agent uploaded 2,319 files containing 8,863 messages totalling 10,480,814 bytes.
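That averages out to roughly 4 messages and about 4.5 KB per file, far below the configured max_events of 20000 and max_bytes of 40000000.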

Configuration

s3_sink:
  type: aws_s3
  inputs:
    - route_sinker.s3
  buffer:
    max_events: 5000
  batch:
    max_bytes: 40000000
    max_events: 20000
    timeout_secs: 300.0
  bucket: REDACTED
  compression: gzip
  content_encoding: gzip
  content_type: application/gzip
  encoding:
    codec: json
  framing:
    method: newline_delimited
  filename_extension: log.gz
  filename_append_uuid: true
  healthcheck:
    enabled: true
  key_prefix: "vector/%F/{{labels.cluster_id}}/{{kubernetes.node.name}}/%s_"
  region: us-east-1
  storage_class: INTELLIGENT_TIERING

Version

0.32.1

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

sinzui added the type: bug label on Sep 27, 2023
jszwedko (Member) commented Sep 27, 2023
Hi @sinzui !

To confirm, are you seeing multiple files created for the same key prefix? I'm noticing your key prefix includes %s, which will partition batches by second in addition to cluster id and node name.
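For illustration, dropping the %s epoch token from the prefix stops the per-second partitioning; a sketch of the adjusted setting (hypothetical, adapted from the configuration above) might be:

  key_prefix: "vector/%F/{{labels.cluster_id}}/{{kubernetes.node.name}}/"

With a prefix that only varies by day, cluster, and node, batches can accumulate until max_bytes, max_events, or timeout_secs is reached, and filename_append_uuid: true still keeps the uploaded object names unique.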

sinzui (Author) commented Sep 27, 2023

Thank you very much @jszwedko! Removing the epoch does fix the issue. Your explanation makes sense.

Sigh. I thought I had tested that when I was exploring filenames by Kubernetes namespace.

sinzui closed this as completed on Sep 27, 2023