
S3 sink does not honour batch config #18692

Closed
sinzui opened this issue Sep 27, 2023 · 2 comments
Labels
type: bug A code related bug.

Comments


sinzui commented Sep 27, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We see Vector (running as an agent) uploading hundreds of tiny files to S3,
but we expect the files to be much larger. We want fewer files so we can make
better use of SQS's 10-message receive limit.

Looking at one of the files, we see it contains 16 messages. The uncompressed
file size is 29K. The events in the file happened over 0.242 seconds.
The count, size, and time do not conform to the configured settings or to
https://vector.dev/docs/reference/configuration/sinks/aws_s3/#buffers-and-batches

In a single hour, an agent uploaded 2,319 files containing 8,863 messages totalling 10,480,814 bytes.
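That averages out to roughly 4 messages and about 4.5 KB per file, far below the configured max_events of 20000 and max_bytes of 40000000.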

Configuration

s3_sink:
  type: aws_s3
  inputs:
    - route_sinker.s3
  buffer:
    max_events: 5000
  batch:
    max_bytes: 40000000
    max_events: 20000
    timeout_secs: 300.0
  bucket: REDACTED
  compression: gzip
  content_encoding: gzip
  content_type: application/gzip
  encoding:
    codec: json
  framing:
    method: newline_delimited
  filename_extension: log.gz
  filename_append_uuid: true
  healthcheck:
    enabled: true
  key_prefix: "vector/%F/{{labels.cluster_id}}/{{kubernetes.node.name}}/%s_"
  region: us-east-1
  storage_class: INTELLIGENT_TIERING

Version

0.32.1

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

sinzui added the type: bug label on Sep 27, 2023
jszwedko (Member) commented Sep 27, 2023
Hi @sinzui !

To confirm, are you seeing multiple files created for the same key prefix? I'm noticing your key prefix includes %s, which will partition batches by second in addition to cluster id and node name.
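For illustration, dropping the %s epoch token from the prefix stops the per-second partitioning; a sketch of the adjusted setting (hypothetical, adapted from the configuration above) might be:

  key_prefix: "vector/%F/{{labels.cluster_id}}/{{kubernetes.node.name}}/"

With a prefix that only varies by day, cluster, and node, batches can accumulate until max_bytes, max_events, or timeout_secs is reached, and filename_append_uuid: true still keeps the uploaded object names unique.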

sinzui (Author) commented Sep 27, 2023

Thank you very much @jszwedko! Removing the epoch does fix the issue. Your explanation makes sense.

Sigh. I thought I had tested that when I was exploring filenames by Kubernetes namespace.

sinzui closed this as completed on Sep 27, 2023