Add compress_batches feature #1
Closing and reopening to kick Travis into action.
This has been deprecated [since 2011](https://github.com/rubygems/rubygems/blob/1aebd7ddd69c9a38aa9daa3aa89f396d59c9e2a4/History.txt#L3296-L3297).
Use the rubocop generated configuration to ignore any major changes that can't be auto-corrected.
Ruby 2.4 is now end-of-life. As a result, the `google-protobuf` gem no longer releases artifacts compatible with this version (protocolbuffers/protobuf#7453).

In CI we encounter this error from `bundle install`:

```
google-protobuf-3.12.0-x86_64-linux requires ruby version >= 2.5, which is incompatible with the current version, ruby 2.4.6p354
```

Therefore remove Ruby 2.4.6 as a tested version in Travis CI.

Additionally remove the constraint on patch versions in the Travis config, so that we'll use the latest patch version available for each release branch.
Note that in this PR I chose to use zlib, as it ships with the Ruby standard library and does a pretty reasonable job (~85% compression on our lab logs). We may want to investigate alternatives such as lz4, but I'd propose doing that in a follow-up, which will be possible by specifying a different value for the
@pubsub = Google::Cloud::Pubsub.new project_id: project, credentials: key
@autocreate_topic = autocreate_topic
@topics = {}

@compression_ratio =
I don't think this is capturing the compression ratio. See comment where this is calculated.
)

compressed_size = compressed_messages.bytesize
@compression_ratio.observe(
To me this looks like it is calculating the data-rate saving, not the compression ratio. Calculating the saving is what we want to do, as it ensures we can put the value in known buckets of 0-1.
compression ratio = (original size)/(compressed size)
data-rate saving = 1 - (compressed size)/(original size)
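To make the difference between the two formulas concrete, here is a minimal sketch that computes both for a single zlib-compressed blob, along with the plain compressed-over-original fraction. The helper name `compression_measures` and the sample input are made up for illustration and are not taken from the plugin.

```ruby
require 'zlib'

# Hypothetical helper, not part of the plugin: returns the candidate
# measures for one blob, or nil when the input is empty (which would
# otherwise divide by zero in the saving formula).
def compression_measures(original)
  original_size = original.bytesize
  return nil if original_size.zero?

  compressed_size = Zlib::Deflate.deflate(original).bytesize

  {
    # Unbounded above: grows as the data gets more compressible.
    compression_ratio: original_size.to_f / compressed_size,
    # Usually between 0 and 1 for compressible data.
    space_saving: 1.0 - compressed_size.to_f / original_size,
    # The complement of the saving.
    compressed_per_original: compressed_size.to_f / original_size
  }
end

measures = compression_measures('log line ' * 1000)
puts measures
```

The saving and the compressed-over-original fraction always sum to 1, so either one fits fixed buckets, whereas the compression ratio has no upper bound.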
Ah yeah, I think you're totally right. I was a bit torn between producing a number that would make sense on a dashboard, following Prometheus' best-practice definition of a percentage/ratio, and avoiding potential division by zero, and ended up with something slightly confusing here.
I think you're right that we probably do want to keep this formula: if we computed the actual compression ratio, its value would be harder to predict (e.g. 10MiB/1KiB = 10240) and therefore harder to define buckets for.
As for what we should replace the name with:
I'd argue that we're not talking about 'data rate' here, as we're operating on a single blob rather than an indefinite stream, so it should be expressed as 'space savings', right?
- Current: `fluentd_output_gcloud_pubsub_messages_compression_ratio`
  Variable name: `@compression_ratio`
- Proposal 1: `fluentd_output_gcloud_pubsub_messages_compression_space_savings_ratio`
  Variable name: `@compression_space_saving`
  I'd like to keep `compression` in the name as it groups it with the `compression_duration_seconds` metric.
- Proposal 2: `fluentd_output_gcloud_pubsub_messages_compressed_size_per_original_size_ratio`
  Variable name: `@compression_ratio` ??
  This is closer to the Prometheus suggestion, but feels a little bit awkward?
The naming of Proposal 2 is really doing it for me 😆
Let's do that!
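As a rough illustration of why a compressed-size-per-original-size value is easy to bucket, the sketch below assigns observations to fixed upper bounds. The bucket bounds and the `bucket_for` helper are assumptions for this example only; the thread does not say which buckets the plugin uses, and this is not the Prometheus client API.

```ruby
# Hypothetical bucket upper bounds; the PR does not state the real ones.
BUCKETS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0].freeze

# Returns the smallest bucket bound the value fits under, or :inf when
# the value exceeds every bound.
def bucket_for(value, buckets = BUCKETS)
  buckets.find { |upper| value <= upper } || :inf
end

# 10 MiB compressed down to 1 KiB lands in the lowest bucket.
tiny = bucket_for(1024.0 / (10 * 1024 * 1024))
# A compressed blob slightly larger than its input falls past 1.0.
overflow = bucket_for(1.3)
puts tiny, overflow
```

Values above 1 can occur because deflate output for small or incompressible input may be slightly larger than the input, so an overflow bucket is still worth keeping.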
@@ -0,0 +1,14 @@
# frozen_string_literal: true
Something super weird has happened to your commits here. This file already exists with this content in it plus more. 🤔
Ah, ignore me. It's GitHub screwing up the ordering of the commits.
Ah I think this is probably GitHub doing its usual trick of showing commits in date order rather than actual commit order (as I moved some commits around in the history when rebasing)
Just the one comment on the naming of the compression ratio metric, but otherwise this looks super solid! Primo work dude.
If you get back to me about that I can smash the approve button.
Augment the existing operations with Prometheus metrics in order to provide observability around the operations that the plugin is performing. Introduce a new metrics helper to prevent attempting to register the same metric more than once in a multi-threaded or multi-instance context.
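One way such a helper could guard against double registration is a mutex around a check-then-register step. The following is a minimal sketch under that assumption: `MetricsRegistry`, `fetch_or_register`, and the metric name are hypothetical stand-ins, not the plugin's actual helper and not the prometheus-client API.

```ruby
# Hypothetical stand-in for a metrics registry, for illustration only.
class MetricsRegistry
  def initialize
    @mutex = Mutex.new
    @metrics = {}
  end

  # Returns the metric already stored under `name`, or stores and returns
  # the one built by the block. The mutex makes the check-then-register
  # step atomic, so concurrent callers share one metric object.
  def fetch_or_register(name)
    @mutex.synchronize do
      @metrics[name] ||= yield
    end
  end

  def size
    @mutex.synchronize { @metrics.size }
  end
end

registry = MetricsRegistry.new
created = 0

threads = Array.new(8) do
  Thread.new do
    registry.fetch_or_register(:messages_published_total) do
      created += 1 # runs under the registry mutex, so this is safe
      Object.new   # placeholder for a real metric object
    end
  end
end

metrics = threads.map(&:value)
puts "built #{created} metric(s) for #{metrics.size} callers"
```

Every thread gets the same object back and the block runs only once, which is the property needed when several plugin instances start up at the same time.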
As per the README updates, this can be used to compress a number of input records into a single Pub/Sub message, therefore saving on costs.
This is best reviewed commit-by-commit, as there's quite a lot of noise in the diff due to Rubocop being introduced and auto-corrections applied.
f8d0755: Set a default ruby version
d5b9813: Remove deprecated has_rdoc option
This has been deprecated since 2011
d0e017e: Add rubocop configuration
3351fab: Add rubocop fixups and configuration
Use the rubocop generated configuration to ignore any major changes that
can't be auto-corrected.
ad03786: Adjust supported Ruby versions
Ruby 2.4 is now end-of-life. As a result, the `google-protobuf` gem no longer releases artifacts compatible with this version (protocolbuffers/protobuf#7453).
In CI we encounter this error from `bundle install`: `google-protobuf-3.12.0-x86_64-linux requires ruby version >= 2.5, which is incompatible with the current version, ruby 2.4.6p354`
Therefore remove Ruby 2.4.6 as a tested version in Travis CI.
Additionally remove the constraint on patch versions in the Travis config, so that we'll use the latest patch version available for each release branch.
265e819: Add Prometheus metrics
Augment the existing operations with Prometheus metrics in order to
provide observability around the operations that the plugin is
performing.
Introduce a new metrics helper to prevent attempting to register the
same metric more than once in a multi-threaded or multi-instance
context.
44b4b87: Add compress_batches feature
As per the README updates, this can be used to compress a number of
input records into a single Pub/Sub message, therefore saving on costs.
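
The idea of packing many records into one compressed message can be sketched as follows. This is a minimal illustration assuming newline-delimited JSON as the framing; the PR text does not specify how the plugin actually joins records, and `compress_batch` and `decompress_batch` are hypothetical names.

```ruby
require 'json'
require 'zlib'

# Assumption: records are serialised as newline-delimited JSON before
# compression. JSON.generate escapes newlines inside strings, so "\n"
# is a safe separator.
def compress_batch(records)
  payload = records.map { |record| JSON.generate(record) }.join("\n")
  Zlib::Deflate.deflate(payload)
end

# Reverses compress_batch on the consumer side.
def decompress_batch(blob)
  Zlib::Inflate.inflate(blob).split("\n").map { |line| JSON.parse(line) }
end

records = Array.new(100) do |i|
  { 'message' => "request #{i} handled", 'level' => 'info' }
end

blob = compress_batch(records)
restored = decompress_batch(blob)
puts "#{records.size} records in #{blob.bytesize} compressed bytes"
```

A consumer would need to apply the matching decompression step before parsing individual records.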