Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add an efficient labels serializer #474

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

ftelnov
Copy link

@ftelnov ftelnov commented Nov 3, 2023

What has been done? Why? What problem is being solved?

I found out that observe method works slow. Further investigations showed that make_key function sorts keys every run time. make_key is being called often during, for instance, hist:observe. To optimize this case and both keep the default behavior, I propose two things:

  1. Allow user to override labels_pairs serialization scheme with __metrics_make_key meta-method. If it is supplied, it would be used, otherwise the default sorting scheme would be used;
  2. Provide precaching labels serializer. It would order the labels properly on the client side, and only concat pairs on the metrics side. The complexity is linear then.

I didn't forget about

  • Tests
  • Changelog
  • Documentation (README and rst) (don't think I should alter it for such a small api?)
  • Rockspec and rpm spec (I don't really know what to add here)

Copy link

github-actions bot commented Jan 2, 2024

This pull request is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days

Copy link
Member

@DifferentialOrange DifferentialOrange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your PR! I have several question regarding implementation, testing and documentation. My main concern is how it will work with global labels -- for now it seems that one would need to explicilty support them in serializer, which, for example, will break the usage HTTP middleware after your change.

Since we are talking about efficiency here, it is also crucial to provide some performance comparison here. For example, I had used the following script to check your updates performance:

local clock = require('clock')

local metrics = require('metrics')

metrics.cfg{include = 'none', labels = {alias = 'myalias'}}

local custom_serializer = metrics.labels_serializer({'alias', 'l1', 'l2'})

local collector_with_old_serialization = metrics.counter('collector_with_old_serialization')
local bench_old = clock.bench(function()
    for i = 1, 1e6 do
        collector_with_old_serialization:inc(1, {l1 = 'v1', l2 = 'v' .. tostring(i)})
    end
end)

local collector_with_new_serialization = metrics.counter('collector_with_new_serialization')
local bench_new = clock.bench(function()
    for i = 1, 1e6 do
        collector_with_new_serialization:inc(1, custom_serializer.wrap{l1 = 'v1', l2 = 'v' .. tostring(i)})
    end
end)

print(('old: %.4f s, new: %.4f s'):format(bench_old[1], bench_new[1]))
old: 3.6006 s, new: 3.0873 s

@@ -140,4 +142,5 @@ return {
unregister_callback = unregister_callback,
invoke_callbacks = invoke_callbacks,
set_global_labels = set_global_labels,
labels_serializer = serializers.basic_labels_serializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
labels_serializer = serializers.basic_labels_serializer
labels_serializer = serializers.basic_labels_serializer,

metrics/api.lua Outdated
@@ -125,6 +125,68 @@ local function set_global_labels(label_pairs)
registry:set_labels(label_pairs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, squash your commits into the single one with a name like api: support custom labels serialization

@@ -48,15 +49,10 @@ function Shared.make_key(label_pairs)
return ""
end
-- `label_pairs` provides its own serialization scheme, it must be used instead of default one.
if label_pairs.__metrics_make_key then
return label_pairs:__metrics_make_key()
if label_pairs.__metrics_serialize then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not a metamethod -- it's just some internal API (even though object is internal by itself). Since it is not a metamethod, I think you shouldn't use double underscore in the name: use _metrics_serialize (or simply serialize even).

@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `metrics.cfg{}` `"all"` metasection for array `include` and `exclude`
(`metrics.cfg{include={'all'}}` can be used instead of `metrics.cfg{include='all'}`,
`metrics.cfg{exclude={'all'}}` can be used instead of `metrics.cfg{include='none'}`)
- Allow users to decide on labels serialization scheme themselves. Add `labels_serializer`, which provides alternative efficient labels serialization scheme.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Allow users to decide on labels serialization scheme themselves. Add `labels_serializer`, which provides alternative efficient labels serialization scheme.
- Allow users to customize labels serialization with `metrics.labels_serializer` API.

--- @param labels_keys string[] Label keys for the further use.
--- @return LabelsSerializer
local function basic_labels_serializer(labels_keys)
-- we always add keys that are needed for metrics' internals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is not internals -- it is histogram labeling approach (based on Prometheus design).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have similar label -- quantile -- for summary collectors, which are not covered here, but should be.

---
--- @param labels_keys string[] Label keys for the further use.
--- @return LabelsSerializer
local function basic_labels_serializer(labels_keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is confusing here: default_labels_serializer is an actual serializer and basic_labels_serializer is a serializer constructor/builder/factory. Please, use name like build_labels_serializer or similar.


-- check that builtin metrics work.
local hist_serializer = metrics.labels_serializer(label_keys)
local hist = metrics.histogram('hist', 'test histogram', {2})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, test with all collector types: counter, gauge, histogram and summary (shared is an internal implementation base class, so it may be ignored here).


t.assert_equals(actual, shared.make_key(label_pairs))
t.assert_equals(actual, shared.make_key(wrapped))
t.assert_not_equals(wrapped.__metrics_serialize, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a unit test, so it's better to move it to somewhere else (maybe a new file). We also need to have an integration test with a full example:

  • build a custom serializer,
  • set it through public API,
  • observe some values,
  • collect the metrics.

It also would be nice ti write an example (at least in tests) for a custom serializer considering default metrics, especially since they are enabled by default in Tarantool 2.11.1 and newer. As a reference, you may refer to Telegraf labels setup here: https://www.tarantool.io/en/doc/latest/book/monitoring/grafana_dashboard/#prepare-a-monitoring-stack (code block starting with [[inputs.http]]).

We should also do an integration test with global labels setup (AFAIU, one should cover it in serializer too).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it seems that I have misunderstood the general approach: this PR introduces a serializer for each label pairs instead of replacing a serializer for each metrics. In that case, default metrics won't be covered. On the other hand, the question about global labels still needs to be answered.

@@ -69,6 +69,8 @@ function export.configure_default_collector(...)
export.set_default_collector(export.build_default_collector(...))
end

local labels_serializer = metrics_api.labels_serializer({ "path", "method", "status" })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would likely break using the global labels.

--- @return LabelsSerializer
local function basic_labels_serializer(labels_keys)
-- we always add keys that are needed for metrics' internals.
local __labels_keys = { "le" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems non-optimized: we will have a key that works only for a histogram. Maybe it's worth to have a different factories for counter/gauge, histogram and summary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants