Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add an efficient labels serializer #474

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `metrics.cfg{}` `"all"` metasection for array `include` and `exclude`
(`metrics.cfg{include={'all'}}` can be used instead of `metrics.cfg{include='all'}`,
`metrics.cfg{exclude={'all'}}` can be used instead of `metrics.cfg{include='none'}`)
- Allow users to decide on labels serialization scheme themselves. Add `labels_serializer`, which provides alternative efficient labels serialization scheme.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Allow users to decide on labels serialization scheme themselves. Add `labels_serializer`, which provides alternative efficient labels serialization scheme.
- Allow users to customize labels serialization with `metrics.labels_serializer` API.


## [1.0.0] - 2023-05-22
### Changed
Expand Down
3 changes: 3 additions & 0 deletions metrics/api.lua
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ local Gauge = require('metrics.collectors.gauge')
local Histogram = require('metrics.collectors.histogram')
local Summary = require('metrics.collectors.summary')

local serializers = require("metrics.serializers")

local registry = rawget(_G, '__metrics_registry')
if not registry then
registry = Registry.new()
Expand Down Expand Up @@ -140,4 +142,5 @@ return {
unregister_callback = unregister_callback,
invoke_callbacks = invoke_callbacks,
set_global_labels = set_global_labels,
labels_serializer = serializers.basic_labels_serializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
labels_serializer = serializers.basic_labels_serializer
labels_serializer = serializers.basic_labels_serializer,

}
10 changes: 5 additions & 5 deletions metrics/collectors/shared.lua
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
local clock = require('clock')
local fiber = require('fiber')
local log = require('log')
local serializers = require("metrics.serializers")

local Shared = {}

Expand Down Expand Up @@ -47,12 +48,11 @@ function Shared.make_key(label_pairs)
if type(label_pairs) ~= 'table' then
return ""
end
local parts = {}
for k, v in pairs(label_pairs) do
table.insert(parts, k .. '\t' .. v)
-- `label_pairs` provides its own serialization scheme, it must be used instead of default one.
if label_pairs.__metrics_serialize then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not a metamethod -- it's just some internal API (even though object is internal by itself). Since it is not a metamethod, I think you shouldn't use double underscore in the name: use _metrics_serialize (or simply serialize even).

return label_pairs:__metrics_serialize()
end
table.sort(parts)
return table.concat(parts, '\t')
return serializers.default_labels_serializer(label_pairs)
end

function Shared:remove(label_pairs)
Expand Down
6 changes: 4 additions & 2 deletions metrics/http_middleware.lua
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,8 @@ function export.configure_default_collector(...)
export.set_default_collector(export.build_default_collector(...))
end

local labels_serializer = metrics_api.labels_serializer({ "path", "method", "status" })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would likely break using the global labels.


--- Measure latency and invoke collector with labels from given route
--
-- @tab collector
Expand All @@ -86,11 +88,11 @@ function export.observe(collector, route, handler, ...)
error(('incorrect http handler for %s %s: expecting return response object'):
format(route.method, route.path), 0)
end
return {
return labels_serializer.wrap({
path = route.path,
method = route.method,
status = (not ok and 500) or result.status or 200,
}
})
end, handler, ...)
end

Expand Down
1 change: 1 addition & 0 deletions metrics/init.lua
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ return setmetatable({
unregister_callback = api.unregister_callback,
invoke_callbacks = api.invoke_callbacks,
set_global_labels = api.set_global_labels,
labels_serializer = api.labels_serializer,
enable_default_metrics = tarantool.enable,
cfg = cfg.cfg,
http_middleware = http_middleware,
Expand Down
86 changes: 86 additions & 0 deletions metrics/serializers.lua
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
--- Default slow algorithm, polluted with sorting.
--- It is used when nothing is known about `label_pairs`.
local function default_labels_serializer(label_pairs)
local parts = {}
for k, v in pairs(label_pairs) do
table.insert(parts, k .. '\t' .. v)
end
table.sort(parts)
return table.concat(parts, '\t')
end


--- Prepares a serializer for label pairs with given keys.
---
--- `make_key`, which is used during every metric-related operation, is not very efficient itself.
--- To mitigate it, one could add his own serialization implementation.
--- It is done via passing `__metrics_serialize` callback to the label pairs table.
---
--- This function gives you ready-to-use serializer, so you don't have to create one yourself.
---
--- BEWARE! If keys of the `label_pairs` somehow change between serialization turns, it would raise error mostlikely.
--- We cover internal cases already, for example, "le" key is always added for the histograms.
---
--- @class LabelsSerializer
--- @field wrap function(label_pairs: table): table Wraps given `label_pairs` with an efficient serialization.
--- @field serialize function(label_pairs: table): string Serialize given `label_pairs` into the key.
--- Exposed so you can write your own serializers on top of it.
---
--- @param labels_keys string[] Label keys for the further use.
--- @return LabelsSerializer
local function basic_labels_serializer(labels_keys)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming is confusing here: default_labels_serializer is an actual serializer and basic_labels_serializer is a serializer constructor/builder/factory. Please, use name like build_labels_serializer or similar.

-- we always add keys that are needed for metrics' internals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it is not internals -- it is histogram labeling approach (based on Prometheus design).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have similar label -- quantile -- for summary collectors, which are not covered here, but should be.

local __labels_keys = { "le" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems non-optimized: we will have a key that works only for a histogram. Maybe it's worth to have a different factories for counter/gauge, histogram and summary?

-- used to protect label_pairs from altering with unexpected keys.
local keys_index = { le = true }

-- keep only unique labels
for _, key in ipairs(labels_keys) do
if not keys_index[key] then
table.insert(__labels_keys, key)
keys_index[key] = true
end
end
table.sort(__labels_keys)

local function serialize(label_pairs)
local result = ""
for _, label in ipairs(__labels_keys) do
local value = label_pairs[label]
if value ~= nil then
if result ~= "" then
result = result .. '\t'
end
result = result .. label .. '\t' .. value
end
end
return result
end

local pairs_metatable = {
__index = {
__metrics_serialize = function(self)
return serialize(self)
end
},
-- It protects pairs from being altered with unexpected labels.
__newindex = function(table, key, value)
if not keys_index[key] then
error(('Label "%s" is unexpected'):format(key), 2)
end
rawset(table, key, value)
end
}

return {
wrap = function(label_pairs)
return setmetatable(label_pairs, pairs_metatable)
end,
serialize = serialize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serialize = serialize
serialize = serialize,

}
end

return {
default_labels_serializer = default_labels_serializer,
basic_labels_serializer = basic_labels_serializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
basic_labels_serializer = basic_labels_serializer
basic_labels_serializer = basic_labels_serializer,

}
38 changes: 38 additions & 0 deletions test/metrics_test.lua
Original file line number Diff line number Diff line change
Expand Up @@ -269,3 +269,41 @@ g.test_deprecated_version = function(cg)
"use require%('metrics'%)._VERSION instead."
t.assert_not_equals(cg.server:grep_log(warn), nil)
end

g.test_labels_serializer_consistent = function()
local shared = require("metrics.collectors.shared")

local label_pairs = {
abc = 123,
cba = "123",
cda = 0,
eda = -1,
acb = 456
}
local label_keys = {}
for key, _ in pairs(label_pairs) do
table.insert(label_keys, key)
end

local serializer = metrics.labels_serializer(label_keys)
local actual = serializer.serialize(label_pairs)
local wrapped = serializer.wrap(table.copy(label_pairs))

t.assert_equals(actual, shared.make_key(label_pairs))
t.assert_equals(actual, shared.make_key(wrapped))
t.assert_not_equals(wrapped.__metrics_serialize, nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a unit test, so it's better to move it to somewhere else (maybe a new file). We also need to have an integration test with a full example:

  • build a custom serializer,
  • set it through public API,
  • observe some values,
  • collect the metrics.

It also would be nice ti write an example (at least in tests) for a custom serializer considering default metrics, especially since they are enabled by default in Tarantool 2.11.1 and newer. As a reference, you may refer to Telegraf labels setup here: https://www.tarantool.io/en/doc/latest/book/monitoring/grafana_dashboard/#prepare-a-monitoring-stack (code block starting with [[inputs.http]]).

We should also do an integration test with global labels setup (AFAIU, one should cover it in serializer too).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it seems that I have misunderstood the general approach: this PR introduces a serializer for each label pairs instead of replacing a serializer for each metrics. In that case, default metrics won't be covered. On the other hand, the question about global labels still needs to be answered.


-- trying to set unexpected label.
t.assert_error_msg_contains('Label "new_label" is unexpected', function() wrapped.new_label = "123456" end)

-- check that builtin metrics work.
local hist_serializer = metrics.labels_serializer(label_keys)
local hist = metrics.histogram('hist', 'test histogram', {2})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, test with all collector types: counter, gauge, histogram and summary (shared is an internal implementation base class, so it may be ignored here).


hist:observe(3, hist_serializer.wrap(table.copy(label_pairs)))
local state = table.deepcopy(hist)
hist:observe(3, label_pairs)

t.assert_equals(hist.observations, state.observations)
t.assert_equals(hist.label_pairs, state.label_pairs)
end