Integrate CRUD statistics with metrics
If `metrics` [1] is found, metrics collectors are used to store
statistics. `metrics >= 0.9.0` is required, because the summary
collector relies on age buckets; with older versions, local collectors
are used instead. The collectors are part of the global registry and
can be exported (e.g. to Prometheus) with default tools without any
additional configuration. Disabling stats destroys the collectors.

If `metrics` is found, `latency` statistics change to the 0.99 quantile
of request execution time (with aging).

Add a CI matrix to run tests with `metrics` installed. To get a real
coverage result from coveralls, the results of different CI jobs need
to be merged. See more in #248.

1. https://github.com/tarantool/metrics

Closes #224
DifferentialOrange committed Dec 7, 2021
1 parent b605917 commit c9d02f3
Showing 6 changed files with 532 additions and 5 deletions.
12 changes: 11 additions & 1 deletion .github/workflows/test_on_push.yaml
@@ -13,13 +13,19 @@ jobs:
      matrix:
        # We need 1.10.6 here to check that module works with
        # old Tarantool versions that don't have "tuple-keydef"/"tuple-merger" support.
        tarantool-version: ["1.10.6", "1.10", "2.2", "2.3", "2.4", "2.5", "2.6", "2.7"]
        tarantool-version: ["1.10.6", "1.10", "2.2", "2.3", "2.4", "2.5", "2.6", "2.7", "2.8"]
        metrics-version: [""]
        remove-merger: [false]
        include:
          - tarantool-version: "2.7"
            remove-merger: true
          - tarantool-version: "2.8"
            metrics-version: "0.1.8"
          - tarantool-version: "2.8"
            metrics-version: "0.9.0"
          - tarantool-version: "2.8"
            coveralls: true
            metrics-version: "0.12.0"
      fail-fast: false
    runs-on: [ubuntu-latest]
steps:
@@ -47,6 +53,10 @@ jobs:
          tarantool --version
          ./deps.sh
      - name: Install metrics
        if: matrix.metrics-version != ''
        run: tarantoolctl rocks install metrics ${{ matrix.metrics-version }}

      - name: Remove external merger if needed
        if: ${{ matrix.remove-merger }}
        run: rm .rocks/lib/tarantool/tuple/merger.so
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

### Added
* Statistics for CRUD operations on router (#224).
* Integrate CRUD statistics with `metrics` (#224).

### Changed

49 changes: 47 additions & 2 deletions README.md
@@ -607,6 +607,11 @@ While statistics collection should not affect performance
in a noticeable way, you may disable it if you want to
prioritize performance.

If [`metrics`](https://github.com/tarantool/metrics) is found,
metrics collectors are used to store statistics.
Version `0.9.0` or greater is required;
otherwise, local collectors are used.
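
The fallback rule can be sketched in plain Lua. This is a minimal illustration, not the module's code: `pick_registry` and its `loader` parameter are hypothetical names introduced here so the check can run without Tarantool. The `rotate_age_buckets` probe mirrors the check in `crud/stats/metrics_registry.lua`.

```lua
-- Hypothetical helper (not part of crud): decide which registry to use.
-- `loader` defaults to `require`; it is injectable only for illustration.
local function pick_registry(loader)
    loader = loader or require

    -- The metrics rock itself must be loadable.
    local has_metrics = pcall(loader, 'metrics')
    if not has_metrics then
        return 'local'
    end

    -- Summary with age buckets needs metrics >= 0.9.0, so the presence
    -- of `rotate_age_buckets` is used as a version probe.
    local has_summary, summary = pcall(loader, 'metrics.collectors.summary')
    if not has_summary or type(summary) ~= 'table'
            or summary.rotate_age_buckets == nil then
        return 'local'
    end

    return 'metrics'
end

-- Simulated environments (assumptions made for the example).
local function no_metrics() error('module not found') end
local function old_metrics() return {} end
local function new_metrics(name)
    if name == 'metrics.collectors.summary' then
        return { rotate_age_buckets = function() end }
    end
    return {}
end

print(pick_registry(no_metrics))
print(pick_registry(old_metrics))
print(pick_registry(new_metrics))
```

Probing for a feature rather than parsing a version string keeps the check independent of how the rock reports its version.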

Enabling stats on non-router instances is meaningless.

`crud.stats()` contains several sections: `insert` (for `insert` and `insert_object` calls),
@@ -631,9 +636,46 @@ crud.stats()['insert']
Each section contains different collectors for success calls
and error (both error throw and `nil, err`) returns. `count`
is total requests count since instance start or stats restart.
`latency` is average time of requests execution,
`latency` is the 0.99 quantile of request execution time
(if `metrics` is not found, the average is shown instead).
`time` is total time of requests execution.
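
To see why the two `latency` definitions can differ a lot, here is a small worked example in plain Lua. It is only a sketch: it computes an exact nearest-rank quantile over a full sample, whereas `metrics` summary collectors estimate quantiles over a sliding window, and the sample latencies are made up for illustration.

```lua
-- Exact nearest-rank quantile over a full sample (illustration only;
-- not the streaming estimator used by metrics summary collectors).
local function quantile(samples, q)
    local sorted = {}
    for i, v in ipairs(samples) do
        sorted[i] = v
    end
    table.sort(sorted)

    local rank = math.ceil(q * #sorted)
    rank = math.max(1, math.min(rank, #sorted))
    return sorted[rank]
end

local function mean(samples)
    local sum = 0
    for _, v in ipairs(samples) do
        sum = sum + v
    end
    return sum / #samples
end

-- Made-up sample: 98 fast requests (1 ms) and 2 slow ones (1 s).
local latencies = {}
for _ = 1, 98 do
    table.insert(latencies, 0.001)
end
for _ = 1, 2 do
    table.insert(latencies, 1.0)
end

local avg = mean(latencies)
local p99 = quantile(latencies, 0.99)
print(avg, p99)
```

The average stays close to 20 ms, while the 0.99 quantile reports the 1 s tail that a small share of requests actually experienced.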

In the `metrics` registry, statistics are stored as `tnt_crud_stats` metrics
with `operation` and `status` label_pairs.
```
metrics:collect()
---
- - label_pairs:
      status: ok
      operation: insert
    value: 221411
    metric_name: tnt_crud_stats_count
  - label_pairs:
      status: ok
      operation: insert
    value: 10.49834896344692
    metric_name: tnt_crud_stats_sum
  - label_pairs:
      status: ok
      operation: insert
      quantile: 0.5
    value: 0.000003523699706
    metric_name: tnt_crud_stats
  - label_pairs:
      status: ok
      operation: insert
      quantile: 0.9
    value: 0.000006997063523
    metric_name: tnt_crud_stats
  - label_pairs:
      status: ok
      operation: insert
      quantile: 0.99
    value: 0.00023606420935973
    metric_name: tnt_crud_stats
...
```
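
The mapping from these observations back to the `count`, `time` and `latency` fields of `crud.stats()` can be sketched as follows. This is a simplified stand-alone illustration: `fold` is a hypothetical helper, and the observation list is hand-written from the sample output above rather than collected from a real registry.

```lua
local LATENCY_QUANTILE = 0.99

-- Hand-written observations shaped like the sample output above.
local observations = {
    { metric_name = 'tnt_crud_stats_count', value = 221411,
      label_pairs = { operation = 'insert', status = 'ok' } },
    { metric_name = 'tnt_crud_stats_sum', value = 10.49834896344692,
      label_pairs = { operation = 'insert', status = 'ok' } },
    { metric_name = 'tnt_crud_stats', value = 0.000003523699706,
      label_pairs = { operation = 'insert', status = 'ok', quantile = 0.5 } },
    { metric_name = 'tnt_crud_stats', value = 0.00023606420935973,
      label_pairs = { operation = 'insert', status = 'ok', quantile = 0.99 } },
}

-- Hypothetical helper: fold summary observations into a stats table.
local function fold(obs_list)
    local stats = {}
    for _, obs in ipairs(obs_list) do
        local operation = obs.label_pairs.operation
        local status = obs.label_pairs.status

        stats[operation] = stats[operation] or {}
        stats[operation][status] = stats[operation][status]
            or { count = 0, time = 0, latency = 0 }
        local section = stats[operation][status]

        if obs.metric_name == 'tnt_crud_stats_count' then
            section.count = obs.value
        elseif obs.metric_name == 'tnt_crud_stats_sum' then
            section.time = obs.value
        elseif obs.metric_name == 'tnt_crud_stats'
                and obs.label_pairs.quantile == LATENCY_QUANTILE then
            -- Only the 0.99 quantile is reported as latency.
            section.latency = obs.value
        end
    end
    return stats
end

local stats = fold(observations)
print(stats.insert.ok.count, stats.insert.ok.time, stats.insert.ok.latency)
```

The 0.5 and 0.9 quantiles are still exported to the registry; they are simply not surfaced in this folded view.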

Additionally, `select` section contains `details` collectors.
```lua
crud.stats()['select']['details']
@@ -647,7 +689,10 @@ crud.stats()['select']['details']
(including those not executed successfully). `tuples_fetched`
is a count of tuples fetched from storages during execution,
`tuples_lookup` is a count of tuples looked up on storages
while collecting response for call.
while collecting response for call. In the `metrics` registry, they
are stored as `tnt_crud_map_reduces`, `tnt_crud_tuples_fetched`
and `tnt_crud_tuples_lookup` metrics with
`{ operation = 'select' }` label_pairs.
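
How a labelled counter accumulates increments can be illustrated with a toy stand-in written in plain Lua. `new_counter` is a hypothetical helper and not the `metrics` implementation; only the metric name and the `{ operation = 'select' }` labels are taken from the text above.

```lua
-- Toy labelled counter (illustration only, not the metrics rock).
local function new_counter(name)
    local counter = { name = name, observations = {} }

    -- Build a stable key from label pairs so equal label sets
    -- share a single observation.
    local function key(label_pairs)
        local parts = {}
        for k, v in pairs(label_pairs or {}) do
            table.insert(parts, k .. '=' .. tostring(v))
        end
        table.sort(parts)
        return table.concat(parts, ',')
    end

    function counter:inc(value, label_pairs)
        local k = key(label_pairs)
        local obs = self.observations[k]
        if obs == nil then
            obs = {
                metric_name = self.name,
                label_pairs = label_pairs or {},
                value = 0,
            }
            self.observations[k] = obs
        end
        obs.value = obs.value + (value or 1)
    end

    function counter:collect()
        local result = {}
        for _, obs in pairs(self.observations) do
            table.insert(result, obs)
        end
        return result
    end

    return counter
end

local tuples_fetched = new_counter('tnt_crud_tuples_fetched')
tuples_fetched:inc(10, { operation = 'select' })
tuples_fetched:inc(5, { operation = 'select' })

local collected = tuples_fetched:collect()
local _, first = next(collected)
print(first.metric_name, first.value)
```

Because both increments carry the same label set, they end up in one observation, which is why a single value per `details` field is enough to reconstruct `crud.stats()['select']['details']`.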

## Cartridge roles

226 changes: 226 additions & 0 deletions crud/stats/metrics_registry.lua
@@ -0,0 +1,226 @@
local is_package, metrics = pcall(require, 'metrics')

local label = require('crud.stats.label')
local dev_checks = require('crud.common.dev_checks')
local registry_common = require('crud.stats.registry_common')

local registry = {}
local _registry = {}

local metric_name = {
    -- Summary collector for all operations.
    op = 'tnt_crud_stats',
    -- `*_count` and `*_sum` are automatically created
    -- by summary collector.
    op_count = 'tnt_crud_stats_count',
    op_sum = 'tnt_crud_stats_sum',

    -- Counter collectors for select/pairs details.
    tuples_fetched = 'tnt_crud_tuples_fetched',
    tuples_lookup = 'tnt_crud_tuples_lookup',
    map_reduces = 'tnt_crud_map_reduces',
}

local LATENCY_QUANTILE = 0.99

local DEFAULT_QUANTILES = {
    [0.5] = 1e-5,
    [0.9] = 1e-5,
    [LATENCY_QUANTILE] = 1e-5,
}

local DEFAULT_SUMMARY_PARAMS = {
    max_age_time = 60,
    age_buckets_count = 5,
}

--- Check if application supports metrics rock for registry
--
-- `metrics >= 0.9.0` is required to use summary with
-- age buckets. `metrics >= 0.5.0, < 0.9.0` is unsupported
-- due to quantile overflow bug
-- (https://github.com/tarantool/metrics/issues/235).
--
-- @function is_supported
--
-- @treturn boolean Returns true if `metrics >= 0.9.0` found, false otherwise.
--
function registry.is_supported()
    if is_package == false then
        return false
    end

    -- Only metrics >= 0.9.0 supported.
    local is_summary, summary = pcall(require, 'metrics.collectors.summary')
    if is_summary == false or summary.rotate_age_buckets == nil then
        return false
    end

    return true
end


--- Initialize collectors in global metrics registry
--
-- @function init
--
-- @treturn boolean Returns true.
--
function registry.init()
    _registry[metric_name.op] = metrics.summary(
        metric_name.op,
        'CRUD router calls statistics',
        DEFAULT_QUANTILES,
        DEFAULT_SUMMARY_PARAMS)

    _registry[metric_name.tuples_fetched] = metrics.counter(
        metric_name.tuples_fetched,
        'Tuples fetched from CRUD storages during select/pairs')

    _registry[metric_name.tuples_lookup] = metrics.counter(
        metric_name.tuples_lookup,
        'Tuples looked up on CRUD storages while collecting response during select/pairs')

    _registry[metric_name.map_reduces] = metrics.counter(
        metric_name.map_reduces,
        'Map reduces planned during CRUD select/pairs')

    return true
end

--- Unregister collectors in global metrics registry
--
-- @function destroy
--
-- @treturn boolean Returns true.
--
function registry.destroy()
    for _, c in pairs(_registry) do
        metrics.registry:unregister(c)
    end

    _registry = {}
    return true
end

--- Get copy of global metrics registry
--
-- @function get
--
-- @treturn table Returns copy of metrics registry.
function registry.get()
    local stats = {}

    if next(_registry) == nil then
        return stats
    end

    -- Fill empty collectors with zero values.
    for _, op_label in pairs(label) do
        stats[op_label] = registry_common.build_collector(op_label)
    end

    for _, obs in ipairs(_registry[metric_name.op]:collect()) do
        local operation = obs.label_pairs.operation
        local status = obs.label_pairs.status
        if obs.metric_name == metric_name.op then
            if obs.label_pairs.quantile == LATENCY_QUANTILE then
                stats[operation][status].latency = obs.value
            end
        elseif obs.metric_name == metric_name.op_sum then
            stats[operation][status].time = obs.value
        elseif obs.metric_name == metric_name.op_count then
            stats[operation][status].count = obs.value
        end
    end

    local _, obs_tuples_fetched = next(_registry[metric_name.tuples_fetched]:collect())
    if obs_tuples_fetched ~= nil then
        stats[label.SELECT].details.tuples_fetched = obs_tuples_fetched.value
    end

    local _, obs_tuples_lookup = next(_registry[metric_name.tuples_lookup]:collect())
    if obs_tuples_lookup ~= nil then
        stats[label.SELECT].details.tuples_lookup = obs_tuples_lookup.value
    end

    local _, obs_map_reduces = next(_registry[metric_name.map_reduces]:collect())
    if obs_map_reduces ~= nil then
        stats[label.SELECT].details.map_reduces = obs_map_reduces.value
    end

    return stats
end

--- Increase requests count and update latency info
--
-- @function observe
--
-- @tparam string op_label
-- Label of registry collectors.
-- Use `require('crud.common.const').OP` to pick one.
--
-- @tparam boolean success
-- true if no errors on execution, false otherwise.
--
-- @tparam number latency
-- Time of call execution.
--
-- @treturn boolean Returns true.
--
function registry.observe(op_label, success, latency)
    dev_checks('string', 'boolean', 'number')

    local label_pairs = { operation = op_label }
    if success == true then
        label_pairs.status = 'ok'
    else
        label_pairs.status = 'error'
    end

    _registry[metric_name.op]:observe(latency, label_pairs)

    return true
end

--- Increase statistics of storage select/pairs calls
--
-- @function observe_fetch
--
-- @tparam number tuples_fetched
-- Count of tuples fetched during storage call.
--
-- @tparam number tuples_lookup
-- Count of tuples looked up on storages while collecting response.
--
-- @treturn boolean Returns true.
--
function registry.observe_fetch(tuples_fetched, tuples_lookup)
    dev_checks('number', 'number')

    local label_pairs = { operation = label.SELECT }

    _registry[metric_name.tuples_fetched]:inc(tuples_fetched, label_pairs)
    _registry[metric_name.tuples_lookup]:inc(tuples_lookup, label_pairs)
    return true
end

--- Increase statistics of planned map reduces during select/pairs
--
-- @function observe_map_reduces
--
-- @tparam number count
-- Count of map reduces planned.
--
-- @treturn boolean Returns true.
--
function registry.observe_map_reduces(count)
    dev_checks('number')

    local label_pairs = { operation = label.SELECT }

    _registry[metric_name.map_reduces]:inc(count, label_pairs)
    return true
end

return registry
12 changes: 10 additions & 2 deletions crud/stats/module.lua
@@ -2,11 +2,19 @@ local clock = require('clock')
local dev_checks = require('crud.common.dev_checks')
local utils = require('crud.common.utils')

local stats_registry = require('crud.stats.local_registry')

local stats = {}
local _is_enabled = false

local stats_registry
local local_registry = require('crud.stats.local_registry')
local metrics_registry = require('crud.stats.metrics_registry')

if metrics_registry.is_supported() then
    stats_registry = metrics_registry
else
    stats_registry = local_registry
end

--- Check if statistics module is enabled
--
-- @function is_enabled