Tarantool 3 "replacements" for Cartridge #491

Merged
1 change: 1 addition & 0 deletions .luacheckrc
@@ -2,3 +2,4 @@ include_files = {"**/*.lua", "*.rockspec", "*.luacheckrc"}
exclude_files = {"lua_modules/", ".luarocks/", ".rocks/", "tmp/", ".history/"}

max_line_length = 120
max_comment_line_length = 200
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
### Added
- New Tarantool 3 metrics:
  - tnt_config_alerts
  - tnt_config_status

## [1.1.0] - 2024-05-17
### Added
1 change: 1 addition & 0 deletions doc/monitoring/api_reference.rst
@@ -566,6 +566,7 @@ Metrics functions
* ``cartridge_failover``
* ``clock``
* ``event_loop``
* ``config``

See :ref:`metrics reference <metrics-reference>` for details.
All metric collectors from the collection have ``metainfo.default = true``.
37 changes: 37 additions & 0 deletions doc/monitoring/metrics_reference.rst
@@ -993,3 +993,40 @@ Read view statistics

        * - ``tnt_memtx_index_read_view``
          - Memory (in bytes) held for read views.


Tarantool configuration
-----------------------

These metrics are available starting from Tarantool 3.0.

.. container:: table

    .. list-table::
        :widths: 25 75
        :header-rows: 0

        * - ``tnt_config_alerts``
          - Number of current :ref:`configuration apply alerts <config_api_reference_info>`
            on the instance. The ``{level="warn"}`` series counts warnings,
            and the ``{level="error"}`` series counts errors.

        * - ``tnt_config_status``
          - Current :ref:`configuration apply <config_api_reference_info>` status of the instance.
            Each possible status is exported as a separate series whose ``status`` label
            holds the status name. The current status has the value ``1``;
            all other statuses have the value ``0``.

            .. code-block:: none

                # HELP tnt_config_status Tarantool 3 configuration status
                # TYPE tnt_config_status gauge
                tnt_config_status{status="reload_in_progress",alias="router-001-a"} 0
                tnt_config_status{status="uninitialized",alias="router-001-a"} 0
                tnt_config_status{status="check_warnings",alias="router-001-a"} 0
                tnt_config_status{status="ready",alias="router-001-a"} 1
                tnt_config_status{status="check_errors",alias="router-001-a"} 0
                tnt_config_status{status="startup_in_progress",alias="router-001-a"} 0

            In this example, the configuration status of ``router-001-a`` is ``ready``.
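To make the one-hot encoding concrete from the consumer side, here is a minimal Lua sketch that reads the current status back out of Prometheus text output like the sample above. The helper name `current_config_status` is hypothetical, and it assumes one sample per line with `status` and `alias` labels; it does not implement the full exposition-format grammar (escaped quotes, timestamps).

```lua
-- Hypothetical helper: map each alias to its active tnt_config_status
-- value, given Prometheus text exposition output. Assumes one sample per
-- line without timestamps and label values without escaped quotes.
local function current_config_status(exposition)
    local result = {}
    for line in exposition:gmatch('[^\n]+') do
        -- Skip "# HELP" and "# TYPE" comment lines.
        if line:sub(1, 1) ~= '#' then
            local labels, value = line:match('^tnt_config_status{(.-)}%s+(%S+)$')
            -- Only the series with value 1 is the current status.
            if labels ~= nil and tonumber(value) == 1 then
                local status = labels:match('status="([^"]*)"')
                local alias = labels:match('alias="([^"]*)"')
                if status ~= nil and alias ~= nil then
                    result[alias] = status
                end
            end
        end
    end
    return result
end

-- A subset of the documentation example; series order does not matter.
local sample = [[
# HELP tnt_config_status Tarantool 3 configuration status
# TYPE tnt_config_status gauge
tnt_config_status{status="reload_in_progress",alias="router-001-a"} 0
tnt_config_status{status="uninitialized",alias="router-001-a"} 0
tnt_config_status{status="ready",alias="router-001-a"} 1
tnt_config_status{status="check_errors",alias="router-001-a"} 0
]]

print(current_config_status(sample)['router-001-a'])  -- prints ready
```

Because the consumer matches on the label value rather than on a numeric code, adding a new status later does not change how existing statuses are read.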

1 change: 1 addition & 0 deletions metrics/tarantool.lua
@@ -23,6 +23,7 @@ local default_metrics = {
    cartridge_failover = require('metrics.cartridge.failover'),
    clock = require('metrics.tarantool.clock'),
    event_loop = require('metrics.tarantool.event_loop'),
    config = require('metrics.tarantool.config'),
}

local all_metrics_map = {}
76 changes: 76 additions & 0 deletions metrics/tarantool/config.lua
@@ -0,0 +1,76 @@
local utils = require('metrics.utils')

local collectors_list = {}

local function get_config_alerts(config_info)
    -- https://github.com/tarantool/tarantool/blob/319357d5973d15d08b8eda6a230eada08b710802/src/box/lua/config/utils/aboard.lua#L17-L18
    local config_alerts = {
        warn = 0,
        error = 0,
    }

    for _, alert in pairs(config_info.alerts) do
        config_alerts[alert.type] = config_alerts[alert.type] + 1
    end

    return config_alerts
end

local function get_config_status(config_info)
    -- See state diagram here
    -- https://github.com/tarantool/doc/issues/3544#issuecomment-1866033480
    local config_status = {
        uninitialized = 0,
        startup_in_progress = 0,
        reload_in_progress = 0,
        check_warnings = 0,
        check_errors = 0,
        ready = 0,
    }

    config_status[config_info.status] = 1

Collaborator:

Maybe it's better to choose a unique numerical value for each status instead of passing the status name in a label?

Member Author:

This approach does not scale well. If a new status is introduced, we have two options:

  • insert the value between existing ones: very bad, since it breaks existing dashboards;
  • append the value at the end: bad unless the new status is another "healthy" one (which is unlikely), because the healthy "ready" status would then always be visualized as something between "reload_in_progress" and "new_not_healthy_status".

Numeric codes are also impossible to read without documentation. A similar approach has already been discussed in [1, 2]. So I think the label approach is a bit better, even though it is harder to visualize.

  1. Enum: custom label for state prometheus/client_python#416
  2. Does OpenTelemetry need "enum" metric type? open-telemetry/opentelemetry-specification#1711


    return config_status
end

local function update()
    if not utils.is_tarantool3() then
        return
    end

    -- Can migrate to box.info().config later
    -- https://github.com/tarantool/tarantool/commit/a1544d3bbc029c6fb2a148e580afe2b20e269b8d
    local config = require('config')
    local config_info = config:info()

    local config_alerts = get_config_alerts(config_info)

    for level, count in pairs(config_alerts) do
        collectors_list.config_alerts = utils.set_gauge(
            'config_alerts',
            'Tarantool 3 configuration alerts',
            count,
            {level = level},
            nil,
            {default = true}
        )
    end

    local config_status = get_config_status(config_info)

    for status, value in pairs(config_status) do
        collectors_list.config_status = utils.set_gauge(
            'config_status',
            'Tarantool 3 configuration status',
            value,
            {status = status},
            nil,
            {default = true}
        )
    end
end

return {
    update = update,
    list = collectors_list,
}
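The two helpers in `metrics/tarantool/config.lua` depend only on the table returned by `config:info()`, so their logic can be exercised without a running Tarantool. Below is a minimal plain-Lua sketch, assuming a hypothetical stub whose `status` and `alerts[i].type` fields mirror `config:info()`; only the fields the helpers read are filled in. One deliberate difference from the PR: the alert counter uses `or 0`, so an alert type other than `warn`/`error` is counted instead of raising an arithmetic-on-nil error.

```lua
-- Sketch: the pure helpers from metrics/tarantool/config.lua, runnable
-- in plain Lua. Difference from the PR: "or 0" in the alert counter.
local function get_config_alerts(config_info)
    -- Known alert levels start at zero so both series are always exported.
    local counts = { warn = 0, error = 0 }
    for _, alert in pairs(config_info.alerts) do
        counts[alert.type] = (counts[alert.type] or 0) + 1
    end
    return counts
end

-- Status names taken from get_config_status in the PR.
local STATUSES = {
    'uninitialized', 'startup_in_progress', 'reload_in_progress',
    'check_warnings', 'check_errors', 'ready',
}

local function get_config_status(config_info)
    -- One-hot encoding: every known status is reported, and only the
    -- current one is set to 1.
    local values = {}
    for _, name in ipairs(STATUSES) do
        values[name] = 0
    end
    values[config_info.status] = 1
    return values
end

-- Hypothetical stub: only the fields the helpers read are filled in.
local stub_info = {
    status = 'check_warnings',
    alerts = { { type = 'warn' }, { type = 'warn' } },
}

local alerts = get_config_alerts(stub_info)
print(alerts.warn, alerts.error)            -- prints 2 and 0
local status = get_config_status(stub_info)
print(status.check_warnings, status.ready)  -- prints 1 and 0
```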
16 changes: 16 additions & 0 deletions metrics/utils.lua
@@ -36,4 +36,20 @@ function utils.delete_collectors(list)
    table.clear(list)
end

local function get_tarantool_version()
    local version_parts = rawget(_G, '_TARANTOOL'):split('-', 3)

    local major_minor_patch_parts = version_parts[1]:split('.', 2)
    local major = tonumber(major_minor_patch_parts[1])
    local minor = tonumber(major_minor_patch_parts[2])
    local patch = tonumber(major_minor_patch_parts[3])

    return major, minor, patch
end

function utils.is_tarantool3()
    local major = get_tarantool_version()
    return major == 3
end

return utils
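`get_tarantool_version` relies on Tarantool's `string.split` extension. As a portable alternative, here is a minimal sketch that uses only standard Lua patterns. `parse_version` and the parameterized `is_tarantool3` are hypothetical helpers, and the version strings are illustrative examples of the `MAJOR.MINOR.PATCH-...` format of `_TARANTOOL`.

```lua
-- Portable sketch: parse "MAJOR.MINOR.PATCH-..." with standard Lua patterns.
-- parse_version and is_tarantool3 are hypothetical helpers mirroring
-- get_tarantool_version / utils.is_tarantool3 from metrics/utils.lua.
local function parse_version(version)
    local major, minor, patch = version:match('^(%d+)%.(%d+)%.(%d+)')
    if major == nil then
        return nil -- not a MAJOR.MINOR.PATCH string
    end
    return tonumber(major), tonumber(minor), tonumber(patch)
end

local function is_tarantool3(version)
    local major = parse_version(version)
    return major == 3
end

-- Example version strings in the _TARANTOOL format (illustrative values).
print(parse_version('3.0.1-0-g5ba3e5e'))     -- prints 3, 0 and 1
print(is_tarantool3('3.0.1-0-g5ba3e5e'))     -- prints true
print(is_tarantool3('2.11.1-0-g96877bd35'))  -- prints false
```

Like `utils.is_tarantool3`, the equality check returns `false` for any other major version, including a hypothetical 4.x.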