Add statistics for CRUD operations on router #244

Merged: 8 commits merged on Feb 25, 2022

DifferentialOrange (Member) commented on Nov 29, 2021:

Add statistics module for collecting metrics of CRUD operations on
router. Wrap all CRUD operation calls in the statistics collector.
Statistics must be enabled manually with crud.cfg. They can be
disabled, restarted or re-enabled later.

crud.stats() returns, for example:

---
- spaces:
    my_space:
      insert:
        ok:
          latency: 0.002
          count: 19800
          time: 39.6
        error:
          latency: 0.000001
          count: 4
          time: 0.000004
      select:
        ok:
          latency: 0.032
          count: 43100
          time: 1379.2
        error:
          latency: 0.000001
          count: 2
          time: 0.000002
        details:
          map_reduces: 48
          tuples_fetched: 105000
          tuples_lookup: 2380000
---

The spaces section contains statistics for each observed space.
If an operation has never been called for a space, the corresponding
field is empty. If no requests have been made for a space, the
space is not listed. Space data is based on client requests rather
than the storage schema, so requests for non-existent spaces are
also collected.

This patch introduces crud.cfg, a tool to set module configuration.
It is similar to Tarantool box.cfg, although it does not need to be
called to bootstrap the module: it is used only to change
configuration. crud.cfg is a callable table. To change configuration,
call it: crud.cfg{ stats = true }. You can read the table contents as
with an ordinary table, but do not change them directly; use a call
instead. The table contents are immutable and use a proxy approach
(see [1, 2]). Iterating through crud.cfg with pairs is not supported
yet, refer to #265.
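A callable read-only table can be sketched with plain Lua metatables. This is a minimal illustration of the proxy approach from [1], not the crud implementation. make_cfg and the option defaults are hypothetical.

- Reads fall through `__index` to hidden storage.
- Direct writes hit `__newindex` and raise an error.
- `__call` applies new options.

Because the proxy itself is empty, `next(cfg)` and plain `pairs(cfg)` see nothing. This is one reason iteration over such a table needs extra work.

```lua
-- Minimal sketch of a callable, read-only config table (proxy approach).
-- make_cfg and the option names below are hypothetical.
local function make_cfg(defaults)
    -- Real storage, hidden behind the proxy.
    local values = {}
    for k, v in pairs(defaults) do
        values[k] = v
    end

    return setmetatable({}, {
        -- Reads fall through to the hidden storage.
        __index = values,
        -- Direct writes are rejected: changes must go through a call.
        __newindex = function()
            error('cfg is read-only, use cfg{ ... } to change it', 2)
        end,
        -- Calling the table applies new values for known options.
        __call = function(self, opts)
            for k, v in pairs(opts) do
                if defaults[k] == nil then
                    error('unknown option: ' .. tostring(k), 2)
                end
                values[k] = v
            end
            return self
        end,
    })
end

-- Usage
local cfg = make_cfg({ stats = false, stats_driver = 'local', stats_quantiles = false })
cfg{ stats = true }
print(cfg.stats)                                -- true
print(pcall(function() cfg.stats = false end))  -- false, then the error message
print(next(cfg))                                -- nil: the proxy itself is empty
```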

Possible statistics operation labels are
insert (for insert and insert_object calls),
get, replace (for replace and replace_object calls), update,
upsert (for upsert and upsert_object calls), delete,
select (for select and pairs calls), truncate, len, count
and borders (for min and max calls).

Each operation section contains separate collectors for successful
calls (ok) and errors (error). An error is either a thrown error or a
nil, err return. count is the total number of requests since instance
start or stats restart. latency is the average request execution
time, and time is the total request execution time.
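In the sample output above, time equals latency × count. For insert, 0.002 × 19800 = 39.6; for select, 0.032 × 43100 = 1379.2. That is exactly what an average-based latency produces.

The ok/error collectors can be sketched as plain Lua tables, assuming a single-process local driver. new_stats, observe and OPERATION_LABEL are hypothetical names. The method-to-label table follows the list of operation labels above.

```lua
-- Minimal sketch of local ok/error collectors (hypothetical names throughout).

-- Map a crud method name to its statistics operation label.
local OPERATION_LABEL = {
    insert = 'insert', insert_object = 'insert',
    get = 'get',
    replace = 'replace', replace_object = 'replace',
    update = 'update',
    upsert = 'upsert', upsert_object = 'upsert',
    delete = 'delete',
    select = 'select', pairs = 'select',
    truncate = 'truncate', len = 'len', count = 'count',
    min = 'borders', max = 'borders',
}

local function new_stats()
    return { spaces = {} }
end

-- Get t[key], creating an empty table there on first access.
local function get_or_create(t, key)
    local v = t[key]
    if v == nil then
        v = {}
        t[key] = v
    end
    return v
end

-- Record one call. status is 'ok' or 'error'; duration is in seconds.
local function observe(stats, space_name, method, status, duration)
    local label = OPERATION_LABEL[method]
    assert(label ~= nil, 'unknown method: ' .. tostring(method))
    -- A space shows up only after its first request, whether or not it exists.
    local space = get_or_create(stats.spaces, space_name)
    local op = get_or_create(space, label)
    local c = op[status]
    if c == nil then
        c = { count = 0, time = 0, latency = 0 }
        op[status] = c
    end
    c.count = c.count + 1
    c.time = c.time + duration
    -- latency is the average execution time: total time / number of calls.
    c.latency = c.time / c.count
end

-- Usage: two insert-like calls and one failed max call.
local stats = new_stats()
observe(stats, 'my_space', 'insert', 'ok', 0.25)
observe(stats, 'my_space', 'insert_object', 'ok', 0.75)
observe(stats, 'my_space', 'max', 'error', 0.5)
print(stats.spaces.my_space.insert.ok.latency) -- 0.5
```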

Since pairs request behavior differs from any other crud request, its
statistics collection also behaves differently. Statistics (select
section) are updated after the pairs cycle is finished: either you
have iterated through all records or an error was thrown. If your
pairs cycle was interrupted with break, statistics are collected when
the pairs object is cleaned up by the Lua garbage collector.
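The "update when the cycle is finished" rule can be sketched as an iterator wrapper. It invokes a callback exactly once, either when the wrapped iterator returns nil or when it raises.

This is a minimal sketch; wrap_iterator and on_finish are hypothetical names. It deliberately does not handle the break case. Per the text above, crud covers that case when the pairs object is garbage collected, which this sketch does not model.

```lua
-- Minimal sketch: run on_finish(status) once when a generic-for iterator
-- is exhausted ('ok') or raises ('error'). Hypothetical names throughout.
local function wrap_iterator(gen, param, state, on_finish)
    local finished = false

    local function finish(status)
        if not finished then
            finished = true
            on_finish(status)
        end
    end

    -- Inspect the results of pcall(gen, ...) and pass values through.
    local function handle(ok, ...)
        if not ok then
            finish('error')
            error((...), 0) -- re-raise the original error unchanged
        end
        if (...) == nil then
            finish('ok') -- iterator exhausted: the cycle is finished
        end
        return ...
    end

    local function step(p, s)
        return handle(pcall(gen, p, s))
    end

    -- Note: a loop left with break never reaches finish(); crud relies on
    -- garbage collection for that case, which this sketch does not model.
    return step, param, state
end

-- Usage: ipairs returns three values, so capture them before wrapping.
local gen, param, state = ipairs({ 'a', 'b', 'c' })
for i, v in wrap_iterator(gen, param, state, function(status)
    print('finished with status ' .. status)
end) do
    print(i, v)
end
```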

Statistics are preserved between package reloads. They are also
preserved between Tarantool Cartridge role reloads [3] if CRUD
Cartridge roles are used.

The select section of statistics additionally contains details
collectors. map_reduces is the count of planned map reduces
(including those not executed successfully). tuples_fetched is the
count of tuples fetched from storages during execution. tuples_lookup
is the count of tuples looked up on storages while collecting
responses for calls (including scrolls for multibatch requests).
Details data is updated as part of the request process, so you may
see new details before the select/pairs call has finished and been
observed by the count, latency and time collectors.

Tests now use the built-in crud.stats() info instead of the
storage_stat helper to track map reduce calls.

If metrics [4] is found, you can use metrics collectors to store
statistics. metrics >= 0.10.0 is required to use the metrics driver.
(metrics >= 0.9.0 is required to use summary quantiles with
age buckets. metrics >= 0.5.0, < 0.9.0 is unsupported
due to a quantile overflow bug [5]. metrics == 0.9.0 has a bug that
does not permit creating a summary collector without quantiles [6].
In fact, a user may use metrics >= 0.5.0, != 0.9.0 to use metrics
without quantiles, and metrics >= 0.9.0 to use metrics with
quantiles. But this is confusing, so a single restriction is used
for both cases.)
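The single "metrics >= 0.10.0" restriction comes down to a numeric version comparison. This is a minimal sketch, assuming plain MAJOR.MINOR.PATCH strings. The helper names are hypothetical, and how the module actually obtains the installed metrics version is not shown here.

A plain string comparison would get this wrong: "0.9.0" sorts after "0.10.0" lexicographically.

```lua
-- Minimal sketch of a "metrics >= 0.10.0" check (hypothetical helpers).
-- Assumes version strings of the form MAJOR.MINOR.PATCH.
local function parse_version(str)
    local major, minor, patch = str:match('^(%d+)%.(%d+)%.(%d+)')
    assert(major, 'unexpected version format: ' .. str)
    return { tonumber(major), tonumber(minor), tonumber(patch) }
end

-- Returns true if version a >= version b, comparing part by part as numbers.
local function version_ge(a, b)
    local va, vb = parse_version(a), parse_version(b)
    for i = 1, 3 do
        if va[i] ~= vb[i] then
            return va[i] > vb[i]
        end
    end
    return true
end

local function metrics_driver_supported(version)
    return version_ge(version, '0.10.0')
end

print(metrics_driver_supported('0.9.0'))  -- false
print(metrics_driver_supported('0.10.0')) -- true
```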

The metrics are part of the global registry and can be exported
together with other metrics (e.g. to Prometheus) using default tools
without any additional configuration. Disabling stats destroys the
collectors.

Metrics collectors are used by default if supported. To explicitly
set the driver, call crud.cfg{ stats = true, stats_driver = driver }
('local' or 'metrics'). To enable quantiles, call

crud.cfg{
    stats = true,
    stats_driver = 'metrics',
    stats_quantiles = true,
}

With quantiles enabled, latency is reported as the 0.99 quantile of
request execution time (with aging) instead of the average. Quantile
computation increases the performance overhead of statistics by up
to 10%.
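To see what switching latency from the average to the 0.99 quantile means, here is an exact nearest-rank quantile over a fixed list of samples. This is a deliberate simplification. The metrics module uses a streaming summary with aging, which this sketch does not implement. quantile and average are hypothetical helpers.

```lua
-- Exact nearest-rank quantile over a fixed list of samples (seconds).
-- NOT the streaming, aging summary from the metrics module; illustration only.
local function quantile(samples, q)
    assert(#samples > 0, 'no samples')
    local sorted = {}
    for i, v in ipairs(samples) do
        sorted[i] = v
    end
    table.sort(sorted)
    -- Nearest-rank method: smallest value with at least q of samples <= it.
    local rank = math.max(1, math.ceil(q * #sorted))
    return sorted[rank]
end

local function average(samples)
    local sum = 0
    for _, v in ipairs(samples) do
        sum = sum + v
    end
    return sum / #samples
end

-- Usage: 98 fast requests and 2 slow ones.
local samples = {}
for i = 1, 98 do samples[i] = 0.001 end
samples[99], samples[100] = 0.5, 0.5
print(average(samples))         -- about 0.011: the slow tail is diluted
print(quantile(samples, 0.99))  -- 0.5: the slow tail is visible
```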

Add CI matrix to run tests with metrics installed. To get full
coverage on coveralls, #248 must be resolved.

Before this patch, performance tests ran together with unit and
integration tests with the --coverage flag. Coverage analysis reduced
performance test results by a factor of 10-15. For metrics
integration, this resulted in timeout errors and a performance drop
that does not reproduce with coverage disabled. Moreover, before this
patch log capture was disabled, so performance tests did not display
any results after a run. Now performance tests also run in a separate
CI job.

After this patch, make -C build coverage runs a lightweight version
of the performance test, and make -C build performance runs the real
performance tests.

The output table can be pasted into GitHub as a table [7].

This patch also reworks the current performance test. It adds new
cases that compare module performance with and without statistics,
with statistics wrappers, and with different metrics drivers. It also
reports new info: average call time and max call time.

Performance test result: overhead is 3-10% for the local driver and
5-15% for the metrics driver, up to 20% for metrics with quantiles.
Based on several runs on an HP ProBook 440 G7 (i7, 16 GB RAM,
256 GB SSD).

Success requests per second

| | without stats wrapper | stats disabled | local stats | metrics stats (no quantiles) | metrics stats (with quantiles) |
|---|---|---|---|---|---|
| select by pk | 18818.04 | 18666.49 | 17057.17 | 16223.08 | 15919.78 |
| select gt by pk (limit 10) | 4439.22 | 4411.50 | 4345.72 | 4137.92 | 4134.32 |
| pairs gt by pk (limit 100) | 1667.76 | 1643.36 | 1485.64 | 1448.39 | 1470.19 |
| insert | 39808.06 | 39392.49 | 35940.60 | 34346.48 | 32155.64 |

Max call time

| | without stats wrapper | stats disabled | local stats | metrics stats (no quantiles) | metrics stats (with quantiles) |
|---|---|---|---|---|---|
| select by pk | 55.865 ms | 54.517 ms | 51.106 ms | 57.375 ms | 45.661 ms |
| select gt by pk (limit 10) | 100.522 ms | 95.305 ms | 98.899 ms | 110.826 ms | 102.551 ms |
| pairs gt by pk (limit 100) | 111.484 ms | 149.179 ms | 125.325 ms | 165.374 ms | 124.922 ms |
| insert | 52.945 ms | 49.434 ms | 52.853 ms | 55.963 ms | 62.925 ms |
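The overhead percentages can be recomputed from the "Success requests per second" numbers, relative to the "without stats wrapper" baseline. This sketch uses only the single run reported in this PR. The stated ranges come from several runs, so individual figures from this run may fall slightly outside them: for example, select gt by pk with local stats comes out at about 2%. overhead, RPS and COLUMNS are hypothetical names.

```lua
-- Relative throughput overhead against the "without stats wrapper" column,
-- using the requests-per-second numbers reported in this PR (one run).
local COLUMNS = { 'without stats wrapper', 'stats disabled', 'local stats',
                  'metrics stats (no quantiles)', 'metrics stats (with quantiles)' }
local CASES = { 'select by pk', 'select gt by pk (limit 10)',
                'pairs gt by pk (limit 100)', 'insert' }
local RPS = {
    ['select by pk']               = { 18818.04, 18666.49, 17057.17, 16223.08, 15919.78 },
    ['select gt by pk (limit 10)'] = { 4439.22, 4411.50, 4345.72, 4137.92, 4134.32 },
    ['pairs gt by pk (limit 100)'] = { 1667.76, 1643.36, 1485.64, 1448.39, 1470.19 },
    ['insert']                     = { 39808.06, 39392.49, 35940.60, 34346.48, 32155.64 },
}

-- Fraction of throughput lost compared to the baseline.
local function overhead(baseline, value)
    return (baseline - value) / baseline
end

for _, case in ipairs(CASES) do
    local row = RPS[case]
    local parts = {}
    for i = 2, #row do
        parts[#parts + 1] = ('%s: %.1f%%'):format(COLUMNS[i], 100 * overhead(row[1], row[i]))
    end
    print(case .. ' -> ' .. table.concat(parts, ', '))
end
```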

  1. http://lua-users.org/wiki/ReadOnlyTables
  2. tarantool/tarantool#2867: Raise on raw modifications of box.cfg values
  3. https://www.tarantool.io/en/doc/latest/book/cartridge/cartridge_api/modules/cartridge.roles/#reload
  4. https://github.com/tarantool/metrics
  5. tarantool/metrics#235: Seems quantile overflow with undefined behaivior
  6. tarantool/metrics#262: Unregister callback
  7. https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-tables

I didn't forget about

  • Tests
  • Changelog
  • Documentation

Closes #224, closes #233

DifferentialOrange force-pushed the DifferentialOrange/gh-224-operation-stats branch 7 times, most recently from 2c3c332 to 04e57b7 (December 6, 2021 12:00)
DifferentialOrange changed the title from "Operation stats" to "Add statistics for CRUD operations on router" (Dec 6, 2021)
DifferentialOrange force-pushed the DifferentialOrange/gh-224-operation-stats branch from 04e57b7 to 096c41f (December 6, 2021 13:25)
DifferentialOrange marked this pull request as ready for review (December 6, 2021 13:34)
DifferentialOrange force-pushed the DifferentialOrange/gh-224-operation-stats branch 3 times, most recently from 6e1306c to a734377 (December 6, 2021 14:09)
DifferentialOrange force-pushed the DifferentialOrange/gh-224-operation-stats branch 7 times, most recently from 0504ed8 to 0b43ec6 (December 7, 2021 10:42)
DifferentialOrange (Member Author) commented:

metrics integration leads to timeouts in the perf test: https://github.com/tarantool/crud/runs/4442731842?check_suite_focus=true

DifferentialOrange force-pushed the DifferentialOrange/gh-224-operation-stats branch from 0b43ec6 to a7324e8 (December 8, 2021 14:23)
DifferentialOrange force-pushed the DifferentialOrange/gh-224-operation-stats branch 2 times, most recently from 5a6e025 to dad3d17 (December 8, 2021 14:38)
DifferentialOrange (Member Author) commented:

> metrics integration leads to timeouts in the perf test: https://github.com/tarantool/crud/runs/4442731842?check_suite_focus=true

I tuned the summary parameters so that they do not cause timeouts (in local runs), but the performance drop is still noticeable (2-3 times). I have discussed the issue with @yngvar-antonsson and filed a ticket (tarantool/metrics#331). The CI perf test does not run with the metrics driver for now (simply so that the test run does not take twice as long as it does now); I think I will add a separate perf test as part of the #225 solution.

AnaNek added a commit that referenced this pull request May 16, 2022
Since PR #244 added statistics for CRUD operations, it would be nice
to collect statistics for batch operations too.
To establish the effectiveness of the `crud.batch_insert()`
method compared to `crud.insert()`, perf tests were added.
`crud.insert()` in a loop and `crud.batch_insert()`
are compared for different batch sizes.

Closes #193
DifferentialOrange added a commit that referenced this pull request Feb 3, 2023
The `crud` module is cartridge-independent in nature, but it provides
cartridge roles, which are the most popular way to set up the module.
The roles also do not use any modern cartridge features and should
work with any cartridge version. But since crud.cfg was introduced
[1], some code has been required to properly support roles reload [2].
Currently, the cartridge.hotreload module is required unconditionally,
so the roles cannot be used with cartridge older than 2.4.0. This
patch fixes the behavior.

1. 6da4f56
2. tarantool/cartridge@941952e

Follows #244
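Making the dependency conditional can be sketched with the common Lua pattern of wrapping require in pcall. This is a minimal illustration, not the actual patch. optional_require is a hypothetical helper, and what the roles do when hot reload is unavailable is left as a comment.

```lua
-- Minimal sketch: require a module only if it is available.
-- optional_require is a hypothetical helper, not part of crud or cartridge.
local function optional_require(name)
    local ok, mod = pcall(require, name)
    if ok then
        return mod
    end
    return nil
end

local hotreload = optional_require('cartridge.hotreload')
if hotreload ~= nil then
    -- Newer cartridge: hot reload support is available, so state that
    -- must survive a roles reload could be registered here.
else
    -- Older cartridge (or no cartridge at all): skip hot reload integration.
end
```

One design caveat: `pcall(require, ...)` also hides errors raised while loading a module that does exist. A stricter version might inspect the error message before treating the module as absent.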
DifferentialOrange added a commit that referenced this pull request Feb 3, 2023
Before this patch, tests were marked with xfail since there was a bug in
the metrics module [1]. This bug is fixed in newer versions, so xfail is
replaced with skip based on the metrics version.

1. tarantool/metrics#334

Follows #244