feat(dup): add metrics for duplication #393

neverchanje · 2020-02-12T04:13:48Z

This PR introduces several metrics for duplication:

perf_counter_wrapper _counter_dup_log_read_bytes_rate;
perf_counter_wrapper _counter_dup_log_read_mutations_rate;
perf_counter_wrapper _counter_dup_shipped_bytes_rate;
perf_counter_wrapper _counter_dup_confirmed_rate;
perf_counter_wrapper _counter_dup_pending_mutations_count;
perf_counter_wrapper _counter_dup_time_lag;

log_read_bytes_rate

name: replica*eon.replica_stub*dup.log_read_bytes_rate

Measures the rate, in bytes, at which data is read from the private log.

The curve is usually identical to that of replica*eon.replica_stub*shared.log.recent.write.size, because under normal conditions whatever is written is also duplicated, so:

log_read_bytes_rate = shared.log.recent.write.size = shipped_bytes_rate

Under some failure conditions, however, log_read_bytes_rate may be much larger, which makes it useful for detecting when log reading during duplication is behaving abnormally.
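
The comparison described above can be sketched as a small health check. This is only an illustration, not code from this PR: `dup_rates`, `log_read_looks_abnormal`, and the `tolerance` factor are hypothetical names, and the two rates are assumed to be sampled from the monitoring system over the same time window.

```cpp
// Hypothetical snapshot of two duplication counters (not a type from rDSN).
struct dup_rates
{
    double log_read_bytes_rate; // bytes/s read from the private log
    double shipped_bytes_rate;  // bytes/s successfully shipped to the remote cluster
};

// Returns true when log reading outpaces shipping by more than `tolerance` times.
// Under normal conditions the two rates are expected to be roughly equal, so a
// large gap hints that log reading is misbehaving.
// `tolerance` is an assumed tuning knob, not a value taken from the PR.
inline bool log_read_looks_abnormal(const dup_rates &r, double tolerance = 2.0)
{
    if (r.log_read_bytes_rate <= 0) {
        return false; // nothing is being read, so there is nothing to compare
    }
    return r.log_read_bytes_rate > r.shipped_bytes_rate * tolerance;
}
```

A multiplicative tolerance is used rather than strict equality because the two counters are sampled independently and will rarely match exactly.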

log_read_mutations_rate

name: eon.replica_stub dup.log_read_mutations_rate

The read rate measured in number of mutations. It is used in the same way as "log_read_bytes_rate".

shipped_bytes_rate

name: eon.replica_stub dup.shipped_bytes_rate

The outbound network bytes of successfully delivered duplication_request, reported as a rate.

Under some failure conditions the curve may drop to 0, for example when the inter-cluster network is unavailable.

confirmed_rate

name: eon.replica_stub dup.confirmed_rate

The rate of confirmed writes, that is, the number of writes that have been duplicated and also confirmed by the meta server.

pending_mutations_count

name: eon.replica_stub dup.pending_mutations_count

The number of writes that have not yet been duplicated. This is one of the most important metrics for duplication: the more mutations are pending, the weaker the consistency. In practice, it is recommended to set an alarm threshold for this metric. Beyond the threshold, the duplication should

time_lag(ms)

name: eon.replica_stub dup.time_lag(ms)

The "latency" between (1) the time the client write arrives at the replica server and (2) the time the write is duplicated and applied to the remote cluster.

t0 -> t1 -> t2
client -> replica server -> remote cluster
time_lag = t2-t1
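
The formula above can be sketched as a small helper. This is a hedged illustration rather than the code in this PR: `dup_time_lag_ms` is a hypothetical name, and both timestamps are assumed to be nanosecond values taken from the same clock.

```cpp
#include <chrono>
#include <cstdint>

// t1_ns: when the client write arrived at the replica server.
// t2_ns: when the write was duplicated and applied to the remote cluster.
// Both are assumed to be nanoseconds from the same clock (an assumption of
// this sketch, not something stated in the PR).
inline int64_t dup_time_lag_ms(int64_t t1_ns, int64_t t2_ns)
{
    using namespace std::chrono;
    // time_lag = t2 - t1, truncated to whole milliseconds.
    return duration_cast<milliseconds>(nanoseconds(t2_ns - t1_ns)).count();
}
```

Note that t0 (the time the client issued the write) does not enter the calculation, so the metric excludes client-to-server latency.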


src/dist/replication/common/replication_common.cpp

acelyc111 · 2020-02-15T15:21:54Z


@@ -273,9 +273,6 @@ void replication_options::initialize()

    duplication_disabled = dsn_config_get_value_bool(
        "replication", "duplication_disabled", duplication_disabled, "is duplication disabled");
-    if (allow_non_idempotent_write && !duplication_disabled) {


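Why is this constraint being dropped?

neverchanje · 2020-02-18

We are setting this constraint aside for now. For one thing, allow_non_idempotent_write is enabled by default in our production clusters; for another, tables with duplication (hot backup) enabled can have requirements imposed on the business side at onboarding, so the rule does not have to be hard-coded in the program.

hycdong · 2020-02-18

So can a table with duplication enabled currently perform non-idempotent writes at the same time?

neverchanje · 2020-02-19

Not at the moment. If we do prohibit it, we probably will not rely on configuration to forbid non-idempotent writes, since within one cluster some tables may have duplication enabled while others do not.

hycdong · 2020-02-19

OK. Then what measures do we currently have to guarantee that tables with duplication enabled do not perform non-idempotent operations? If this is left to the business side and our code imposes no restriction, it still feels somewhat unsafe.

neverchanje · 2020-02-19

None yet. The preliminary idea is to change this on the pegasus side: on encountering INCR or CHECK_AND_SET, write an empty write and then return an error. That is not implemented yet, though. HBase does support duplicating INCR by converting INCR to PUT during replication, but that flow is hard to implement on the pegasus side.

hycdong · 2020-02-19

OK, let's record a TODO. It is best to enforce this in code; relying on the self-discipline of business users is too unsafe.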
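
The idea floated in this thread (reject INCR and CHECK_AND_SET on tables that are being duplicated, write an empty write, and return an error) could look roughly like the sketch below. Everything here is hypothetical: `write_op`, `write_decision`, and `decide_write` are illustrative names, not part of rDSN or pegasus, and the thread itself notes that no such check has been implemented.

```cpp
// Hypothetical operation kinds; only the ones named in the discussion are listed.
enum class write_op
{
    PUT,
    INCR,
    CHECK_AND_SET
};

// Hypothetical outcome of the check.
enum class write_decision
{
    APPLY,                  // apply the write as usual
    REJECT_WITH_EMPTY_WRITE // write an empty write, then return an error to the client
};

// INCR and CHECK_AND_SET are the non-idempotent operations named in the thread.
inline bool is_idempotent(write_op op)
{
    return op != write_op::INCR && op != write_op::CHECK_AND_SET;
}

// The decision is made per table rather than from a cluster-wide config option,
// because a single cluster may mix duplicating and non-duplicating tables.
inline write_decision decide_write(write_op op, bool table_is_duplicating)
{
    if (table_is_duplicating && !is_idempotent(op)) {
        return write_decision::REJECT_WITH_EMPTY_WRITE;
    }
    return write_decision::APPLY;
}
```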

src/dist/replication/lib/duplication/replica_duplicator_manager.cpp

src/dist/replication/lib/duplication/replica_duplicator.cpp

src/dist/replication/lib/replica_stub.cpp


neverchanje and others added 7 commits February 6, 2020 15:55

feat(dup): add metrics for duplication

c1dddd8

Merge branch 'master' of https://github.com/XiaoMi/rdsn into dup-metrics

4aecfdf

fix name

1c10de5

fix name

c819cfd

format

7159ab5

fix case601

8864e32

Merge branch 'master' into dup-metrics

444b57e

neverchanje changed the title ~~feat(dup): add metrics for duplication [WIP]~~ feat(dup): add metrics for duplication Feb 14, 2020

neverchanje added the component/duplication label Feb 14, 2020

neverchanje added 2 commits February 14, 2020 12:09

fix name

1cdfc6f

Merge branch 'dup-metrics' of github.com:neverchanje/rdsn into dup-me…

3f8e433

…trics

acelyc111 reviewed Feb 15, 2020


weekly-digest bot mentioned this pull request Feb 16, 2020

Weekly Digest (9 February, 2020 - 16 February, 2020) #400

Closed

levy5307 reviewed Feb 18, 2020

src/dist/replication/lib/duplication/replica_duplicator.cpp (outdated, resolved)

levy5307 reviewed Feb 18, 2020

src/dist/replication/lib/duplication/replica_duplicator.cpp (outdated, resolved)

levy5307 reviewed Feb 18, 2020

src/dist/replication/lib/replica_stub.cpp (resolved)

hycdong and others added 5 commits February 18, 2020 14:23

Merge branch 'master' into dup-metrics

038c69d

Merge branch 'master' into dup-metrics

39396f3

fix review

c10ed8c

Merge branch 'dup-metrics' of github.com:neverchanje/rdsn into dup-me…

de82e04

…trics

fix

0a1ef53

levy5307 approved these changes Feb 19, 2020


acelyc111 approved these changes Feb 19, 2020


hycdong merged commit 7a46628 into XiaoMi:master Feb 19, 2020

weekly-digest bot mentioned this pull request Feb 23, 2020

Weekly Digest (16 February, 2020 - 23 February, 2020) #402

Closed

neverchanje added the type/perf-counter label (PR that made modification on perf-counter, which should be noted in release note) Mar 12, 2020

neverchanje deleted the dup-metrics branch March 19, 2020 06:41

neverchanje mentioned this pull request Mar 30, 2020

Release 1.12.3 apache/incubator-pegasus#506

Closed

neverchanje pushed a commit that referenced this pull request Mar 31, 2020

feat(dup): add metrics for duplication (#393)

c3051e9

neverchanje added the 1.12.3 label Apr 17, 2020

neverchanje mentioned this pull request Jul 21, 2020

feat(bulk-load): add perf-counter #567

Merged


