feat(collector): add statistics for partition hotspot #427

levy5307 · 2019-11-19T09:41:44Z

What problem does this PR solve?

The online Pegasus system has hotspot keys, which cause unbalanced traffic across partitions, so statistics for partition hotspots are needed.

What is changed and how it works?

The info collector server gets the QPS and CU of each partition and computes the maximum and minimum of each. From these it derives the ratio of max to min. If the ratio is greater than 10, the partition holding the maximum value is assumed to be a hotspot.
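
A minimal sketch of the max/min rule described above, not the PR's actual code. The `partition_sample` struct, the `find_hotspot_partition` function and the clamping of the minimum to 1 (to avoid dividing by zero for an idle partition) are assumptions made for illustration; only the ratio threshold of 10 comes from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-partition sample; `value` stands for either QPS or CU.
struct partition_sample
{
    double value;
};

// Returns the index of the partition holding the maximum value when
// max/min exceeds `threshold`, or -1 when no hotspot is detected.
int find_hotspot_partition(const std::vector<partition_sample> &samples,
                           double threshold = 10.0)
{
    assert(threshold > 0.0);
    if (samples.empty()) {
        return -1;
    }
    auto less = [](const partition_sample &a, const partition_sample &b) {
        return a.value < b.value;
    };
    auto mm = std::minmax_element(samples.begin(), samples.end(), less);
    double min_v = mm.first->value;
    double max_v = mm.second->value;
    // Assumption: clamp the minimum to 1 so an idle partition (value 0)
    // does not cause a division by zero.
    double ratio = max_v / std::max(min_v, 1.0);
    if (ratio > threshold) {
        return static_cast<int>(mm.second - samples.begin());
    }
    return -1;
}
```

With this sketch, per-partition values of 100, 5000 and 90 would report the partition at index 1, while 100, 120 and 90 would report no hotspot.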

Check List

Related changes

Need to cherry-pick to the release branch: yes
Need to update the documentation: no
Need to be included in the release note: yes

hycdong · 2019-11-20T10:07:45Z

Why are read and write QPS and CU combined when judging hotspots?

levy5307 · 2019-11-26T08:52:35Z

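> Why are read and write QPS and CU combined when judging hotspots?

QPS and CU are the two dimensions used to measure which partition is a hotspot. That said, a 10x threshold is not a great fit for CU; I think it should be raised.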

hycdong · 2019-11-26T09:10:40Z

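> > Why are read and write QPS and CU combined when judging hotspots?
>
> QPS and CU are the two dimensions used to measure which partition is a hotspot. That said, a 10x threshold is not a great fit for CU; I think it should be raised.

What I actually meant to ask is whether separating reads and writes would be more accurate: compute read hotspots from read QPS and CU, and write hotspots from write QPS and CU. Wouldn't a hotspot judged on mixed read and write traffic be less accurate?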

levy5307 · 2019-11-27T02:44:08Z

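> > > Why are read and write QPS and CU combined when judging hotspots?
> >
> > QPS and CU are the two dimensions used to measure which partition is a hotspot. That said, a 10x threshold is not a great fit for CU; I think it should be raised.
>
> What I actually meant to ask is whether separating reads and writes would be more accurate: compute read hotspots from read QPS and CU, and write hotspots from write QPS and CU. Wouldn't a hotspot judged on mixed read and write traffic be less accurate?

I don't think separating reads and writes is really a question of accuracy; what it gives us is the ability to tell read hotspots from write hotspots. But the purpose of the hotspot feature is to prevent traffic from concentrating on a few partitions and bringing down the service for those partitions. Distinguishing reads from writes doesn't seem very meaningful to me.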

neverchanje · 2019-11-27T03:13:46Z

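I suggest implementing hotspot statistics with the policy/strategy design pattern. That way, when someone looks for the relevant code, a name such as "TableHotspotPolicy" makes it clear that this is the hotspot statistics code. Another benefit is that if we want to change the hotspot detection strategy in the future, only the TableHotspotPolicy code needs to change, decoupled from other modules. Right now you are effectively assuming that our strategy will always be "max/min", which will be awkward to refactor later.

For the design, you could do something like:

class table_hotspot_policy {
public:
  void detect_hotspot (table_statistics stats);
private:
  // The implementation can take various forms: max/min values, or some other strategy.
  // It could also be a scoring scheme: a table counts as a hotspot once its score exceeds a threshold.
  perf_counter _perf_hotspot_score;
  std::string table;
};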

neverchanje · 2019-11-27T05:12:58Z

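I also think judging hotspots by total_qps and total_cu is problematic. First of all, a read hotspot does not imply a write hotspot: writes may be evenly distributed while reads are concentrated.

The strategy I suggest is to first judge hotspots separately by read_cu/write_cu and by the QPS of each operation type, with each item worth 1 point.
The alert threshold could then be configured as hotspot_score@miui_ad_algorithm>2 & repeat 30 times.

@hycdong @acelyc111 could you discuss whether this strategy is suitable?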
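
A minimal sketch of the scoring idea in this comment, assuming each metric is tested with the same max/min ratio rule from the PR description. The `partition_metrics` struct, its four fields (two CU metrics plus `get_qps` and `put_qps` as stand-ins for "each operation type") and the function names are hypothetical and do not come from the Pegasus code base.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-partition metrics; field names are illustrative only.
struct partition_metrics
{
    double read_cu;
    double write_cu;
    double get_qps;
    double put_qps;
};

using metric_getter = double (*)(const partition_metrics &);

// True when max/min of one metric across all partitions exceeds `threshold`.
// Assumption: the minimum is clamped to 1 to avoid dividing by zero.
bool metric_is_skewed(const std::vector<partition_metrics> &parts,
                      metric_getter get,
                      double threshold)
{
    if (parts.empty()) {
        return false;
    }
    double min_v = get(parts.front());
    double max_v = min_v;
    for (const auto &p : parts) {
        min_v = std::min(min_v, get(p));
        max_v = std::max(max_v, get(p));
    }
    return max_v / std::max(min_v, 1.0) > threshold;
}

// Each skewed metric contributes one point to the table's hotspot score.
int hotspot_score(const std::vector<partition_metrics> &parts, double threshold = 10.0)
{
    assert(threshold > 0.0);
    const metric_getter getters[] = {
        [](const partition_metrics &p) { return p.read_cu; },
        [](const partition_metrics &p) { return p.write_cu; },
        [](const partition_metrics &p) { return p.get_qps; },
        [](const partition_metrics &p) { return p.put_qps; },
    };
    int score = 0;
    for (metric_getter get : getters) {
        if (metric_is_skewed(parts, get, threshold)) {
            ++score;
        }
    }
    return score;
}
```

An alert rule such as the one quoted above would then compare this score against a configured threshold over repeated collection rounds; that part is left out of the sketch.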

src/server/info_collector.h

levy5307 · 2019-11-27T07:33:57Z

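> I suggest implementing hotspot statistics with the policy/strategy design pattern. That way, when someone looks for the relevant code, a name such as "TableHotspotPolicy" makes it clear that this is the hotspot statistics code. Another benefit is that if we want to change the hotspot detection strategy in the future, only the TableHotspotPolicy code needs to change, decoupled from other modules. Right now you are effectively assuming that our strategy will always be "max/min", which will be awkward to refactor later.
>
> For the design, you could do something like:
> class table_hotspot_policy {
> public:
>   void detect_hotspot (table_statistics stats);
> private:
>   // The implementation can take various forms: max/min values, or some other strategy.
>   // It could also be a scoring scheme: a table counts as a hotspot once its score exceeds a threshold.
>   perf_counter _perf_hotspot_score;
>   std::string table;
> };

I think the strategy pattern is a good fit here. However, the strategy pattern exists for cases where there are several different strategies, for example scoring and max/min. So the concrete class should be named after the strategy we settle on rather than simply table_hotspot_policy. Both scoring and max/min are kinds of table_hotspot_policy, so table_hotspot_policy can serve as their parent class.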
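
A minimal sketch of the naming discussed here, with `table_hotspot_policy` as an abstract parent and a concrete class named after its strategy. The class name `max_min_hotspot_policy`, the `evaluate_table` helper, the `std::vector<double>` input and the integer score return value are assumptions for illustration; the signature proposed in the review takes a `table_statistics` argument instead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Abstract parent: callers depend only on this interface.
class table_hotspot_policy
{
public:
    virtual ~table_hotspot_policy() = default;
    // Returns a hotspot score for one table; 0 means no hotspot detected.
    virtual int detect_hotspot(const std::vector<double> &partition_values) const = 0;
};

// Concrete strategy named after what it does: the max/min ratio rule.
class max_min_hotspot_policy : public table_hotspot_policy
{
public:
    explicit max_min_hotspot_policy(double threshold) : _threshold(threshold)
    {
        assert(threshold > 0.0);
    }

    int detect_hotspot(const std::vector<double> &partition_values) const override
    {
        if (partition_values.empty()) {
            return 0;
        }
        auto mm = std::minmax_element(partition_values.begin(), partition_values.end());
        // Assumption: clamp the minimum to 1 to avoid dividing by zero.
        double ratio = *mm.second / std::max(*mm.first, 1.0);
        return ratio > _threshold ? 1 : 0;
    }

private:
    double _threshold;
};

// Hypothetical caller: swapping the strategy does not change this code.
int evaluate_table(const table_hotspot_policy &policy, const std::vector<double> &values)
{
    return policy.detect_hotspot(values);
}
```

A scoring strategy could be added later as another subclass without touching `evaluate_table`, which is the decoupling the review comment asks for.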

acelyc111 · 2019-12-17T09:43:29Z

src/server/info_collector.cpp

@@ -130,88 +130,30 @@ void info_collector::stop() { _tracker.cancel_outstanding_tasks(); }
 void info_collector::on_app_stat()
 {
    ddebug("start to stat apps");
-    std::vector<row_data> rows;
-    if (!get_app_stat(&_shell_context, "", rows)) {
+    std::map<std::string, std::vector<row_data>> all_rows;


Isn't this just refactoring? Please submit the refactoring as a separate PR first so that it is easier to review.

acelyc111 · 2020-02-15T14:34:34Z

duplicate

zhaoliwei added 20 commits November 7, 2019 15:05

813164d  get_app_partition_stat
c66556b  hotspot
42aa1c4  modify get_app_partition_stat
6f08952  Merge branch 'command_helper' into hotspot
8aee24d  hotspot
a3a5758  hotspot
cc1c49f  hot spot
ddc52bc  hot spot
89e6f5f  hot spot
7750438  hot spot
98ee1fa  hotspot
30a1fd4  hot spot
db88ea3  hot spot
d012ff9  Merge remote-tracking branch 'upstream/master' into hotspot
7fa659e  merge master
f7e29ef  hotspot
7f34a93  format
b18dede  rdsn
2b01847  update rdsn
bd02151  Merge branch 'master' into hotspot

neverchanje requested a review from qinzuoyan November 21, 2019 06:44

weekly-digest bot mentioned this pull request Nov 24, 2019

Weekly Digest (17 November, 2019 - 24 November, 2019) #428

Closed

neverchanje changed the title ~~feat: add statistics for partition hotspot~~ feat(collector): add statistics for partition hotspot Nov 27, 2019

neverchanje reviewed Nov 27, 2019

e2dfa84  hotspot

weekly-digest bot mentioned this pull request Dec 1, 2019

Weekly Digest (24 November, 2019 - 1 December, 2019) #434

Closed

fb4f90f  Merge branch 'master' into hotspot

weekly-digest bot mentioned this pull request Dec 8, 2019

Weekly Digest (1 December, 2019 - 8 December, 2019) #439

Closed

acelyc111 reviewed Dec 17, 2019

weekly-digest bot mentioned this pull request Dec 22, 2019

Weekly Digest (15 December, 2019 - 22 December, 2019) #448

Closed

acelyc111 closed this Feb 15, 2020

levy5307 deleted the hotspot branch May 26, 2020 06:04

acelyc111 pushed a commit that referenced this pull request Jun 23, 2022

refactor: move log_block class from mutation_log.h to separated file (#…

3b1133b

…427)
