Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(collector): add statistics for partition hotspot #427

Closed
wants to merge 22 commits into from

Conversation

levy5307
Copy link
Contributor

@levy5307 levy5307 commented Nov 19, 2019

What problem does this PR solve?

In online pegasus system, There are some hotspot keys. Which produce unbalanced flow. so statistics for partition hotspot is needed.

What is changed and how it works?

In info collector server, it will get qps/cu for each partition, and count the max/min of them. Using max/min, we can get the scale between max and min. If the scale is greater than 10, we assume the partition has the max value is a hotspot.

Check List

Related changes

  • Need to cherry-pick to the release branch
    yes
  • Need to update the documentation
    no
  • Need to be included in the release note
    yes

@hycdong
Copy link
Contributor

hycdong commented Nov 20, 2019

为什么要把读写的qps和cu合在一起判断热点呢?

@levy5307
Copy link
Contributor Author

为什么要把读写的qps和cu合在一起判断热点呢?

是拿qps和cu两个维度来衡量哪个partition是热点,不过cu那个用10倍来判断不是特别好,感觉应该调大一点

@hycdong
Copy link
Contributor

hycdong commented Nov 26, 2019

为什么要把读写的qps和cu合在一起判断热点呢?

是拿qps和cu两个维度来衡量哪个partition是热点,不过cu那个用10倍来判断不是特别好,感觉应该调大一点

我其实想问的是把读写分开是不是更准确?根据读的qps和cu来算读热点,写的qps和cu来算写热点,读写混合在一起判断的热点会不会不够准确?

@levy5307
Copy link
Contributor Author

为什么要把读写的qps和cu合在一起判断热点呢?

是拿qps和cu两个维度来衡量哪个partition是热点,不过cu那个用10倍来判断不是特别好,感觉应该调大一点

我其实想问的是把读写分开是不是更准确?根据读的qps和cu来算读热点,写的qps和cu来算写热点,读写混合在一起判断的热点会不会不够准确?

我觉得读和写分开倒不是准确不准确的问题,是可以区分出读热点和写热点。但是做热点功能的出发点是防止访问某些分片的流量过于集中、缓存分片服务被打垮。感觉区分出读和写意义不是很大呢

@neverchanje
Copy link
Contributor

neverchanje commented Nov 27, 2019

热点统计建议用 policy/strategy 设计模式来实现,这样找相关代码的时候,看到譬如 “TableHotspotPolicy”,可以知道这是做热点统计的代码。这有个好处是,如果未来我们想修改热点判断的策略,你只需要改 TableHotspotPolicy 的代码,与其他模块解藕。你现在相当于假定了我们的策略一定是 “最大最小值”,这个写法以后重构比较麻烦。

设计上你可以

class table_hotspot_policy {
public:
  void detect_hotspot (table_statistics stats);
private:
  // 实现可以有各种做法,可以用最大最小值,也可以有其他策略
  // 可以设计成打分制,超出一定分值就算热点
  perf_counter _perf_hotspot_score;
  std::string table;
};

@neverchanje neverchanje changed the title feat: add statistics for partition hotspot feat(collector): add statistics for partition hotspot Nov 27, 2019
@neverchanje
Copy link
Contributor

我也认为用 total_qps 和 total_cu 来判断热点有问题,首先不是读有热点,写就有热点的。可能存在写很均匀但读有热点的情况。

建议的策略是先按 read_cu/write_cu 和每一种操作的 qps 来分别判断热点,每一项可以占 1 分
报警配置的阈值可以配 hotspot_score@miui_ad_algorithm>2 & repeat 30 times

@hycdong @acelyc111 可以讨论一下这样的策略是否合适。

src/server/info_collector.h Outdated Show resolved Hide resolved
src/server/info_collector.h Outdated Show resolved Hide resolved
@levy5307
Copy link
Contributor Author

热点统计建议用 policy/strategy 设计模式来实现,这样找相关代码的时候,看到譬如 “TableHotspotPolicy”,可以知道这是做热点统计的代码。这有个好处是,如果未来我们想修改热点判断的策略,你只需要改 TableHotspotPolicy 的代码,与其他模块解藕。你现在相当于假定了我们的策略一定是 “最大最小值”,这个写法以后重构比较麻烦。

设计上你可以

class table_hotspot_policy {
public:
  void detect_hotspot (table_statistics stats);
private:
  // 实现可以有各种做法,可以用最大最小值,也可以有其他策略
  // 可以设计成打分制,超出一定分值就算热点
  perf_counter _perf_hotspot_score;
  std::string table;
};

我觉得这个用这个策略模式倒是很好,但是策略模式的用途是当有各种不同的策略时,比如:打分、最大/最小。所以命名这块根据我们确定的策略命名,而不是简单的用table_hotspot_policy。因为不论打分还是最大/最小都属于table_hotspot_policy,table_hotspot_policy可以当做他们的父类来使用

@@ -130,88 +130,30 @@ void info_collector::stop() { _tracker.cancel_outstanding_tasks(); }
void info_collector::on_app_stat()
{
ddebug("start to stat apps");
std::vector<row_data> rows;
if (!get_app_stat(&_shell_context, "", rows)) {
std::map<std::string, std::vector<row_data>> all_rows;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这些只是重构吧?重构的先单独提一个PR便于review吧

@acelyc111
Copy link
Member

duplicate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants