Show the maximum available CPU/Memory on TiFlash Grafana panel #3821

Closed
JaySon-Huang opened this issue Jan 6, 2022 · 5 comments · Fixed by #5124
Assignees
Labels
type/feature-request Categorizes issue or PR as related to a new feature.

Comments

@JaySon-Huang
Contributor

Feature Request

Is your feature request related to a problem? Please describe:

Currently the Grafana panel does not show the maximum CPU/memory available to TiFlash, which is a problem especially in k8s deployments.

Describe the feature you'd like:

Show the maximum available CPU/Memory on TiFlash Grafana panel.
I've created a basic branch for getting the maximum available CPU/Memory quota: https://github.com/JaySon-Huang/tics/commits/fix_cpu_count

The panel could look like this:
image

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

Useful for checking whether TiFlash is running out of resources, especially in k8s environments.

@JaySon-Huang JaySon-Huang added the type/feature-request Categorizes issue or PR as related to a new feature. label Jan 6, 2022
@jiaqizho jiaqizho self-assigned this Jan 11, 2022
@JaySon-Huang
Contributor Author

Check cgroup limits under /sys/fs/cgroup/cpu/.

Reference: http://hg.openjdk.java.net/jdk/jdk/rev/7f22774a5f42
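
As a rough illustration of the idea (not the code from the branch or the PR), the cgroup v1 limits could be read roughly like this. The file names `cpu.cfs_quota_us`, `cpu.cfs_period_us` and `memory.limit_in_bytes` are the standard cgroup v1 control files; the function names and the fallback behaviour are assumptions made for this sketch.

```python
from pathlib import Path
from typing import Optional

# cgroup v1 CPU controller directory mentioned in the comment above.
CGROUP_V1_CPU_DIR = Path("/sys/fs/cgroup/cpu")


def cpu_limit_from_cfs(quota_us: int, period_us: int) -> Optional[float]:
    """Return the CPU limit in cores, or None when no quota is set."""
    # In cgroup v1 a quota of -1 means "no limit".
    if quota_us <= 0 or period_us <= 0:
        return None
    return quota_us / period_us


def read_cpu_limit(cgroup_dir: Path = CGROUP_V1_CPU_DIR) -> Optional[float]:
    """Read the CFS quota and period from a cgroup v1 directory (hypothetical helper)."""
    try:
        quota = int((cgroup_dir / "cpu.cfs_quota_us").read_text().strip())
        period = int((cgroup_dir / "cpu.cfs_period_us").read_text().strip())
    except (OSError, ValueError):
        # Files missing or unreadable: treat as "no cgroup limit known".
        return None
    return cpu_limit_from_cfs(quota, period)


def effective_memory_limit(cgroup_limit_bytes: int, host_total_bytes: int) -> int:
    """An unlimited cgroup reports a huge value, so cap it at the host total."""
    return min(cgroup_limit_bytes, host_total_bytes)
```

A caller would fall back to the host's logical core count when `read_cpu_limit()` returns `None`.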

@JaySon-Huang
Contributor Author

JaySon-Huang commented Jan 12, 2022

@jiaqizho
Contributor

We may want to open a separate issue for cgroup v2 limits.

Reference:

I can't test cgroup v2 at the company because the kernel version there is too low. The only thing I can do is set up a virtual machine with a newer kernel on my personal computer and test cgroup v2 there (and I can't run that VM on a Mac M1).

There is a question about refreshing the metrics. Suppose the user starts TiFlash and only afterwards puts the TiFlash process into a cgroup. I changed the PR to avoid reporting an incorrect value in that case (it now checks whether the process is really in a cgroup). If the TiFlash process has been added to a cgroup before we send the metrics, the user gets the resource limit value. If not, the user won't get the limit value.
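
For readers following along, one way to check cgroup membership is to parse `/proc/self/cgroup`, whose cgroup v1 lines have the form `hierarchy-ID:controller-list:cgroup-path`. The sketch below is only an illustration of that check, not the logic in the PR; the function names are hypothetical, and treating any non-root path as "limited" is an assumption.

```python
from typing import Optional


def cpu_cgroup_path(proc_cgroup_text: str) -> Optional[str]:
    """Return the cgroup path of the v1 `cpu` controller, or None if it is not listed."""
    for line in proc_cgroup_text.splitlines():
        # Each line looks like "4:cpu,cpuacct:/kubepods/pod1".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        controllers = parts[1].split(",")
        if "cpu" in controllers:
            return parts[2]
    return None


def in_limited_cgroup(proc_cgroup_text: str) -> bool:
    """Assumption: a non-root cpu cgroup path means a limit may apply."""
    path = cpu_cgroup_path(proc_cgroup_text)
    return path is not None and path != "/"
```

In a real process the input would come from reading `/proc/self/cgroup`; taking the text as a parameter keeps the parsing easy to test.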

We also get no notification when the cgroup of the process changes. So should we refresh these metrics on a timer?
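
To make the timer idea concrete, here is a minimal sketch of a periodic refresh. It is not how TiFlash publishes metrics: `PeriodicRefresher`, `read_limit` and `publish` are hypothetical names, and the interval is an arbitrary assumption.

```python
import threading
from typing import Callable, Optional


class PeriodicRefresher:
    """Re-read a resource limit on a fixed interval and publish it (illustrative only)."""

    def __init__(
        self,
        read_limit: Callable[[], Optional[float]],
        publish: Callable[[Optional[float]], None],
        interval_s: float = 60.0,
    ) -> None:
        self._read_limit = read_limit
        self._publish = publish
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh_once(self) -> None:
        # Re-reading every time picks up a cgroup change made after startup.
        self._publish(self._read_limit())

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval_s):
            self.refresh_once()

    def start(self) -> None:
        self.refresh_once()  # publish an initial value immediately
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

The trade-off is a small delay (up to one interval) before a changed limit shows up on the panel, in exchange for not needing any notification mechanism.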

@JaySon-Huang
Contributor Author

JaySon-Huang commented Jan 13, 2022

cgroup v2 is not enabled by default as far as I know. We don't really need to care about it now. However, we can leave some comments and an issue for later developers who care about that in the future. 😃
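
As a note for whoever picks up cgroup v2 later: in v2 the CPU quota and period live in a single `cpu.max` file containing either `max <period>` (no limit) or `<quota> <period>`. A minimal parsing sketch, with a hypothetical function name and untested against TiFlash itself, might look like this:

```python
from typing import Optional


def cpu_limit_from_cpu_max(cpu_max_text: str) -> Optional[float]:
    """Parse the contents of a cgroup v2 `cpu.max` file into a core count."""
    fields = cpu_max_text.split()
    # "max 100000" means no quota is set.
    if len(fields) != 2 or fields[0] == "max":
        return None
    try:
        quota_us, period_us = int(fields[0]), int(fields[1])
    except ValueError:
        return None
    if quota_us <= 0 or period_us <= 0:
        return None
    return quota_us / period_us
```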

@JaySon-Huang
Contributor Author

JaySon-Huang commented Jan 13, 2022

> The user running tiflash, then the user puts tiflash process into cgroup.

I don't think we need to worry much about TiFlash being deployed with a cgroup limit in this way. We currently only suggest that users deploy TiFlash with tiup or tidb-operator in production, and users rarely start a process and then move it into a cgroup in production.

I'll check your changes later.

@Lloyd-Pottiger Lloyd-Pottiger self-assigned this Apr 21, 2022
Lloyd-Pottiger pushed a commit to Lloyd-Pottiger/tiflash that referenced this issue Jul 12, 2022
…s in README (pingcap#5182)

close pingcap#5172, ref pingcap#5178

Enhancement: add a integrated test on DDL module (pingcap#5130)

ref pingcap#5129

Revert "Revise default background threads size" (pingcap#5176)

close pingcap#5177

chore: remove extra dyn cast (pingcap#5186)

close pingcap#5185

Add MPPReceiverSet, which includes ExchangeReceiver and CoprocessorReader (pingcap#5175)

ref pingcap#5095

DDL: Use Column Name Instead of Offset to Find the common handle cluster index (pingcap#5166)

close pingcap#5154

Add random failpoint in critical paths (pingcap#4876)

close pingcap#4807

Segment test framework (pingcap#5150)

close pingcap#5151

optimize ps v3 restore (pingcap#5163)

ref pingcap#4914

Fix build failed (pingcap#5196)

close pingcap#5195

feat: delta tree dispatching (pingcap#5199)

close pingcap#5200

feat: introduce specialized API to write fixed length data rapidly (pingcap#5181)

close pingcap#5183

Add gtest for Limit, TopN, Projection (pingcap#5187) (pingcap#5188)

close pingcap#5187

add `MPPTask::handleError()` (pingcap#5202)

ref pingcap#5095

Check result of starting grpc server (pingcap#5257)

close pingcap#5255

feat: add optimized routines for aarch64 (pingcap#5231)

close pingcap#5240

fix: aarch64-quick-fix (pingcap#5259)

close pingcap#5260

Update client-c to support ipv6 (pingcap#5270)

close pingcap#5247

upgrade prometheus-cpp to v1.0.1 (pingcap#5279)

ref pingcap#2103, close pingcap#5278

Fix README type error (pingcap#5273)

ref pingcap#5178

fix(cmake): make sure libc++ is utilized by tiflash-proxy (pingcap#5281)

close pingcap#5282

fix the wrong order of execution summary for list based executors (pingcap#5242)

close pingcap#5241

Schema: allow loading empty schema diff when the version grows up. (pingcap#5245)

close pingcap#5244

Optimize apply speed under heavy write pressure (pingcap#4883)

ref pingcap#4728

update proxy to raftstore-proxy-6.2 (pingcap#5287)

ref pingcap#4982

Flush segment cache when doing the compaction (pingcap#5284)

close pingcap#5179

metrics: Fix incorrect metrics for delta_merge tasks (pingcap#5061)

close pingcap#5055

dep: upgrade jemalloc (pingcap#5197)

close pingcap#5258

*: TiFlash pagectl/dttool use only-decryption mode (pingcap#5271)

close pingcap#5122

suppresion false positive report from tsan (pingcap#5303)

close pingcap#5088

Refine test framework code and tests (pingcap#5261)

close pingcap#5262

feat: add logical cpu cores and memory into grafana (pingcap#5124)

close pingcap#3821

Implement TimeToSec function push down (pingcap#5235)

close pingcap#5116

feat: implement shiftRight function push down (pingcap#5156)

close pingcap#5100

schema : make update to partition tables when 'set tiflash replica' (pingcap#5267)

close pingcap#5266

Replace initializer_list with vector for planner test framework (pingcap#5307)

close pingcap#5295

KVStore: decouple flush region and CompactLog with a new FFI fn_try_flush_data (pingcap#5283)

ref pingcap#5170

refine error message in mpptask (pingcap#5304)

ref pingcap#5095

Implement ReverseUTF8/Reverse function push down (pingcap#5233)

close pingcap#5111

Optimize comparision for collation `UTF8_BIN` and `UTF8MB4_BIN` (pingcap#5299)

ref pingcap#5294

feat : support set tiflash mode ddl action (pingcap#5256)

ref pingcap#5252

Add non-blocking functions for MPMCQueue (pingcap#5311)

close pingcap#5310

add random segment test for CI weekly (pingcap#5300)

close pingcap#5301

*: tidy FunctionString.cpp (pingcap#5312)

close pingcap#5313

ci: fix check-license github action (pingcap#5318)

close pingcap#5317

update proxy to raftstore-proxy-6.2 (pingcap#5316)

ref pingcap#4982

Change one `additional_input_at_end` to many streams in `ParallelInputsProcessor`  (pingcap#5274)

close pingcap#4856, close pingcap#5263

support fine grained shuffle for window function (pingcap#5048)

close pingcap#5142

feat: pushdown get_format into TiFlash (pingcap#5269)

close pingcap#5115

fix: format throw data truncated error (pingcap#5272)

close pingcap#4891

Print content of columns for gtest (pingcap#5243)

close pingcap#5203

*: also enable O3 for aarch64 (pingcap#5338)

close pingcap#5342

Add debug image build target for CentOS7 (pingcap#5344)

close pingcap#5343

*: mini refactor (pingcap#5326)

close pingcap#4739

Refactor initialize of background pool (pingcap#5190)

close pingcap#5189

delete copy/move ctor of MPMCQueue explicitly (pingcap#5328)

close pingcap#5329

Introduce proxy_server and new-mock-engine-store (pingcap#5319)

ref pingcap#5170

fix: incorrect uptime in grafana panel

Signed-off-by: Lloyd-Pottiger <[email protected]>
Lloyd-Pottiger added a commit to Lloyd-Pottiger/tiflash that referenced this issue Jul 19, 2022
4 participants