Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: support tagging SQL statements in pprof results #14312

Merged
merged 4 commits into from
Jan 2, 2020
Merged

server: support tagging SQL statements in pprof results #14312

merged 4 commits into from
Jan 2, 2020

Conversation

lysu
Copy link
Contributor

@lysu lysu commented Jan 2, 2020

What problem does this PR solve?

Related PR: *: replace conf item pprof_sql_cpu with srv var tidb_pprof_sql_cpu

In TiDB, we often use Go profiler (pprof) to analyze which parts of the code consume most of the CPU resources. But in actual user scenarios, when we’re troubleshooting the 100% CPU usage issue, it would also be of great help to know which parts of the SQL statement to over-consume CPU.

After v1.9, Go supports adding profiler labels that allow users to tag key-value pairs in pprof results, so we think tagging SQL statements in pprof would help. (ref: https://rakyll.org/profiler-labels/)

This PR adds SQL labels to TiDB's Go profiler.

What is changed and how it works?

Profiler labels are not free. There can be some costs to call runtime set goroutine, so this PR adds a config item and combine it with http://0.0.0.0:10080/reload-config to hot reload it.

[performance]
pprof-sql-cpu = false

The config item’s default value is false. You can set it to true and run curl http://0.0.0.0:10080/reload-config to analyze which parts of the SQL statement consume most of the CPU time.

Then, run curl http://0.0.0.0:10080/debug/pprof/profile --output p or curl http://0.0.0.0:10080/debug/pprof/zip --output tidb_debug.zip to get pprof results.

In the pprof results, you can observe tags as below:

go tool pprof  p1                                        
File: ___go_build_github_com_pingcap_tidb_tidb_server
Build ID: b1542297fdc1541f774e2ab7c6cc62de595ee7f1
Type: cpu
Time: Jan 2, 2020 at 11:55am (CST)
Duration: 30s, Total samples = 300ms (    1%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) tags
 sql: Total 40.0ms
      20.0ms (50.00%): insert into t ( a ) values ( ? )
      20.0ms (50.00%): select * from t where a > ?

image

NOTE:

  1. You need to set pprof-sql-cpu back to false and curl http://0.0.0.0:10080/reload-config after checking the pprof result.

  2. TiDB will only tag the SQL statements that are executed after setting pprof-sql-cpu to true.

Check List

Tests

  • Manual test
set conf and run pprof

Code changes

  • add config item
  • add reload
  • setup labels

Side effects

  • n/a

Related changes

  • Need to cherry-pick to the release branch

Release note

  • support tag sql in pprof result.

This change is Reviewable

tidb-server/main.go Outdated Show resolved Hide resolved
Copy link
Member

@jackysp jackysp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jackysp jackysp added the status/LGT1 Indicates that a PR has LGTM 1. label Jan 2, 2020
server/conn.go Show resolved Hide resolved
Copy link
Contributor

@qw4990 qw4990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@qw4990 qw4990 added status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jan 2, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Jan 2, 2020

/run-all-tests

@sre-bot sre-bot merged commit 5d76de9 into pingcap:master Jan 2, 2020
@sre-bot
Copy link
Contributor

sre-bot commented Jan 2, 2020

cherry pick to release-3.0 failed

@lysu
Copy link
Contributor Author

lysu commented Jan 3, 2020

/run-cherry-picker

@lysu lysu deleted the tag-sql-in-pprof branch January 3, 2020 03:12
@sre-bot
Copy link
Contributor

sre-bot commented Jan 3, 2020

cherry pick to release-3.0 failed

@lysu
Copy link
Contributor Author

lysu commented Jan 7, 2020

Benchmark Report when set pprof-sql-cpu = true

Run Sysbench Performance Test on VMs

@@                               Benchmark Diff                               @@
================================================================================
--- tidb: b398ef4d2ead6ae98287d9489976a7dc8dba8e49
+++ tidb: 91144ca41fcd86801e85ca8be2d748f3b3885568
tikv: 1211ed161c093ab180e42879146dedd05c4c6fb5
pd: 4498b6f203588f556cb6f6ebcdcb55f0afff0efa
================================================================================
oltp_update_index:
    * QPS: 4246.60 ± 0.04% (std=1.15) delta: 0.06% (p=0.363)
    * Latency p50: 30.14 ± 0.03% (std=0.01) delta: -0.07%
    * Latency p99: 54.84 ± 3.64% (std=1.22) delta: -1.81%
            
oltp_insert:
    * QPS: 7567.92 ± 0.04% (std=2.21) delta: 0.47% (p=0.029)
    * Latency p50: 16.91 ± 0.04% (std=0.00) delta: -0.49%
    * Latency p99: 29.07 ± 4.09% (std=0.78) delta: -1.76%
            
oltp_read_write:
    * QPS: 15694.90 ± 0.25% (std=27.78) delta: -0.52% (p=0.072)
    * Latency p50: 163.45 ± 0.21% (std=0.25) delta: 0.55%
    * Latency p99: 326.02 ± 1.81% (std=4.79) delta: 0.90%
            
oltp_point_select:
    * QPS: 41161.17 ± 0.17% (std=53.14) delta: -3.90% (p=0.020)
    * Latency p50: 3.11 ± 0.21% (std=0.00) delta: 4.02%
    * Latency p99: 10.09 ± 0.00% (std=0.00) delta: 3.70%
            
oltp_update_non_index:
    * QPS: 4715.66 ± 0.83% (std=28.71) delta: -0.12% (p=0.341)
    * Latency p50: 27.14 ± 0.84% (std=0.17) delta: 0.14%
    * Latency p99: 43.92 ± 2.41% (std=0.75) delta: 2.54%
            

Originally posted by @sre-bot in #14362 (comment)

@sre-bot
Copy link
Contributor

sre-bot commented Jan 7, 2020

@lysu No command or invalid command

1 similar comment
@sre-bot
Copy link
Contributor

sre-bot commented Jan 7, 2020

@lysu No command or invalid command

@yikeke yikeke changed the title server: support tag sql in pprof result server: support tagging SQL statements in pprof results Jan 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/server status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2. type/usability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants