Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TiDB hang when do sql "select count(*) from information_schema.tiflash_segments" #40988

Closed
hongyunyan opened this issue Feb 2, 2023 · 13 comments
Labels
severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

@hongyunyan
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

In a cluster with 1 tidb 3 tikv and 1 tiflash, and tiflash has 650G data.

And TiDB hang when I want to do a query "select count(*) from information_schema.tiflash_segments".

while the http qury http://172.16.5.59:45641/?default_format=JSONCompact&query=SELECT+%2A+FROM+system.dt_segments can work well

1. Minimal reproduce step (Required)

  1. make a cluster with large data
  2. do query "select count(*) from information_schema.tiflash_segments"

2. What did you expect to see? (Required)

tidb work well

3. What did you see instead (Required)

image

in tidb.log, there are just a lot info like

[2023/02/02 16:43:31.220 +08:00] [INFO] [coprocessor.go:1205] ["[TIME_COP_PROCESS] resp_time:4.255924643s txnStartTS:18446744073709551615 regio n_id:159656 store_addr:172.16.5.59:45625 kv_process_ms:4245 kv_wait_ms:0 kv_read_ms:0 processed_versions:55718 total_versions:55719 rocksdb_del ete_skipped_count:0 rocksdb_key_skipped_count:111435 rocksdb_cache_hit_count:12 rocksdb_read_count:1540 rocksdb_read_byte:28736934"]

4. What is your TiDB version? (Required)

nightly

@hongyunyan hongyunyan added the type/bug The issue is confirmed as a bug. label Feb 2, 2023
@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.0 may-affects-6.1 may-affects-6.2 may-affects-6.3 may-affects-6.4 may-affects-6.5 labels Feb 6, 2023
@seiya-annie seiya-annie removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-6.0 may-affects-6.1 may-affects-6.2 may-affects-6.3 may-affects-6.4 may-affects-6.5 labels Feb 6, 2023
@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

How long does this query take? @hongyunyan

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

I guess this may not be a bug but a slow query.

@hongyunyan
Copy link
Contributor Author

The query never return, so I actually don't know how long the query take (but I think must more than 10 min).
But I think tiflash_segments is not a really large table, I use tiflash to check, it only has throusands rows.
So I don't think it just a slow query( If it just a slow query, maybe we need to check why it so slow?

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

It's weird that a lot of CPU resource is held by ResourceGroupController
ejmiwz3RCD

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

mysql> select @@tidb_enable_resource_control;
+--------------------------------+
| @@tidb_enable_resource_control |
+--------------------------------+
|                              0 |
+--------------------------------+
1 row in set (0.05 sec)

mysql> desc select count(*) from information_schema.tiflash_segments;
+----------------------+----------+------+------------------------+---------------------------+
| id                   | estRows  | task | access object          | operator info             |
+----------------------+----------+------+------------------------+---------------------------+
| HashAgg_5            | 1.00     | root |                        | funcs:count(1)->Column#34 |
| └─MemTableScan_7     | 10000.00 | root | table:TIFLASH_SEGMENTS |                           |
+----------------------+----------+------+------------------------+---------------------------+
2 rows in set (0.04 sec)

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

@tiancaiamao
Copy link
Contributor

It's weird that a lot of CPU resource is held by ResourceGroupController ejmiwz3RCD

Not 'weird', this is a known one #40876

@XuHuaiyu XuHuaiyu added sig/sql-infra SIG: SQL Infra and removed sig/execution SIG execution labels Feb 6, 2023
@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

After patching the commit with the bug fix version, the query is still slow.

A lot of queries like the following snippet shows are built and sent to TiFlash.

[2023/02/06 14:53:03.159 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 2048, 1024"]
[2023/02/06 14:53:03.399 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 3072, 1024"]
[2023/02/06 14:53:03.609 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 4096, 1024"]
[2023/02/06 14:53:03.811 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 5120, 1024"]
[2023/02/06 14:53:04.018 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 6144, 1024"]
[2023/02/06 14:53:04.255 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 7168, 1024"]
[2023/02/06 14:53:04.491 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 8192, 1024"]
[2023/02/06 14:53:04.701 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 9216, 1024"]
[2023/02/06 14:53:04.905 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 10240, 1024"]
[2023/02/06 14:53:04.961 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=37.985375ms]
[2023/02/06 14:53:05.106 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 11264, 1024"]
[2023/02/06 14:53:05.309 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 12288, 1024"]
[2023/02/06 14:53:05.511 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 13312, 1024"]
[2023/02/06 14:53:05.713 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 14336, 1024"]
[2023/02/06 14:53:05.914 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 15360, 1024"]
[2023/02/06 14:53:06.114 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 16384, 1024"]
[2023/02/06 14:53:06.320 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 17408, 1024"]
[2023/02/06 14:53:06.572 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 18432, 1024"]
[2023/02/06 14:53:06.839 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 19456, 1024"]
[2023/02/06 14:53:06.961 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=38.463582ms]
[2023/02/06 14:53:07.125 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 20480, 1024"]
[2023/02/06 14:53:07.403 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 21504, 1024"]
[2023/02/06 14:53:07.678 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 22528, 1024"]
[2023/02/06 14:53:07.957 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 23552, 1024"]
[2023/02/06 14:53:08.203 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 24576, 1024"]
[2023/02/06 14:53:08.441 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 25600, 1024"]
[2023/02/06 14:53:08.682 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 26624, 1024"]
[2023/02/06 14:53:08.925 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 27648, 1024"]
[2023/02/06 14:53:09.036 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=113.510361ms]
[2023/02/06 14:53:09.376 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 28672, 1024"]
[2023/02/06 14:53:09.663 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 29696, 1024"]
[2023/02/06 14:53:09.942 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 30720, 1024"]
[2023/02/06 14:53:10.225 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 31744, 1024"]
[2023/02/06 14:53:10.469 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 32768, 1024"]
[2023/02/06 14:53:10.714 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 33792, 1024"]
[2023/02/06 14:53:10.962 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=38.84022ms]
[2023/02/06 14:53:10.979 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 34816, 1024"]
[2023/02/06 14:53:11.233 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 35840, 1024"]
[2023/02/06 14:53:11.479 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 36864, 1024"]
[2023/02/06 14:53:11.723 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 37888, 1024"]
[2023/02/06 14:53:11.937 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 38912, 1024"]
[2023/02/06 14:53:12.148 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 39936, 1024"]
[2023/02/06 14:53:12.354 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 40960, 1024"]
[2023/02/06 14:53:12.564 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 41984, 1024"]
[2023/02/06 14:53:12.772 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 43008, 1024"]
[2023/02/06 14:53:12.979 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=56.392064ms]
[2023/02/06 14:53:12.993 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 44032, 1024"]
[2023/02/06 14:53:13.204 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 45056, 1024"]
[2023/02/06 14:53:13.416 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 46080, 1024"]
[2023/02/06 14:53:13.628 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 47104, 1024"]
[2023/02/06 14:53:13.923 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 48128, 1024"]
[2023/02/06 14:53:14.210 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 49152, 1024"]
[2023/02/06 14:53:14.493 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 50176, 1024"]
[2023/02/06 14:53:14.773 +08:00] [WARN] [infoschema_reader.go:3184] [query] [query="SELECT * FROM system.dt_segments LIMIT 51200, 1024"]
[2023/02/06 14:53:14.960 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=37.217266ms]

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

The logic is implemented in

for {
rows, err := e.dataForTiFlashSystemTables(sctx, e.extractor.TiDBDatabases, e.extractor.TiDBTables)
if err != nil {
return nil, err
}
if len(rows) > 0 || e.instanceIdx >= e.instanceCount {
return rows, nil
}
}

@XuHuaiyu
Copy link
Contributor

XuHuaiyu commented Feb 6, 2023

After setting maxCount to 102400, query finished quickly:

func (e *TiFlashSystemTableRetriever) dataForTiFlashSystemTables(ctx sessionctx.Context, tidbDatabases string, tidbTables string) ([][]types.Datum, error) {
maxCount := 1024
targetTable := strings.ToLower(strings.Replace(e.table.Name.O, "TIFLASH", "DT", 1))
var filters []string
if len(tidbDatabases) > 0 {
filters = append(filters, fmt.Sprintf("tidb_database IN (%s)", strings.ReplaceAll(tidbDatabases, "\"", "'")))
}
if len(tidbTables) > 0 {
filters = append(filters, fmt.Sprintf("tidb_table IN (%s)", strings.ReplaceAll(tidbTables, "\"", "'")))

tidb> select count(*) from information_schema.tiflash_segments;
+----------+
| count(*) |
+----------+
|     6696 |
+----------+
1 row in set (1.18 sec)

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented Feb 9, 2023

Close as pingcap/tiflash#6747 has been fixed by pingcap/tiflash#6745

@JaySon-Huang
Copy link
Contributor

/close

@ti-chi-bot
Copy link
Member

@JaySon-Huang: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.
Projects
None yet
Development

No branches or pull requests

6 participants