global sort; pool limiter stuck when import on a 32c64g node #51734

Closed
Tracked by #50752
D3Hunter opened this issue Mar 13, 2024 · 3 comments
Labels
affects-8.1 This bug affects the 8.1.x(LTS) versions. component/ddl This issue is related to DDL of TiDB. feature/developing the related feature is in development severity/major type/bug The issue is confirmed as a bug.

Comments

@D3Hunter
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Run an import with global sort and 32 threads on a 32c64g node. During the ingest step, some subtasks get stuck at the pool limiter.
stack:
stuck-stack.log

2. What did you expect to see? (Required)

The import either succeeds or fails.

3. What did you see instead (Required)

The import gets stuck.

4. What is your TiDB version? (Required)

master

@D3Hunter D3Hunter added the type/bug The issue is confirmed as a bug. label Mar 13, 2024
@jebter jebter added component/ddl This issue is related to DDL of TiDB. severity/major labels Mar 14, 2024
@D3Hunter D3Hunter added feature/developing the related feature is in development and removed may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 labels Mar 15, 2024
@ti-chi-bot ti-chi-bot added the affects-8.1 This bug affects the 8.1.x(LTS) versions. label Apr 9, 2024
@lance6716
Contributor

Hopefully this is resolved for the same reason as #52884.

@D3Hunter
Contributor Author

D3Hunter commented Aug 13, 2024

Hit this again on the current master branch; see also #55374.

goroutine 268459959 [chan receive, 793 minutes]:
github.com/pingcap/tidb/br/pkg/membuf.(*Limiter).Acquire(0xc01fbf19f0, 0x100000)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/limiter.go:56 +0x1ab
github.com/pingcap/tidb/br/pkg/membuf.(*Pool).acquire(0xc021209200)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:100 +0x28
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).addBlock(0xc0194121e0)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:302 +0x8b
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).allocBytesWithSliceLocation(0xc0194121e0, 0x19426)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:272 +0x65
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).AllocBytes(0xc0194121e0, 0xc04aca9130?)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:245 +0x29
github.com/pingcap/tidb/br/pkg/membuf.(*Buffer).AddBytes(...)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/br/pkg/membuf/buffer.go:317
github.com/pingcap/tidb/pkg/lightning/backend/external.readOneFile({0x6ce5700, 0xc04aca9130}, {0x6d08a50?, 0xc13346a420?}, {0xc075094b00, 0x38}, {0xc219c8cd68, 0x13, 0x18}, {0xc219c8cd80, ...}, ...)
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/lightning/backend/external/reader.go:186 +0x570
github.com/pingcap/tidb/pkg/lightning/backend/external.readAllData.func2()
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/lightning/backend/external/reader.go:98 +0x3b8
github.com/pingcap/tidb/pkg/lightning/backend/external.readAllData.(*ErrorGroupWithRecover).Go.func3()
	/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/wait_group_wrapper.go:250 +0x58
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 268311633
	/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x96
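The stack above shows a `readOneFile` goroutine blocked in `membuf.(*Limiter).Acquire` while asking for one more 1 MiB (`0x100000`) block. One plausible way a byte-budget limiter can hang like this is sketched below, under a stated assumption: each reader keeps the blocks it has already acquired until its whole file range is in memory. `byteLimiter` and `simulate` are hypothetical names, not the membuf implementation (which waits on a channel, per the `chan receive` state; this sketch uses `sync.Cond` instead). `simulate` uses a non-blocking `TryAcquire` so that a hang can be detected deterministically instead of actually blocking.

```go
package main

import (
	"fmt"
	"sync"
)

// byteLimiter is a hypothetical, simplified budget limiter. It is NOT
// membuf.Limiter; it only mirrors the "Acquire blocks until budget is free" behaviour.
type byteLimiter struct {
	mu        sync.Mutex
	cond      *sync.Cond
	available int
}

func newByteLimiter(limit int) *byteLimiter {
	l := &byteLimiter{available: limit}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Acquire blocks until n units of budget are free, then takes them.
func (l *byteLimiter) Acquire(n int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for l.available < n {
		l.cond.Wait()
	}
	l.available -= n
}

// TryAcquire takes n units if they are free and reports whether it did.
func (l *byteLimiter) TryAcquire(n int) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.available < n {
		return false
	}
	l.available -= n
	return true
}

// Release returns n units of budget and wakes all waiting acquirers.
func (l *byteLimiter) Release(n int) {
	l.mu.Lock()
	l.available += n
	l.mu.Unlock()
	l.cond.Broadcast()
}

// simulate models readers that must each hold blocksPerReader blocks at once
// before they can finish and release everything (assumption: blocksPerReader >= 1).
// Blocks are requested one at a time in round-robin order. It returns how many
// readers finished and whether the rest would wait forever on a blocking Acquire.
func simulate(budgetBlocks, readers, blocksPerReader int) (finished int, stuck bool) {
	l := newByteLimiter(budgetBlocks)
	held := make([]int, readers)
	done := make([]bool, readers)
	for finished < readers {
		progress := false
		for i := 0; i < readers; i++ {
			if done[i] {
				continue
			}
			if l.TryAcquire(1) {
				held[i]++
				progress = true
			}
			if held[i] == blocksPerReader {
				// Reader i has its whole range in memory; it consumes and frees it.
				l.Release(held[i])
				held[i] = 0
				done[i] = true
				finished++
				progress = true
			}
		}
		if !progress {
			// Every unfinished reader holds part of the budget and needs more.
			return finished, true
		}
	}
	return finished, false
}

func main() {
	fmt.Println(simulate(4, 3, 2)) // 3 false
	fmt.Println(simulate(4, 4, 2)) // 0 true
}
```

With a budget of 4 blocks, three readers that each need 2 blocks all finish, but four such readers each grab one block and then wait for a second one that never frees up. Under this model, the many concurrent readers spawned for hotspot files make the second situation much more likely.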

@lance6716
Copy link
Contributor

For hotspot files, the data being read occupies a lot of memory, which exceeds the memLimiter threshold (12G).

[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/515082f8-c85a-48a2-90b3-aa7536db2d78_stat/1] [startOffset=85385025] [endOffset=702434139] [expectedConc=74] [concurrency=74]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/99287c3d-91f0-4535-8e62-93bac8286d79_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/0109bbcc-08c5-4310-b601-b63649ccddf6_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/a5b9f1ec-c2b0-490c-8a42-82e53dca3265_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/5295178c-09d1-47a6-bd9c-85336a7bfd38_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/6c3c2c39-3e07-4768-8a19-60dfebd49a39_stat/0] [startOffset=617049114] [endOffset=736588149] [expectedConc=15] [concurrency=15]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/6c3c2c39-3e07-4768-8a19-60dfebd49a39_stat/1] [startOffset=0] [endOffset=496371612] [expectedConc=60] [concurrency=60]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e195166c-c860-4790-b240-d6ed5cdcb9f0_stat/1] [startOffset=85385025] [endOffset=736588149] [expectedConc=78] [concurrency=78]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e195166c-c860-4790-b240-d6ed5cdcb9f0_stat/2] [startOffset=0] [endOffset=170770050] [expectedConc=21] [concurrency=21]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/d5bdedc6-1616-42fd-882c-066c61a590c3_stat/1] [startOffset=85385025] [endOffset=702434139] [expectedConc=74] [concurrency=74]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e8df2a22-26c2-4be9-884f-4b1a9b1d506c_stat/0] [startOffset=658033926] [endOffset=736588149] [expectedConc=10] [concurrency=10]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/e8df2a22-26c2-4be9-884f-4b1a9b1d506c_stat/1] [startOffset=0] [endOffset=496371612] [expectedConc=60] [concurrency=60]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/3d5e396d-21ec-4848-9395-7d3336731afa_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/cb7032c9-762f-4761-96a8-d8cd3fc7609e_stat/1] [startOffset=85385025] [endOffset=508894749] [expectedConc=51] [concurrency=51]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/169d1aff-9061-4dbb-ad82-8cfff2f86d72_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/8ab7b236-e767-4d0c-8c38-0c38bf24ffcb_stat/0] [startOffset=617049114] [endOffset=736588149] [expectedConc=15] [concurrency=15]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/8ab7b236-e767-4d0c-8c38-0c38bf24ffcb_stat/1] [startOffset=0] [endOffset=291447552] [expectedConc=35] [concurrency=35]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/b2b78ac1-2b12-416d-8692-e86e84465cc7_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/ce3779a2-33de-4ff1-856a-72569111a18d_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/0c7db861-6164-4bb1-badd-660805e7256a_stat/1] [startOffset=85385025] [endOffset=736588149] [expectedConc=78] [concurrency=78]
[2024/08/12 21:56:31.316 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/0c7db861-6164-4bb1-badd-660805e7256a_stat/2] [startOffset=0] [endOffset=170770050] [expectedConc=21] [concurrency=21]
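To put numbers on the comment above, a small helper can total the byte ranges (`endOffset - startOffset`) and the reader `concurrency` from these `getFilesReadConcurrency` log lines. This is a minimal sketch; `summarize`, `hotspotRe` and `sampleLines` are hypothetical names, and it assumes the log format shown here. Summed by hand over the 21 lines above, the ranges cover 8,191,270,065 bytes (about 8.2 GB) spread over 984 concurrent readers; the snippet lists only hotspot files, so it may not account for every file read in that step.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// hotspotRe captures startOffset, endOffset and concurrency from a
// "found hotspot file in getFilesReadConcurrency" log line.
var hotspotRe = regexp.MustCompile(`\[startOffset=(\d+)\] \[endOffset=(\d+)\] \[expectedConc=\d+\] \[concurrency=(\d+)\]`)

// sampleLines holds the first two log lines from the comment above.
var sampleLines = []string{
	`[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/515082f8-c85a-48a2-90b3-aa7536db2d78_stat/1] [startOffset=85385025] [endOffset=702434139] [expectedConc=74] [concurrency=74]`,
	`[2024/08/12 21:56:31.315 +08:00] [Info] [engine.go:248] ["found hotspot file in getFilesReadConcurrency"] [filename=60004/120054/data/99287c3d-91f0-4535-8e62-93bac8286d79_stat/1] [startOffset=85385025] [endOffset=496371612] [expectedConc=49] [concurrency=49]`,
}

// summarize returns the total bytes covered by the logged ranges and the total
// number of concurrent readers requested. Non-matching lines are skipped.
func summarize(lines []string) (totalBytes int64, totalConc int) {
	for _, line := range lines {
		m := hotspotRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		// The regexp only matches digits, so these conversions cannot fail
		// unless a value overflows its type.
		start, _ := strconv.ParseInt(m[1], 10, 64)
		end, _ := strconv.ParseInt(m[2], 10, 64)
		conc, _ := strconv.Atoi(m[3])
		totalBytes += end - start
		totalConc += conc
	}
	return totalBytes, totalConc
}

func main() {
	b, c := summarize(sampleLines)
	fmt.Printf("bytes=%d readers=%d\n", b, c) // bytes=1028035701 readers=123
}
```

Feeding it every line of the log instead of `sampleLines` gives the totals quoted above.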
