
planner: cleanup prepare cache when client sends deallocate #8332

Merged 7 commits into pingcap:master on Nov 20, 2018
Conversation

@lysu (Contributor) commented Nov 15, 2018

What problem does this PR solve?

Ref #8330: TiDB's current prepared-statement deallocate handler doesn't release plan-cache memory; this PR fixes it.

What is changed and how it works?

Delete the plan-cache entry when the client deallocates a prepared statement (a simplified sketch follows this list):

  • in handleStmt for the binary protocol
  • in deallocExec for the text protocol ("deallocate prepare stmtX")
  • add some test cases
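A minimal sketch of the mechanism, under hypothetical simplified names (cacheKey, onDeallocate are illustrations, not TiDB's actual code; the real key is built through plannercore and also hashes database, sql_mode, and other session state):

package main

import "fmt"

// cacheKey is a hypothetical, simplified stand-in for TiDB's
// pstmtPlanCacheKey (the real key also hashes database, sql_mode,
// time zone, etc.).
type cacheKey struct {
	stmtID        uint32
	schemaVersion int64
}

// onDeallocate mirrors what this PR adds at both entry points:
// besides dropping the prepared statement, also drop its cached plan.
func onDeallocate(preparedStmts map[uint32]string, planCache map[cacheKey]string, stmtID uint32, schemaVersion int64) {
	delete(preparedStmts, stmtID)                      // existed before this PR
	delete(planCache, cacheKey{stmtID, schemaVersion}) // new in this PR: free the plan
}

func main() {
	stmts := map[uint32]string{1: "SELECT ?"}
	plans := map[cacheKey]string{{1, 100}: "cached plan"}
	onDeallocate(stmts, plans, 1, 100)
	fmt.Println(len(stmts), len(plans)) // 0 0
}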

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

Running the code from the issue no longer shows the OOM.

Code changes

  • Has exported function/method change
  • Has interface methods change

Side effects

  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch

Remaining Question

The current implementation simply uses the current sessionVars, so if sql_mode or schema_version changed before the deallocate, we cannot free the previously cached item.
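To illustrate the limitation with a hypothetical, simplified key (in the real key, sql_mode and schema_version feed into the hash): a key rebuilt from the current sessionVars can no longer address an entry cached under the old session state.

package main

import "fmt"

// key is a hypothetical, simplified plan-cache key; in the real key,
// sql_mode and schema_version feed into the hash.
type key struct {
	stmtID  uint32
	sqlMode string
}

func main() {
	cache := map[key]string{{1, "STRICT_TRANS_TABLES"}: "old plan"}

	// sql_mode changed before DEALLOCATE, so a key rebuilt from the
	// *current* sessionVars no longer matches the cached entry:
	delete(cache, key{1, "ANSI_QUOTES"})
	fmt.Println(len(cache)) // 1: the old plan stays until LRU eviction
}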



@lysu added the type/bugfix and sig/planner labels on Nov 15, 2018
@lysu (Contributor, Author) commented Nov 15, 2018

/run-all-tests

@lysu (Contributor, Author) commented Nov 15, 2018

PTAL @dbjoa, @eurekaka if you're free, thanks~

@zz-jason (Member) left a comment

LGTM

@dbjoa (Contributor) commented Nov 15, 2018

@lysu Do we really need to delete these obsolete plans from the plan cache? By virtue of LRU, the plans will be evicted eventually.

planner/core/cache.go (review thread, outdated, resolved)
session/session.go (review thread, outdated, resolved)
func (key *pstmtPlanCacheKey) SetPstmtIDSchemaVersion(pstmtID uint32, schemaVersion int64) {
	key.pstmtID = pstmtID
	key.schemaVersion = schemaVersion
	key.hash = nil
}

(Member) commented on this diff:

It's better to set key.hash = key.hash[:0]. In https://github.com/pingcap/tidb/pull/8332/files#diff-76a70a17b419c3333a3ff060d8f7c330R73:

if len(key.hash) == 0 {
	// calculate hash value
}

(Contributor) replied:

key.hash = key.hash[:0] will not release the memory, and it's some kind of leak, while key.hash = nil does.

@lysu (Contributor, Author) replied:

@tiancaiamao I reuse this key because it can be reused when there is more than one key (we could also allocate a new one every time); if we keep the reuse approach, [:0] is more suitable here.
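To make the trade-off concrete, a sketch assuming a minimal key that hashes only pstmtID (hypothetical names; binary.BigEndian.AppendUint32 stands in for the real hash computation). hash[:0] keeps the backing array so the next Hash() call avoids an allocation, while hash = nil would let GC reclaim it:

package main

import (
	"encoding/binary"
	"fmt"
)

// cacheKey is a hypothetical, minimal version of the key under review.
type cacheKey struct {
	pstmtID uint32
	hash    []byte // computed lazily in Hash()
}

// Hash recomputes the hash only when the buffer is empty, matching the
// `if len(key.hash) == 0` pattern quoted above.
func (k *cacheKey) Hash() []byte {
	if len(k.hash) == 0 {
		k.hash = binary.BigEndian.AppendUint32(k.hash, k.pstmtID)
	}
	return k.hash
}

// SetPstmtID resets the key for reuse. hash[:0] keeps the backing
// array (lysu's point); hash = nil would release it to GC
// (tiancaiamao's point).
func (k *cacheKey) SetPstmtID(id uint32) {
	k.pstmtID = id
	k.hash = k.hash[:0]
}

func main() {
	k := &cacheKey{pstmtID: 1}
	fmt.Printf("%x\n", k.Hash()) // 00000001
	k.SetPstmtID(2)
	fmt.Printf("%x\n", k.Hash()) // 00000002, same backing array
}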

if planCacheEnabled {
	if i > 0 {
		cacheKey.(plannercore.PstmtCacheKeyMutator).SetPstmtIDSchemaVersion(
			stmtID, s.sessionVars.PreparedStmts[stmtID].SchemaVersion,
(Member) commented on this diff:

We only need to reset stmtID?

@lysu (Contributor, Author) replied on Nov 19, 2018:

In the normal situation the session environment is the same as at the last execute time, and here we can only get the current session vars, so we CAN only reset stmtID here. It works well most of the time, although it does nothing in some corner cases (e.g. prepare, execute, then change sql_mode, then deallocate).

@zz-jason (Member) commented:

Agreed with @dbjoa. #8330 described a bad usage of the plan cache; the cache capacity should not be set too large.

@lysu (Contributor, Author) commented Nov 15, 2018

Hi @dbjoa, I think it's better to delete them, because the same stmtID will never be used again on this connection; we should remove the plans just like we remove SessionVars$PreparedStmts. A plan tree is much heavier than a StmtNode AST tree (many func nodes, type nodes... many small objects, a lot of GC pressure).

IMHO, LRU capacity should be a protection mechanism that bounds maximum memory usage rather than a way to free memory: LRU can only guarantee cache items never overflow the capacity, it cannot make sure an item is eventually destroyed.

In a real production environment it is hard to pick the right minimal capacity value, so some memory will still be held. The current plan cache is at connection level, so if an application has a big connection pool, or is deployed across 100+ instances, that adds up to a lot of memory in the total view, even if only a few items per connection cannot be freed.

I think it is better to release resources when they are no longer needed; in this case the delete isn't expensive, and the memory is freed asynchronously by GC.

@dbjoa (Contributor) commented Nov 15, 2018

In order to fix #8330, we can change util.kvcache to use the byte size as its capacity instead of the number of elements.

@eurekaka (Contributor) commented:

> LRU capacity should be a protection mechanism that bounds maximum memory usage rather than a way to free memory

+1
If we want to limit the memory of kvcache, we would have to do bookkeeping for the memory usage of each cached plan; that does not seem like an easy job.

@lysu (Contributor, Author) commented Nov 15, 2018

@dbjoa, I agree byte size is better than element count and can prevent the OOM, but it still keeps n bytes of unused data per connection and wastes resources.

Again, I think LRU capacity should be a protection mechanism.

@zz-jason (Member) commented:

@dbjoa It's hard to calculate the memory consumed by a cached plan. The present plan cache is at the session level; once there are hundreds of connections, TiDB can still run into OOM even with a low cache capacity.

> In a real production environment it is hard to pick the right minimal capacity value, so some memory will still be held. The current plan cache is at connection level, so if an application has a big connection pool, or is deployed across 100+ instances, that adds up to a lot of memory in the total view, even if only a few items per connection cannot be freed.

On this point, I agree with @lysu.

I think maybe we can use a global plan cache to fix #8330.

@eurekaka (Contributor) commented:

Would a global plan cache introduce contention? IIRC, Oracle used to have this problem:
https://yq.aliyun.com/articles/55698?spm=a2c4e.11153940.blogcont55719.4.41dc4c02Qbdunp

@zz-jason (Member) commented Nov 15, 2018

This PR cannot fix this situation: the capacity of the session-level plan cache is reasonable, but a lot of connections execute prepare, execute, execute, ... without any deallocate.

In this situation, the unused cache cannot be cleaned up by the method introduced in this PR.

@dbjoa (Contributor) commented Nov 15, 2018

  • I have concerns about the performance of a global plan cache because it needs a lock to prevent race conditions. The lock-management cost might not be cheap for CPU-intensive workloads like compiling query statements.
  • We can compute the size of an object once, when the object is put. In order to know the object size, we can use memory.Sizeof (see the sketch below).
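A sketch of the byte-size-capacity idea, assuming the caller supplies a size computed once at Put time (something like the memory.Sizeof mentioned above would play that role). sizedLRU and its methods are hypothetical, not util/kvcache's actual API:

package main

import (
	"container/list"
	"fmt"
)

// entry remembers its estimated size, computed once on Put.
type entry struct {
	key  string
	val  string
	size int64
}

// sizedLRU bounds total estimated bytes instead of element count.
type sizedLRU struct {
	capBytes  int64
	usedBytes int64
	order     *list.List               // front = most recently used
	items     map[string]*list.Element // key -> element holding *entry
}

func newSizedLRU(capBytes int64) *sizedLRU {
	return &sizedLRU{capBytes: capBytes, order: list.New(), items: map[string]*list.Element{}}
}

// Put stores val with its estimated size, evicting least-recently-used
// entries until the byte budget is respected.
func (c *sizedLRU) Put(key, val string, size int64) {
	if el, ok := c.items[key]; ok {
		c.usedBytes -= el.Value.(*entry).size
		c.order.Remove(el)
		delete(c.items, key)
	}
	c.items[key] = c.order.PushFront(&entry{key, val, size})
	c.usedBytes += size
	for c.usedBytes > c.capBytes && c.order.Len() > 0 {
		oldest := c.order.Back()
		e := oldest.Value.(*entry)
		c.order.Remove(oldest)
		delete(c.items, e.key)
		c.usedBytes -= e.size
	}
}

func main() {
	c := newSizedLRU(100)
	c.Put("plan1", "p1", 60)
	c.Put("plan2", "p2", 60)              // 120 > 100, evicts plan1
	fmt.Println(len(c.items), c.usedBytes) // 1 60
}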

@lysu (Contributor, Author) commented Nov 15, 2018

@zz-jason yes, we can fix the case where the user forgets to close statements by following max_prepared_stmt_count (https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_max_prepared_stmt_count) and by sending a "prepared but not deallocated" count to Grafana; it seems this work is on @tiancaiamao's roadmap :D

For long-lived connections that never close their statements, maybe we need a TTL? But I think the user deliberately didn't close them, so maybe we'd better not close them as long as the total number doesn't exceed the capacity.
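A sketch of the max_prepared_stmt_count guard being suggested (hypothetical handler, not TiDB's actual prepare path; 16382 is MySQL's documented default for the variable):

package main

import (
	"errors"
	"fmt"
)

// maxPreparedStmtCount mirrors MySQL's max_prepared_stmt_count variable.
const maxPreparedStmtCount = 16382

var errTooManyStmts = errors.New("max_prepared_stmt_count reached")

// handlePrepare refuses to grow without bound when clients keep
// preparing and never deallocate.
func handlePrepare(preparedStmts map[uint32]string, nextID uint32, sql string) (uint32, error) {
	if len(preparedStmts) >= maxPreparedStmtCount {
		return 0, errTooManyStmts
	}
	preparedStmts[nextID] = sql
	return nextID, nil
}

func main() {
	stmts := map[uint32]string{}
	id, err := handlePrepare(stmts, 1, "SELECT ?")
	fmt.Println(id, err) // 1 <nil>
}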

executor/prepared_test.go (review thread, outdated, resolved)
}

// SetPstmtIDSchemaVersion implements the PstmtCacheKeyMutator interface to change pstmtID and schemaVersion of a cacheKey,
// so we can reuse the Key instead of allocating a new one every time.

(Contributor) commented on this diff:

Why do we need to reuse the Key?


if len(retryInfo.DroppedPreparedStmtIDs) > 0 {
	planCacheEnabled := plannercore.PreparedPlanCacheEnabled()
	for _, stmtID := range retryInfo.DroppedPreparedStmtIDs {
		delete(s.sessionVars.PreparedStmts, stmtID)
(Contributor) commented on this diff:

Will deallocate prepare stmt1 be in the retry history?

@lysu (Contributor, Author) replied on Nov 19, 2018:

I did some digging.

The result: deallocate prepare stmt1 will be added to the history, but the protocol-level stmtClose won't.

Just as #2473 (comment) said, we only add exec to the history, and retry reuses exec, so that PR made stmtClose delay the statement's destruction until the retry finished.

But it seems deallocate prepare stmt1 was forgotten; there appears to be a bug around deallocate prepare stmt1 with retry in the master code.

@lysu (Contributor, Author) replied:

@tiancaiamao I retried it:

  • prepare stmt1 from 'xx' and deallocate prepare stmt1 are added to the history, and will re-prepare and close on every retry, so no lazy cleanup is needed; and since it's unusual for people to use prepare via the text protocol, that is probably OK.
  • but the binary stmtPrepare and stmtClose will NOT be added to the history, so lazy cleanup is needed here.

Summary: it seems there is no problem.

(Contributor) replied:

Retrying prepare stmt1 from 'xx' will prepare twice, and retrying deallocate prepare stmt1 will deallocate twice? What will happen in the plan cache?

@lysu (Contributor, Author) replied:

After this PR, it will add to the plan cache twice and remove from the plan cache twice...

I suddenly realized there may be a problem when a "prepare stmt from xx" happens in a transaction without a dealloc: retry will create many useless statements in the server. Maybe handleStmtPrepare's way is better; we should use a unified way to handle these two entrances.

util/kvcache/simple_lru.go (review thread, outdated, resolved)
@@ -70,12 +70,14 @@ type pstmtPlanCacheKey struct {

// Hash implements Key interface.
func (key *pstmtPlanCacheKey) Hash() []byte {

(Contributor) commented on this diff:

By the way, how about refactoring kv.SimpleCache to use []byte as the key? (not in this PR)
I don't see any benefit in its Key definition. @lysu
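A sketch of the floated refactor: key the cache directly on the hash bytes rather than on a Key interface. byteKeyCache is hypothetical, not util/kvcache's actual API; string(hash) copies the bytes into an immutable map key, so callers can keep reusing their hash buffers afterwards:

package main

import "fmt"

// byteKeyCache keys directly on hash bytes.
type byteKeyCache map[string]string

func (c byteKeyCache) Put(hash []byte, v string) { c[string(hash)] = v }

func (c byteKeyCache) Get(hash []byte) (string, bool) {
	v, ok := c[string(hash)]
	return v, ok
}

func main() {
	c := byteKeyCache{}
	c.Put([]byte{0, 0, 0, 1}, "plan")
	v, ok := c.Get([]byte{0, 0, 0, 1})
	fmt.Println(v, ok) // plan true
}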

@tiancaiamao (Contributor) commented:

LGTM

@tiancaiamao added the status/LGT1 label on Nov 19, 2018
@eurekaka (Contributor) left a comment:

LGTM

@eurekaka added the status/LGT2 label and removed the status/LGT1 label on Nov 20, 2018
@eurekaka (Contributor) commented:

/run-all-tests

@tiancaiamao merged commit 61ee0da into pingcap:master on Nov 20, 2018
lysu added a commit that referenced this pull request Nov 20, 2018