
query killed by global memory controller unexpectedly #42662

Closed
XuHuaiyu opened this issue Mar 29, 2023 · 5 comments · Fixed by #42803 or #43089
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. severity/major sig/execution SIG execution type/bug The issue is confirmed as a bug.

Comments

@XuHuaiyu
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. The global memory controller successfully killed a query on connection 8733482432075924065.

  2. But all new queries sent on this connection are also killed.

I suspect the NeedKill signal was not set back to false on this connection.

2. What did you expect to see? (Required)

3. What did you see instead (Required)

4. What is your TiDB version? (Required)

["Release Version"=v6.5.0-20230228] [Edition=Enterprise] ["Git Commit Hash"=58fcf7b58ad717b61c3deeff0764f3a47246c5ed] ["Git Branch"=heads/refs/tags/v6.5.0-20230228] ["UTC Build Time"="2023-02-28 11:07:46"] [GoVersion=go1.19.5] ["Race Enabled"=false] ["Check Table Before Drop"=false] ["TiKV Min Version"=6.2.0-alpha]

@XuHuaiyu XuHuaiyu added the type/bug The issue is confirmed as a bug. label Mar 29, 2023
@XuHuaiyu
Contributor Author

XuHuaiyu commented Mar 29, 2023

The queries might be executed with ComPrepareStmt and ComExecuteStmt.

This case has nothing to do with prepare and execute.

@yibin87
Contributor

yibin87 commented Mar 29, 2023

FYI, I tried the Go SQL driver with a prepared statement, like this:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@tcp(localhost:4000)/tpch1")
	if err != nil {
		fmt.Printf("Connection failed: %s\n", err)
		return
	}
	defer db.Close()

	stmt, err := db.Prepare("SELECT /*+ HASH_JOIN(l1, l2) */ count(*) FROM lineitem l1, lineitem l2 WHERE l1.l_orderkey > l2.l_orderkey and l1.l_suppkey < ?")
	if err != nil {
		log.Fatal(err)
	}
	defer stmt.Close()

	// Execute the prepared statement, passing in a value for the
	// parameter whose placeholder is ?
	_, err = stmt.Query(4000000)
	if err != nil {
		fmt.Println(err) // expected to fail with an OOM kill
	}

	_, err = stmt.Query(4000000)
	if err != nil {
		fmt.Println(err) // expected to fail with an OOM kill
	}

	_, err = stmt.Query(0)
	if err != nil {
		log.Fatal(err)
	}

	_, err = db.Query("SELECT count(*) FROM orders")
	if err != nil {
		log.Fatal(err)
	}
}
```

After the previous two queries failed due to "Out Of Memory Quota", the following ones executed successfully.

@yibin87
Contributor

yibin87 commented Mar 29, 2023

Confirmed that it is killed by the global memory controller:

[2023/03/29 15:39:28.869 +08:00] [WARN] [servermemorylimit.go:126] ["global memory controller tries to kill the top1 memory consumer"] [conn=7901488580407712037] ["sql digest"=e1d9ed99007654b7a983e03897215ad85e7a757a064e42ef4ce6214d82943453] ["sql text"="SELECT /*+ HASH_JOIN(l1, l2) */ count(*) FROM lineitem l1, lineitem l2 WHERE l1.l_orderkey > l2.l_or"] [tidb_server_memory_limit=536870912] ["heap inuse"=3450462208] ["sql memory usage"=134348142]

@yibin87
Contributor

yibin87 commented Mar 30, 2023

Confirmed that both this issue and #42664 can be reproduced with the following steps:

  1. Prepare data using tpch, sf = 1, no TiFlash replica.
  2. Set tidb_server_memory_limit to about 1.6GB.
  3. Start a new MySQL connection and execute:
PREPARE `books_query` FROM 'SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < ? union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1000 union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1001 union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1002 union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1003 union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1004 union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1005 union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1006 union all SELECT /*+ HASH_JOIN(o1, o2) */ * FROM orders o1, orders o2 WHERE o1.o_orderkey > o2.o_orderkey and o1.o_orderkey < 1007';
  4. Execute SET @id = 1; EXECUTE `books_query` USING @id;, wait for it to finish successfully, and ensure its memory usage exceeds the session limit.
  5. Then start many small connections and ensure the total tidb-server process takes more than 1.6GB.
  6. In the previous connection that ran the prepare/execute statements, run a simple query like select count(*) from orders. It will be killed, reporting an error message like this issue and #42664 (query with memory usage 0 is killed).

@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 labels Mar 31, 2023
@XuHuaiyu XuHuaiyu added affects-6.5 This bug affects the 6.5.x(LTS) versions. may-affects-6.5 and removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 labels Mar 31, 2023
@XuHuaiyu
Contributor Author

This issue is triggered by a combination of the following:

  1. The global memory controller always holds a pointer to the session with the highest memory consumption (call it session-0). When the query on session-0 ends, the pointer is not reset to null.
  2. Other sessions execute small queries concurrently, triggering the global memory controller's query-killing mechanism. Because of issue 1, the controller kills the query corresponding to session-0 via the stale pointer, setting session-0's NeedKill flag to true.
  3. Normally, when a query exits (whether normally or abnormally), the NeedKill flag is reset to false. However, this case hits another defect that leaves NeedKill set: the query on session-0 that was killed due to memory pressure had only just started. When the kill lands during the operator's Open call, or on a code path before it, ResultSet.Close is never called, so NeedKill is never reset to false.
  4. The combined effect of issues 1 and 3 produces this case.
