Datum.SetString references the underlying data of chk and causes the memory out of control #35886

Closed
XuHuaiyu opened this issue Jul 1, 2022 · 0 comments · Fixed by #39489
Labels
type/enhancement The issue or PR belongs to an enhancement.

XuHuaiyu commented Jul 1, 2022

Enhancement

 CREATE TABLE `t_1` (
  `customer_id` varchar(50) NOT NULL,
  `NAME` varchar(50) DEFAULT NULL,
  `phone` varchar(15) DEFAULT NULL,
  `sex` char(1) DEFAULT NULL,
  `birthday` date DEFAULT NULL,
  `card_type` varchar(10) DEFAULT NULL,
  `card_id` varchar(30) DEFAULT NULL,
  `kaihurq` date DEFAULT NULL,
  `unit_id` varchar(6) DEFAULT NULL,
  `dept_id` varchar(6) DEFAULT NULL,
  `vip_level` char(1) DEFAULT NULL,
  `card_num` int(10) unsigned NOT NULL,
  `purse_num` int(10) unsigned NOT NULL,
  PRIMARY KEY (`customer_id`) /*T![clustered_index] NONCLUSTERED */,
  KEY `i1` (`unit_id`),
  KEY `i2` (`phone`,`NAME`,`sex`,`birthday`,`card_id`,`vip_level`,`card_num`,`purse_num`)
);

insert into t_1 values(cast(rand()*10000000 as signed), cast(rand()*10000000 as signed), cast(rand()*10000000 as signed), 0, "2000-12-12", 0, cast(rand()*100000 as signed), "2001-01-01", "123456", "123455",1,123455,123455);

# insert 5000 tuples into t_1;
insert into t_1 select  `customer_id` + cast(rand()*10000000 as signed),  `NAME`,  `phone`,`sex`,`birthday`,`card_type`,`card_id`,`kaihurq`,`unit_id`,`dept_id`,`vip_level`,`card_num`,`purse_num` from t_1;


# t_2 is assumed to have the same schema as t_1
insert into t_2 select * from t_1;
explain analyze delete t_1, t_2 from t_1, t_2 where t_1.vip_level = t_2.vip_level;

The explain analyze output reports that HashJoin uses at most 1.51MB of memory, but the heap profile shows that HashJoin actually uses almost 1GB.

mysql> explain analyze delete t_1, t_2 from t_1, t_2 where t_1.vip_level = t_2.vip_level;
+--------------------------------+----------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------+----------+---------+
| id                             | estRows  | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                        | operator info                                                                | memory   | disk    |
+--------------------------------+----------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------+----------+---------+
| Delete_6                       | N/A      | 0        | root      |               | time:1m12.1s, loops:1                                                                                                                                                                                                                                                                                 | N/A                                                                          | 25.2 MB  | N/A     |
| └─HashJoin_9                   | 12487.50 | 25000000 | root      |               | time:363.9ms, loops:24416, build_hash_table:{total:56.4ms, fetch:54.2ms, build:2.19ms}, probe:{concurrency:5, total:5m52.2s, max:1m11.8s, probe:5m51.9s, fetch:289.2ms}                                                                                                                               | inner join, equal:[eq(credit_card.t_1.vip_level, credit_card.t_2.vip_level)] | 1.51 MB  | 0 Bytes |
# .................
+--------------------------------+----------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------+----------+---------+
8 rows in set (1 min 12.23 sec)

[Screenshot, 2022-07-01 6:17:18 PM]

We modified Datum.SetString to copy the string instead of referencing its underlying data:

diff --git a/types/datum.go b/types/datum.go
index 1ad86c770..aa9f7e7f0 100644
--- a/types/datum.go
+++ b/types/datum.go
@@ -244,7 +244,7 @@ func findEncoding(sc *stmtctx.StatementContext, chs string) (enc charset.Encodin
 func (d *Datum) SetString(s string, collation string) {
        d.k = KindString
        sink(s)
-       d.b = hack.Slice(s)
+       d.b = []byte(s)
        d.collation = collation
 }

Memory usage drops sharply, from 3GB to 140MB:
[Screenshot: memory usage after the change]

However, the execution time increases from 1min6s to 2min12s.
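
One plausible contributor to the slowdown (an assumption, not something measured in this issue) is that the copying version allocates on every call. The sketch below counts heap allocations per conversion with `testing.AllocsPerRun`. `zeroCopyBytes`, `copyBytes`, `allocsPerCall`, and `sink` are hypothetical names for illustration, and the input string is made up.

```go
package main

import (
	"fmt"
	"testing"
	"unsafe"
)

// sink keeps each result reachable so the conversion cannot be optimized away.
var sink []byte

// zeroCopyBytes is a hypothetical stand-in for a hack.Slice-style conversion.
func zeroCopyBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// copyBytes mirrors the `d.b = []byte(s)` variant from the diff.
func copyBytes(s string) []byte {
	return []byte(s)
}

// allocsPerCall returns the average number of heap allocations made by one
// call to conv(s), measured over 1000 runs.
func allocsPerCall(conv func(string) []byte, s string) float64 {
	return testing.AllocsPerRun(1000, func() { sink = conv(s) })
}

func main() {
	// Made-up value, long enough to resemble a varchar column entry.
	s := "customer-0000000000000000000000000001"

	fmt.Println("zero-copy allocs per call:", allocsPerCall(zeroCopyBytes, s))
	fmt.Println("copy allocs per call:     ", allocsPerCall(copyBytes, s))
}
```

With 25000000 joined rows, even one extra small allocation per string datum adds up to a large amount of allocator and garbage collector work, which is why a fix that copies unconditionally trades memory for time.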

We need to find a better way to solve this problem, one that keeps memory usage under control without doubling the execution time.
