
8% tpcc performance degradation caused by #29238 #29850

Closed
dbsid opened this issue Nov 16, 2021 · 6 comments · Fixed by #29859
Assignees
Labels
epic/plan-cache severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@dbsid
Contributor

dbsid commented Nov 16, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Run the TPC-C workload on an AWS EC2 setup.

2. What did you expect to see? (Required)

For the prepared statement below, all predicates are equality (=) conditions, so the optimizer should be able to choose the Point_Get plan; a kv_get is cheaper than a coprocessor request in TiKV.

SELECT
  `c_discount`,
  `c_last`,
  `c_credit`,
  `w_tax`
FROM
  `customer`,
  `warehouse`
WHERE
  `w_id` = ?
  AND `c_w_id` = `w_id`
  AND `c_d_id` = ?
  AND `c_id` = ?
	id              	task	estRows	operator info                                                                               	actRows	execution info                                                                                                                                                                                                                                                        	memory 	disk
	Projection_7    	root	0.00   	tpcc.customer.c_discount, tpcc.customer.c_last, tpcc.customer.c_credit, tpcc.warehouse.w_tax	1      	time:11.2ms, loops:2, Concurrency:OFF                                                                                                                                                                                                                                 	4.02 KB	N/A
	└─HashJoin_8    	root	0.00   	CARTESIAN inner join                                                                        	1      	time:11.2ms, loops:2, build_hash_table:{total:4.26ms, fetch:4.25ms, build:8.73µs}, probe:{concurrency:5, total:36.1ms, max:9.55ms, probe:2.5ms, fetch:33.6ms}                                                                                                        	48.5 KB	0 Bytes
	  ├─Point_Get_10	root	1      	table:customer, clustered index:PRIMARY(c_w_id, c_d_id, c_id)                               	1      	time:4.83ms, loops:2,                                                                                                                                                                                                                                                 	N/A    	N/A
	  └─Point_Get_11	root	1      	table:warehouse, handle:4220                                                                	1      	time:4.16ms, loops:2, Get:{num_rpc:2, total_time:5.14ms}, scan_detail: {total_process_keys: 2, total_process_keys_size: 720, total_keys: 2, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 27, read_count: 1, read_byte: 41.7 KB}}}	N/A    	N/A
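As a sanity check (an added sketch, not part of the original report), running the same query with literal values — outside the prepared-statement plan cache — should still produce the Point_Get plans shown above. The literals are the parameter values from the repro steps later in this thread:

```sql
-- Hedged sketch: the same query with literals instead of placeholders.
-- With no plan cache in play, the optimizer is free to pick Point_Get.
EXPLAIN
SELECT c_discount, c_last, c_credit, w_tax
FROM customer, warehouse
WHERE w_id = 1262
  AND c_w_id = w_id
  AND c_d_id = 7
  AND c_id = 1549;
```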

3. What did you see instead (Required)

A TableReader with a TableRangeScan instead. (The original report included a screenshot of the execution plan here.)

4. What is your TiDB version? (Required)

v5.3.0

@dbsid dbsid added type/bug The issue is confirmed as a bug. severity/major labels Nov 16, 2021
@AilinKid
Contributor

AilinKid commented Nov 16, 2021

@dbsid can you provide details about the tpcc workload?

@winoros
Member

winoros commented Nov 16, 2021

This one is a little strange. It's very surprising that that PR influenced this case.

@winoros winoros changed the title 8% tpcc performance degradation caused by pr#29238 8% tpcc performance degradation caused by #29238 Nov 16, 2021
@dbsid
Contributor Author

dbsid commented Nov 16, 2021

Steps to reproduce:

CREATE TABLE `customer` (
  `c_id` int(11) NOT NULL,
  `c_d_id` int(11) NOT NULL,
  `c_w_id` int(11) NOT NULL,
  `c_first` varchar(16) DEFAULT NULL,
  `c_middle` char(2) DEFAULT NULL,
  `c_last` varchar(16) DEFAULT NULL,
  `c_street_1` varchar(20) DEFAULT NULL,
  `c_street_2` varchar(20) DEFAULT NULL,
  `c_city` varchar(20) DEFAULT NULL,
  `c_state` char(2) DEFAULT NULL,
  `c_zip` char(9) DEFAULT NULL,
  `c_phone` char(16) DEFAULT NULL,
  `c_since` datetime DEFAULT NULL,
  `c_credit` char(2) DEFAULT NULL,
  `c_credit_lim` decimal(12,2) DEFAULT NULL,
  `c_discount` decimal(4,4) DEFAULT NULL,
  `c_balance` decimal(12,2) DEFAULT NULL,
  `c_ytd_payment` decimal(12,2) DEFAULT NULL,
  `c_payment_cnt` int(11) DEFAULT NULL,
  `c_delivery_cnt` int(11) DEFAULT NULL,
  `c_data` varchar(500) DEFAULT NULL,
  PRIMARY KEY (`c_w_id`,`c_d_id`,`c_id`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_customer` (`c_w_id`,`c_d_id`,`c_last`,`c_first`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
CREATE TABLE `warehouse` (
  `w_id` int(11) NOT NULL,
  `w_name` varchar(10) DEFAULT NULL,
  `w_street_1` varchar(20) DEFAULT NULL,
  `w_street_2` varchar(20) DEFAULT NULL,
  `w_city` varchar(20) DEFAULT NULL,
  `w_state` char(2) DEFAULT NULL,
  `w_zip` char(9) DEFAULT NULL,
  `w_tax` decimal(4,4) DEFAULT NULL,
  `w_ytd` decimal(12,2) DEFAULT NULL,
  PRIMARY KEY (`w_id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
set @w_id=1262;
set @c_d_id=7;
set @c_id=1549;
prepare stmt from 'SELECT
  `c_discount`,
  `c_last`,
  `c_credit`,
  `w_tax`
FROM
  `customer`,
  `warehouse`
WHERE
  `w_id` = ?
  AND `c_w_id` = `w_id`
  AND `c_d_id` = ?
  AND `c_id` = ?';
execute stmt using @w_id, @c_d_id, @c_id;
explain for connection 17;
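One way to confirm whether the executed statement was served from the prepared-plan cache (an added note, not part of the original repro; `last_plan_from_cache` is a TiDB session variable):

```sql
-- After `execute stmt ...`, check whether the last plan came from
-- the prepared-plan cache (1 = cached, 0 = freshly optimized).
SELECT @@last_plan_from_cache;
```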

@qw4990
Contributor

qw4990 commented Nov 17, 2021

The reason it cannot use PointGet is #28478, and that's expected; I think I already noted this risk in our chat group while developing it.
The PointGet path may apply some over-optimized transformations, for example a>=1 and a<=1 --> a=1, so for safety we forbid PointGet in some complex scenarios when the plan cache is used.

Seeing that the TableRangeScan actually has the same (point) range as the original PointGet, I doubt whether this change alone can cause an 8% TPC-C perf regression.
It also seems a little odd that we rely on a non-GA feature when running the TPC-C benchmark ...
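To illustrate the risk described above (a hypothetical sketch with a made-up table `t`, not taken from the issue): if the optimizer folded `a >= ? AND a <= ?` into `a = ?` based on the first execution's parameters, a cached plan could return wrong results once later parameters describe a real range:

```sql
-- Hypothetical table for illustration only.
CREATE TABLE t (a INT, KEY(a));
PREPARE stmt FROM 'SELECT * FROM t WHERE a >= ? AND a <= ?';
SET @x = 1; SET @y = 1;
EXECUTE stmt USING @x, @y;  -- range [1,1]: folding to a=1 looks safe here
SET @x = 1; SET @y = 5;
EXECUTE stmt USING @x, @y;  -- range [1,5]: a cached a=1 plan would be wrong,
                            -- hence PointGet is forbidden under the plan cache
```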

@qw4990
Contributor

qw4990 commented Nov 17, 2021

I'm investigating this issue to see whether there is a simple way to fix it.

@github-actions

Please check whether the issue should be labeled with 'affects-x.y' or 'fixes-x.y.z', and then remove 'needs-more-info' label.
