[enhancement](cloud) support TTL file cache evict through LRU #37312

Merged
merged 7 commits into from
Jul 12, 2024

@freemandealer freemandealer commented Jul 4, 2024

Motivation and Basic Ideas

Originally, the TTL file cache evicts data only when it expires. If TTL data fills the cache, new TTL data no longer fits and the cache switches to SKIP_CACHE mode, which is a performance killer because it overruns the S3 downloader with tons of small I/Os.

This commit enables active eviction of the TTL cache through LRU, in addition to the original TTL expiration.
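The idea can be illustrated with a minimal, self-contained sketch. It is not the Doris `BlockFileCache` code: the class `TtlCacheSketch`, its members, and the `evict_using_lru` constructor flag (standing in for the `enable_ttl_cache_evict_using_lru` config) are hypothetical names chosen for illustration. The sketch first tries the original behaviour (dropping expired entries only) and, if that does not free enough space and the switch is on, falls back to evicting from the least recently used end.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical, simplified model of a TTL cache; names are illustrative only.
class TtlCacheSketch {
public:
    TtlCacheSketch(std::size_t capacity, bool evict_using_lru)
            : _capacity(capacity), _evict_using_lru(evict_using_lru) {}

    // Caches `key` if space can be reserved. A false return models the
    // SKIP_CACHE outcome: the caller has to read around the cache.
    bool put(const std::string& key, std::size_t size, std::int64_t expire_at,
             std::int64_t now) {
        assert(!contains(key));  // the sketch does not handle overwrites
        if (!try_reserve(size, now)) {
            return false;
        }
        _lru.push_back(Entry{key, size, expire_at});
        _index[key] = std::prev(_lru.end());
        _used += size;
        return true;
    }

    // Marks `key` as most recently used; returns false on a miss.
    bool touch(const std::string& key) {
        auto it = _index.find(key);
        if (it == _index.end()) {
            return false;
        }
        _lru.splice(_lru.end(), _lru, it->second);
        return true;
    }

    bool contains(const std::string& key) const { return _index.count(key) > 0; }
    std::size_t used() const { return _used; }

private:
    struct Entry {
        std::string key;
        std::size_t size;
        std::int64_t expire_at;
    };
    using Queue = std::list<Entry>;

    bool try_reserve(std::size_t size, std::int64_t now) {
        if (size > _capacity) {
            return false;
        }
        // Step 1 (original behaviour): drop only entries whose TTL has passed.
        for (auto it = _lru.begin(); it != _lru.end() && _used + size > _capacity;) {
            it = (it->expire_at <= now) ? remove(it) : std::next(it);
        }
        if (_used + size <= _capacity) {
            return true;
        }
        // Step 2 (new behaviour, behind a switch): evict from the cold end.
        if (!_evict_using_lru) {
            return false;
        }
        while (_used + size > _capacity && !_lru.empty()) {
            remove(_lru.begin());
        }
        return _used + size <= _capacity;
    }

    Queue::iterator remove(Queue::iterator it) {
        _used -= it->size;
        _index.erase(it->key);
        return _lru.erase(it);
    }

    std::size_t _capacity;
    bool _evict_using_lru;
    std::size_t _used = 0;
    Queue _lru;  // front = least recently used
    std::unordered_map<std::string, Queue::iterator> _index;
};
```

With the flag off, a `put` into a full cache of unexpired entries returns false, which is the analogue of falling into SKIP_CACHE. With the flag on, the same `put` succeeds by evicting the coldest entry.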

Performance tests

We can set up a scenario where the TTL cache is full by using a small file cache space (e.g. 5GB). We use the table customer_ttl from the regression test attached to the PR and load data several times (e.g. 20GB) to populate the TTL cache.

With this setting, we execute the query select count(*) from customer_ttl where C_ADDRESS like '%ea%' and C_NAME like '%a%' and C_COMMENT like '%b%' twice: once with enable_ttl_cache_evict_using_lru = true and once with it set to false in the BE conf. The results show that performance increased by 43x during the tests:

| ttl_cache_evict_using_lru feature | enabled | disabled (baseline) |
| --------------------------------- | ------- | ------------------- |
| query execution time              | 8.5s    | 342s                |

Note that the result depends heavily on the ratio of the cache size to the amount of data the query reads, as well as on S3 server performance and other factors, so it is given for reference only.

This improvement comes from the active eviction of the TTL cache, which effectively reduces the likelihood of SKIP_CACHE. During the tests:

| ttl_cache_evict_using_lru feature | enabled | disabled (baseline) |
| --------------------------------- | ------- | ------------------- |
| SKIP_CACHE occurrences            | 0       | 257.1K              |

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

github-actions bot commented Jul 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

github-actions bot commented Jul 7, 2024

clang-tidy review says "All clean, LGTM! 👍"

@dataroaring

run buildall

@@ -796,6 +818,70 @@ bool BlockFileCache::try_reserve_for_ttl(size_t size, std::lock_guard<std::mutex
return true;
}

bool BlockFileCache::try_reserve_for_ttl(size_t size, std::lock_guard<std::mutex>& cache_lock) {
if (try_reserve_for_ttl_without_lru(size, cache_lock)) {
Contributor:

if (!try_reserve_for_ttl_without_lru(size, cache_lock) && config::enable_ttl_cache_evict_using_lru) {
}

Contributor Author:

I don't see how this change would improve the correctness or readability of the code.

std::lock_guard block_lock(file_block->_mutex);
auto hash = cell->file_block->get_hash_value();
remove(file_block, cache_lock, block_lock);
if (_files.find(hash) == _files.end()) {
Contributor:

It doesn't need to do this.

Contributor Author (@freemandealer, Jul 9, 2024):

> It doesn't need to do this.

Why?

be/src/io/cache/block_file_cache.cpp (outdated, resolved)
@doris-robot

TPC-H: Total hot run time: 40037 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8000c3b747248875d9a03aa2e905fe359a05f531, data reload: false

------ Round 1 ----------------------------------
q1	17600	4493	4408	4408
q2	2017	190	189	189
q3	10449	1318	1178	1178
q4	10195	741	828	741
q5	7518	2710	2632	2632
q6	217	132	134	132
q7	958	594	604	594
q8	9220	2094	2063	2063
q9	8791	6591	6521	6521
q10	9035	3788	3755	3755
q11	458	237	230	230
q12	458	248	221	221
q13	17903	3019	2981	2981
q14	280	231	238	231
q15	527	496	482	482
q16	515	379	370	370
q17	981	681	634	634
q18	8075	7461	7498	7461
q19	6633	1470	1473	1470
q20	676	321	325	321
q21	4895	3093	3279	3093
q22	382	330	337	330
Total cold run time: 117783 ms
Total hot run time: 40037 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4401	4247	4262	4247
q2	379	262	279	262
q3	2981	2956	2914	2914
q4	2000	1674	1767	1674
q5	5671	5487	5526	5487
q6	226	131	132	131
q7	2220	1935	1901	1901
q8	3323	3447	3462	3447
q9	8715	8731	8788	8731
q10	4128	3852	3847	3847
q11	585	532	502	502
q12	819	674	650	650
q13	16236	3151	3207	3151
q14	311	274	280	274
q15	532	496	488	488
q16	485	434	439	434
q17	1826	1541	1519	1519
q18	8185	8057	7690	7690
q19	4433	1582	1681	1582
q20	2924	1894	1875	1875
q21	5241	4971	4817	4817
q22	681	562	561	561
Total cold run time: 76302 ms
Total hot run time: 56184 ms

@doris-robot

TPC-DS: Total hot run time: 174600 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8000c3b747248875d9a03aa2e905fe359a05f531, data reload: false

query1	916	374	374	374
query2	6455	2272	2329	2272
query3	6637	208	213	208
query4	22065	17724	17485	17485
query5	3649	488	498	488
query6	250	170	185	170
query7	4574	287	287	287
query8	316	306	297	297
query9	8512	2385	2373	2373
query10	589	332	303	303
query11	11753	10132	10103	10103
query12	119	86	86	86
query13	1651	378	387	378
query14	9301	7778	7436	7436
query15	222	185	184	184
query16	7759	313	305	305
query17	1798	543	523	523
query18	1840	280	267	267
query19	197	144	153	144
query20	88	82	81	81
query21	213	150	134	134
query22	4368	4119	4219	4119
query23	34152	33676	33640	33640
query24	11163	2855	2840	2840
query25	608	393	376	376
query26	719	153	151	151
query27	2258	276	286	276
query28	6557	2175	2147	2147
query29	899	661	636	636
query30	252	163	156	156
query31	969	774	754	754
query32	101	60	57	57
query33	776	294	297	294
query34	1001	509	501	501
query35	693	589	597	589
query36	1127	981	980	980
query37	152	87	84	84
query38	2951	2838	2806	2806
query39	879	806	797	797
query40	201	121	116	116
query41	56	50	53	50
query42	115	101	98	98
query43	577	546	544	544
query44	1206	735	732	732
query45	194	162	160	160
query46	1104	737	716	716
query47	1863	1774	1760	1760
query48	382	289	294	289
query49	828	415	428	415
query50	772	390	391	390
query51	6885	6910	6775	6775
query52	107	91	91	91
query53	357	296	296	296
query54	883	450	452	450
query55	75	75	75	75
query56	287	256	268	256
query57	1124	1098	1062	1062
query58	236	262	248	248
query59	3182	3283	3143	3143
query60	321	280	282	280
query61	96	95	95	95
query62	789	644	646	644
query63	332	294	303	294
query64	9340	2197	1603	1603
query65	3148	3115	3085	3085
query66	713	323	322	322
query67	15718	14852	14942	14852
query68	6699	551	546	546
query69	714	419	357	357
query70	1171	1091	1152	1091
query71	489	282	276	276
query72	7970	5543	5732	5543
query73	781	334	319	319
query74	5904	5566	5514	5514
query75	4365	2690	2744	2690
query76	4550	990	936	936
query77	779	299	310	299
query78	9384	9043	8878	8878
query79	2885	517	528	517
query80	2228	474	536	474
query81	575	219	225	219
query82	596	138	141	138
query83	274	185	176	176
query84	280	92	86	86
query85	1271	324	312	312
query86	424	326	341	326
query87	3358	3120	3143	3120
query88	3623	2379	2384	2379
query89	495	394	374	374
query90	1888	191	187	187
query91	129	102	104	102
query92	59	50	49	49
query93	4728	522	504	504
query94	1105	208	212	208
query95	408	306	312	306
query96	599	269	273	269
query97	3201	3013	3053	3013
query98	207	204	197	197
query99	1527	1256	1253	1253
Total cold run time: 282086 ms
Total hot run time: 174600 ms

@doris-robot

ClickBench: Total hot run time: 30.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8000c3b747248875d9a03aa2e905fe359a05f531, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.04
query3	0.22	0.04	0.05
query4	1.67	0.09	0.09
query5	0.50	0.48	0.50
query6	1.14	0.73	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.54	0.49	0.49
query10	0.54	0.55	0.54
query11	0.14	0.11	0.12
query12	0.15	0.12	0.12
query13	0.59	0.58	0.58
query14	0.76	0.79	0.77
query15	0.86	0.83	0.81
query16	0.37	0.37	0.36
query17	1.04	0.95	1.06
query18	0.23	0.22	0.21
query19	1.90	1.75	1.84
query20	0.01	0.01	0.01
query21	15.39	0.75	0.64
query22	4.77	7.45	1.82
query23	18.29	1.41	1.14
query24	2.13	0.24	0.21
query25	0.16	0.09	0.09
query26	0.30	0.22	0.22
query27	0.45	0.23	0.23
query28	13.26	1.01	0.99
query29	12.63	3.31	3.30
query30	0.25	0.06	0.05
query31	2.87	0.40	0.39
query32	3.25	0.48	0.48
query33	2.89	2.98	2.93
query34	16.91	4.37	4.42
query35	4.48	4.42	4.51
query36	0.64	0.49	0.46
query37	0.19	0.16	0.15
query38	0.15	0.15	0.14
query39	0.04	0.04	0.04
query40	0.16	0.13	0.13
query41	0.09	0.04	0.05
query42	0.05	0.05	0.04
query43	0.04	0.04	0.03
Total cold run time: 110.23 s
Total hot run time: 30.46 s

@freemandealer

run buildall

github-actions bot commented Jul 9, 2024

clang-tidy review says "All clean, LGTM! 👍"

@freemandealer

run buildall

github-actions bot commented Jul 9, 2024

clang-tidy review says "All clean, LGTM! 👍"

@freemandealer

run buildall

github-actions bot commented Jul 9, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39860 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3e5cd77f67a97168e8526115f3b079429a856119, data reload: false

------ Round 1 ----------------------------------
q1	18370	4742	4304	4304
q2	2023	190	185	185
q3	10547	1162	1071	1071
q4	10239	743	857	743
q5	7597	2700	2692	2692
q6	219	141	143	141
q7	965	605	613	605
q8	9234	2095	2125	2095
q9	8901	6501	6505	6501
q10	8934	3713	3719	3713
q11	465	248	234	234
q12	441	225	223	223
q13	17934	2972	2980	2972
q14	259	221	230	221
q15	531	486	481	481
q16	501	380	370	370
q17	981	683	684	683
q18	8059	7439	7312	7312
q19	4975	1497	1481	1481
q20	654	317	332	317
q21	5015	3180	3835	3180
q22	392	339	336	336
Total cold run time: 117236 ms
Total hot run time: 39860 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4415	4284	4239	4239
q2	364	273	262	262
q3	2983	2962	2993	2962
q4	1995	1697	1784	1697
q5	5566	5548	5431	5431
q6	226	132	135	132
q7	2303	1908	1828	1828
q8	3268	3462	3425	3425
q9	8758	8842	8828	8828
q10	4138	3681	3853	3681
q11	598	506	523	506
q12	827	653	657	653
q13	16380	3190	3181	3181
q14	308	290	284	284
q15	545	491	490	490
q16	507	454	428	428
q17	1802	1542	1515	1515
q18	8265	8034	7934	7934
q19	1835	1594	1600	1594
q20	2253	1874	1877	1874
q21	5019	4668	4901	4668
q22	601	547	578	547
Total cold run time: 72956 ms
Total hot run time: 56159 ms

@doris-robot

TPC-DS: Total hot run time: 173908 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3e5cd77f67a97168e8526115f3b079429a856119, data reload: false

query1	927	375	379	375
query2	6426	2371	2460	2371
query3	6651	206	220	206
query4	23423	17523	17053	17053
query5	3800	468	487	468
query6	264	173	168	168
query7	4586	306	293	293
query8	321	302	318	302
query9	8653	2367	2359	2359
query10	443	290	285	285
query11	11764	10063	9967	9967
query12	117	87	82	82
query13	1665	380	377	377
query14	10130	7837	7608	7608
query15	240	184	192	184
query16	7669	316	306	306
query17	1822	558	537	537
query18	1704	272	266	266
query19	205	145	146	145
query20	91	79	81	79
query21	206	122	127	122
query22	4342	4157	3906	3906
query23	34048	33847	33734	33734
query24	9946	2774	2949	2774
query25	598	416	413	413
query26	695	154	149	149
query27	2196	271	272	271
query28	5992	2120	2113	2113
query29	919	640	620	620
query30	259	155	154	154
query31	975	777	760	760
query32	98	55	56	55
query33	662	303	304	303
query34	909	500	490	490
query35	692	566	574	566
query36	1091	985	1024	985
query37	150	85	91	85
query38	2988	2895	2797	2797
query39	917	822	803	803
query40	199	120	116	116
query41	54	51	52	51
query42	122	104	100	100
query43	600	547	551	547
query44	1101	731	720	720
query45	191	189	165	165
query46	1064	720	706	706
query47	1853	1746	1808	1746
query48	372	298	304	298
query49	843	407	432	407
query50	768	389	394	389
query51	6892	6737	6748	6737
query52	105	91	89	89
query53	355	287	281	281
query54	877	448	450	448
query55	76	77	73	73
query56	298	264	267	264
query57	1120	1047	1051	1047
query58	261	272	257	257
query59	3492	3328	3281	3281
query60	314	276	276	276
query61	98	97	104	97
query62	787	619	633	619
query63	320	291	291	291
query64	9200	2168	1616	1616
query65	3169	3115	3123	3115
query66	748	324	328	324
query67	15361	15186	14933	14933
query68	4461	531	529	529
query69	593	395	323	323
query70	1107	1179	1132	1132
query71	373	284	275	275
query72	6902	5165	5592	5165
query73	737	328	320	320
query74	5891	5489	5557	5489
query75	3361	2707	2686	2686
query76	2375	1024	921	921
query77	449	303	299	299
query78	9451	9000	9968	9000
query79	2597	505	510	505
query80	1201	484	537	484
query81	585	217	223	217
query82	775	136	133	133
query83	233	173	171	171
query84	240	87	92	87
query85	1677	323	312	312
query86	482	320	314	314
query87	3239	3069	3105	3069
query88	4073	2441	2446	2441
query89	479	390	365	365
query90	1763	196	187	187
query91	134	105	100	100
query92	58	49	50	49
query93	2397	496	498	496
query94	1142	212	213	212
query95	409	310	321	310
query96	594	279	269	269
query97	3156	3075	3069	3069
query98	234	204	202	202
query99	1600	1253	1271	1253
Total cold run time: 271992 ms
Total hot run time: 173908 ms

@doris-robot

ClickBench: Total hot run time: 30.9 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3e5cd77f67a97168e8526115f3b079429a856119, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.06	0.07
query5	0.49	0.49	0.49
query6	1.17	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.55	0.50	0.47
query10	0.55	0.53	0.54
query11	0.14	0.11	0.12
query12	0.14	0.12	0.13
query13	0.60	0.58	0.58
query14	0.77	0.78	0.77
query15	0.85	0.82	0.81
query16	0.36	0.34	0.34
query17	1.00	0.98	0.97
query18	0.24	0.22	0.23
query19	1.84	1.67	1.71
query20	0.01	0.01	0.01
query21	15.39	0.77	0.66
query22	4.06	8.02	2.26
query23	18.27	1.34	1.21
query24	2.11	0.23	0.22
query25	0.15	0.08	0.08
query26	0.29	0.21	0.21
query27	0.45	0.22	0.22
query28	13.31	1.01	0.98
query29	12.59	3.34	3.31
query30	0.25	0.07	0.05
query31	2.87	0.39	0.39
query32	3.28	0.48	0.47
query33	2.89	2.94	2.90
query34	17.13	4.44	4.40
query35	4.39	4.45	4.46
query36	0.65	0.48	0.47
query37	0.18	0.15	0.17
query38	0.16	0.14	0.14
query39	0.04	0.04	0.03
query40	0.15	0.12	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.61 s
Total hot run time: 30.9 s

@freemandealer

run buildall

@doris-robot

TPC-H: Total hot run time: 40549 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7463b97aa1c6e0dc44c96ba1c9dc735c73e70171, data reload: false

------ Round 1 ----------------------------------
q1	17610	4434	4318	4318
q2	2012	196	196	196
q3	10437	1156	1125	1125
q4	10190	754	788	754
q5	7484	2640	2615	2615
q6	221	137	136	136
q7	958	592	606	592
q8	9239	2088	2083	2083
q9	8882	6500	6559	6500
q10	8999	3809	3733	3733
q11	449	241	230	230
q12	531	232	239	232
q13	17766	2997	3025	2997
q14	260	223	232	223
q15	521	489	477	477
q16	506	382	378	378
q17	981	632	646	632
q18	8109	7608	7334	7334
q19	3081	1491	1402	1402
q20	653	320	329	320
q21	5089	3936	3975	3936
q22	396	345	336	336
Total cold run time: 114374 ms
Total hot run time: 40549 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4371	4288	4262	4262
q2	360	267	270	267
q3	3022	2778	2908	2778
q4	1996	1726	1699	1699
q5	5652	5511	5516	5511
q6	225	130	131	130
q7	2194	1913	1913	1913
q8	3297	3433	3433	3433
q9	8680	8762	8841	8762
q10	4031	3865	3688	3688
q11	570	504	498	498
q12	843	652	620	620
q13	15971	3246	3222	3222
q14	304	278	313	278
q15	549	474	495	474
q16	476	420	425	420
q17	1823	1510	1514	1510
q18	8416	8002	7981	7981
q19	1878	1683	1565	1565
q20	2091	1881	1840	1840
q21	5218	4951	4953	4951
q22	643	571	568	568
Total cold run time: 72610 ms
Total hot run time: 56370 ms

@doris-robot

TPC-DS: Total hot run time: 173135 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7463b97aa1c6e0dc44c96ba1c9dc735c73e70171, data reload: false

query1	913	387	363	363
query2	6451	2632	2395	2395
query3	6629	207	212	207
query4	27989	17544	17274	17274
query5	3566	472	469	469
query6	262	178	168	168
query7	4577	280	277	277
query8	317	308	310	308
query9	8499	2383	2366	2366
query10	439	283	275	275
query11	11742	9887	9981	9887
query12	116	92	82	82
query13	1652	385	373	373
query14	10388	7037	7645	7037
query15	248	189	186	186
query16	7901	308	298	298
query17	1835	556	527	527
query18	2036	277	275	275
query19	198	146	147	146
query20	88	84	80	80
query21	210	128	124	124
query22	4280	4043	3968	3968
query23	34501	33879	33343	33343
query24	10398	2820	2857	2820
query25	607	421	386	386
query26	699	153	147	147
query27	2219	273	282	273
query28	6067	2122	2134	2122
query29	902	623	641	623
query30	254	155	149	149
query31	996	763	781	763
query32	92	56	54	54
query33	661	289	298	289
query34	898	502	493	493
query35	691	554	585	554
query36	1101	978	981	978
query37	139	87	84	84
query38	3069	2888	2806	2806
query39	901	848	855	848
query40	199	120	118	118
query41	53	52	51	51
query42	118	100	101	100
query43	595	556	550	550
query44	1110	736	739	736
query45	204	162	160	160
query46	1068	715	699	699
query47	1874	1787	1798	1787
query48	374	293	296	293
query49	838	401	408	401
query50	776	397	400	397
query51	6852	6826	6698	6698
query52	112	98	113	98
query53	353	291	293	291
query54	939	443	443	443
query55	76	74	76	74
query56	279	262	269	262
query57	1135	1050	1036	1036
query58	243	250	242	242
query59	3486	3438	3295	3295
query60	298	280	272	272
query61	92	93	96	93
query62	788	624	647	624
query63	322	286	281	281
query64	9131	2199	1639	1639
query65	3176	3132	3143	3132
query66	743	326	334	326
query67	15535	14963	15031	14963
query68	4694	537	526	526
query69	701	435	361	361
query70	1205	1139	1138	1138
query71	393	281	280	280
query72	7999	5497	5196	5196
query73	753	331	323	323
query74	6056	5451	5503	5451
query75	3463	2734	2667	2667
query76	2752	927	930	927
query77	613	303	300	300
query78	9868	9021	8891	8891
query79	3218	505	507	505
query80	2171	469	484	469
query81	658	226	215	215
query82	1470	140	138	138
query83	291	171	165	165
query84	275	97	85	85
query85	1350	314	306	306
query86	473	327	303	303
query87	3275	3091	3124	3091
query88	4251	2469	2443	2443
query89	479	388	372	372
query90	1746	191	190	190
query91	131	101	103	101
query92	63	50	50	50
query93	3611	510	508	508
query94	1159	210	211	210
query95	407	326	315	315
query96	621	281	280	280
query97	3209	3023	3016	3016
query98	228	208	198	198
query99	1569	1281	1237	1237
Total cold run time: 284452 ms
Total hot run time: 173135 ms

@doris-robot

ClickBench: Total hot run time: 30.79 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7463b97aa1c6e0dc44c96ba1c9dc735c73e70171, data reload: false

query1	0.04	0.04	0.03
query2	0.09	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.08
query5	0.49	0.49	0.48
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.04	0.04	0.04
query9	0.56	0.52	0.48
query10	0.55	0.54	0.54
query11	0.16	0.12	0.11
query12	0.15	0.12	0.12
query13	0.60	0.58	0.59
query14	0.76	0.78	0.78
query15	0.85	0.81	0.80
query16	0.37	0.37	0.37
query17	1.06	1.02	0.96
query18	0.24	0.22	0.22
query19	1.81	1.66	1.75
query20	0.01	0.01	0.01
query21	15.39	0.78	0.65
query22	4.87	6.21	2.16
query23	18.29	1.34	1.24
query24	2.15	0.23	0.24
query25	0.16	0.08	0.09
query26	0.29	0.21	0.20
query27	0.44	0.23	0.23
query28	13.16	1.02	0.99
query29	12.59	3.30	3.30
query30	0.26	0.06	0.06
query31	2.86	0.39	0.38
query32	3.27	0.48	0.46
query33	2.90	2.89	2.88
query34	17.06	4.38	4.39
query35	4.42	4.39	4.45
query36	0.67	0.49	0.49
query37	0.19	0.16	0.16
query38	0.14	0.14	0.15
query39	0.04	0.04	0.04
query40	0.15	0.13	0.12
query41	0.09	0.04	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.34 s
Total hot run time: 30.79 s

@freemandealer

run p0

gavinchou previously approved these changes Jul 10, 2024

// metrics
size_t _num_read_blocks = 0;
size_t _num_hit_blocks = 0;
size_t _num_removed_blocks = 0;
std::shared_ptr<bvar::Status<size_t>> _cur_cache_size_metrics;
std::shared_ptr<bvar::Status<size_t>> _cur_ttl_cache_size_metrics;
std::shared_ptr<bvar::Status<size_t>> _cur_ttl_cache_lru_queue_size_metrics;
Contributor:

Is this an element count? If not, add a separate "element count" bvar; if so, keep the naming consistent.

PR approved by at least one committer and no changes requested.

github-actions bot added the approved label (indicates a PR has been approved by one committer) on Jul 10, 2024

PR approved by anyone and no changes requested.

to_evict.push_back(cell);

removed_size += cell_size;
}
Contributor:

This duplicates code in try_reserve_for_ttl_without_lru.

if (_time_to_key_iter.first->second == hash) {
_time_to_key_iter.first =
_time_to_key.erase(_time_to_key_iter.first);
break;
Contributor:

Too many levels of indentation here; we should reduce the nesting by extracting a function.
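One way to flatten the nesting the reviewer points at is to move the lookup-and-erase loop into a small helper. The sketch below is only an illustration under assumptions: the free function `erase_time_to_key_entry` and the `Hash` alias are hypothetical, and the map is assumed to be a `std::multimap` from expiration time to hash, which is what the `equal_range`-style iteration in the quoted lines suggests.

```cpp
#include <cstdint>
#include <map>

using Hash = std::uint64_t;  // hypothetical stand-in for the cache's hash type

// Erases the single (expiration_time, hash) entry if present.
// Returns true when an entry was removed.
bool erase_time_to_key_entry(std::multimap<std::uint64_t, Hash>& time_to_key,
                             std::uint64_t expiration_time, Hash hash) {
    auto range = time_to_key.equal_range(expiration_time);
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second == hash) {
            time_to_key.erase(it);  // returning right away, so `it` is not reused
            return true;
        }
    }
    return false;
}
```

The caller then needs a single line at its own indentation level, for example `erase_time_to_key_entry(time_to_key, expiration_time, hash);`, instead of an inline loop with a nested condition.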

Originally, the TTL file cache evicts data only when it expires. If TTL data
fills the cache, new TTL data won't fit in the cache and the cache switches to
SKIP_CACHE mode, which is a performance killer because it overruns the S3
downloader with tons of small IOs.

This commit enables evicting the TTL cache actively through LRU in addition to
the original TTL expiration.

Signed-off-by: freemandealer <[email protected]>
github-actions bot removed the approved label (indicates a PR has been approved by one committer) on Jul 11, 2024

clang-tidy review says "All clean, LGTM! 👍"

@dataroaring (Contributor) left a comment

LGTM

PR approved by at least one committer and no changes requested.

github-actions bot added the approved label (indicates a PR has been approved by one committer) on Jul 11, 2024

clang-tidy review says "All clean, LGTM! 👍"

@freemandealer

run buildall

@doris-robot

TPC-H: Total hot run time: 39936 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2eb9c861d5f24c793278b2e7b69cc5a8b1861a5e, data reload: false

------ Round 1 ----------------------------------
q1	17626	4975	4278	4278
q2	2004	192	190	190
q3	10465	1239	1150	1150
q4	10183	841	724	724
q5	7543	2707	2673	2673
q6	219	137	140	137
q7	966	606	616	606
q8	9216	2077	2104	2077
q9	8831	6564	6552	6552
q10	8799	3764	3780	3764
q11	457	253	237	237
q12	411	237	234	234
q13	17836	2989	3007	2989
q14	272	244	241	241
q15	517	476	495	476
q16	499	390	389	389
q17	978	660	651	651
q18	8391	7541	7342	7342
q19	7465	1532	1385	1385
q20	691	340	331	331
q21	4937	3183	3169	3169
q22	392	341	343	341
Total cold run time: 118698 ms
Total hot run time: 39936 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4443	4196	4223	4196
q2	368	277	266	266
q3	3049	2993	2983	2983
q4	1999	1742	1735	1735
q5	5525	5523	5432	5432
q6	225	140	146	140
q7	2310	1853	1876	1853
q8	3261	3436	3391	3391
q9	8796	8883	8799	8799
q10	4135	3682	3870	3682
q11	586	531	487	487
q12	836	633	654	633
q13	17095	3191	3174	3174
q14	321	283	295	283
q15	520	481	497	481
q16	486	455	416	416
q17	1818	1534	1524	1524
q18	8198	8032	7873	7873
q19	1813	1703	1597	1597
q20	2152	1865	1865	1865
q21	5125	4701	4902	4701
q22	639	560	610	560
Total cold run time: 73700 ms
Total hot run time: 56071 ms

@doris-robot

TPC-DS: Total hot run time: 175373 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2eb9c861d5f24c793278b2e7b69cc5a8b1861a5e, data reload: false

query1	924	376	368	368
query2	6395	2492	2383	2383
query3	6634	214	218	214
query4	28066	17456	17322	17322
query5	3638	485	489	485
query6	252	166	164	164
query7	4572	300	286	286
query8	306	306	311	306
query9	8570	2480	2462	2462
query10	449	289	277	277
query11	12562	10114	10089	10089
query12	128	89	83	83
query13	1657	378	378	378
query14	10215	7918	7057	7057
query15	239	189	188	188
query16	7742	328	332	328
query17	1788	576	547	547
query18	1916	285	282	282
query19	205	152	155	152
query20	88	85	82	82
query21	210	131	130	130
query22	4195	4166	4031	4031
query23	34173	33714	33863	33714
query24	11071	2937	2993	2937
query25	638	433	425	425
query26	1202	162	160	160
query27	3007	292	291	291
query28	7508	2175	2155	2155
query29	918	668	658	658
query30	254	162	165	162
query31	1004	791	781	781
query32	104	62	65	62
query33	781	323	339	323
query34	986	508	529	508
query35	729	602	591	591
query36	1134	1001	989	989
query37	159	86	93	86
query38	3031	2833	2830	2830
query39	898	808	823	808
query40	209	124	133	124
query41	55	53	54	53
query42	131	198	100	100
query43	616	559	564	559
query44	1197	721	733	721
query45	197	167	166	166
query46	1088	761	719	719
query47	1861	1791	1761	1761
query48	379	299	309	299
query49	863	413	434	413
query50	787	400	401	400
query51	6826	6701	6727	6701
query52	110	99	98	98
query53	363	293	294	293
query54	896	460	462	460
query55	79	75	76	75
query56	295	276	278	276
query57	1144	1075	1062	1062
query58	253	256	246	246
query59	3588	3492	3329	3329
query60	316	279	284	279
query61	95	91	98	91
query62	800	653	645	645
query63	323	296	295	295
query64	9416	2219	1652	1652
query65	3173	3112	3115	3112
query66	805	355	325	325
query67	15620	15085	15028	15028
query68	6253	540	550	540
query69	808	427	363	363
query70	1226	1095	1172	1095
query71	537	290	283	283
query72	8686	5734	5995	5734
query73	808	331	333	331
query74	5903	5540	5460	5460
query75	4828	2732	2692	2692
query76	4575	985	1005	985
query77	753	318	321	318
query78	9599	9003	9504	9003
query79	3526	525	533	525
query80	2124	481	482	481
query81	594	216	221	216
query82	862	129	126	126
query83	295	172	168	168
query84	274	93	97	93
query85	1355	319	299	299
query86	430	335	301	301
query87	3317	3064	3075	3064
query88	4422	2457	2475	2457
query89	496	380	389	380
query90	2046	195	197	195
query91	134	103	102	102
query92	64	48	50	48
query93	4062	509	509	509
query94	1323	215	210	210
query95	411	314	321	314
query96	593	278	276	276
query97	3196	3004	2976	2976
query98	224	203	193	193
query99	1546	1273	1260	1260
Total cold run time: 295103 ms
Total hot run time: 175373 ms

@doris-robot

ClickBench: Total hot run time: 30.72 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2eb9c861d5f24c793278b2e7b69cc5a8b1861a5e, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.49	0.49	0.48
query6	1.13	0.73	0.72
query7	0.02	0.01	0.01
query8	0.06	0.04	0.04
query9	0.55	0.49	0.46
query10	0.53	0.54	0.53
query11	0.15	0.12	0.11
query12	0.15	0.12	0.12
query13	0.58	0.58	0.58
query14	0.76	0.77	0.80
query15	0.85	0.81	0.82
query16	0.36	0.37	0.37
query17	0.97	0.94	1.03
query18	0.23	0.21	0.22
query19	1.81	1.70	1.77
query20	0.02	0.01	0.01
query21	15.40	0.77	0.66
query22	3.85	6.89	2.25
query23	18.36	1.27	1.18
query24	2.14	0.23	0.23
query25	0.15	0.09	0.08
query26	0.30	0.22	0.21
query27	0.46	0.24	0.23
query28	13.24	1.02	0.99
query29	12.61	3.27	3.30
query30	0.25	0.06	0.05
query31	2.87	0.39	0.39
query32	3.28	0.48	0.48
query33	2.90	2.90	2.94
query34	17.18	4.31	4.35
query35	4.45	4.39	4.41
query36	0.65	0.47	0.47
query37	0.19	0.15	0.15
query38	0.16	0.16	0.15
query39	0.05	0.03	0.03
query40	0.16	0.12	0.13
query41	0.10	0.04	0.05
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.52 s
Total hot run time: 30.72 s

@dataroaring dataroaring merged commit 2750e92 into apache:master Jul 12, 2024
24 of 27 checks passed
seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
…#37312)

dataroaring pushed a commit that referenced this pull request Jul 17, 2024
Labels: approved (indicates a PR has been approved by one committer), dev/3.0.1-merged, reviewed

5 participants