[fix](heartbeat) fix heartbeat editlog no persist hbTime #42653

Merged
merged 3 commits into from
Oct 31, 2024

Conversation

yujun777
Collaborator

@yujun777 yujun777 commented Oct 28, 2024

Backend persists lastUpdateMs, and that field is modified when a heartbeat editlog is replayed. But the heartbeat editlog does not persist hbTime, so on replay hbTime is always 0, which sets the backend's lastUpdateMs to 0 in the bdb image.

fix details:

  1. The heartbeat response now persists hbTime.
  2. Previously, a response was written to the editlog only when a BE's state changed. With this change, a healthy response is still written to the editlog every 5 minutes even when the backend stays healthy, so that the backend's lastUpdateMs is periodically updated in the bdb image. Note that this does not increase the real number of editlogs, because the heartbeat manager packs all FE/BE heartbeats into one editlog. Even when no FE/BE state changes, it still writes an editlog that contains no node's response.
  3. For a dead heartbeat response, hbTime is set to the last successful hbTime, so that the replayer can set the correct lastUpdateMs.
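
The three points above can be modeled with a small sketch. This is a simplified Python model, not the Doris FE code (which is Java): every name here (`HeartbeatResponse`, `Backend`, `make_response`, `apply`, `run_master`, `replay`, `last_logged_ms`) is hypothetical, and only the 5-minute interval and the hbTime/lastUpdateMs behavior come from the description.

```python
from dataclasses import dataclass

LOG_INTERVAL_MS = 5 * 60 * 1000  # "every 5 min" in the description


@dataclass
class HeartbeatResponse:
    ok: bool
    hb_time_ms: int  # fix 1: this field is persisted with the response


@dataclass
class Backend:
    alive: bool = False
    last_update_ms: int = 0  # time of the last successful heartbeat
    last_logged_ms: int = 0  # hypothetical bookkeeping for the 5-minute rule


def make_response(be: Backend, ok: bool, now_ms: int) -> HeartbeatResponse:
    # fix 3: a dead response carries the last successful heartbeat time
    return HeartbeatResponse(ok, now_ms if ok else be.last_update_ms)


def apply(be: Backend, resp: HeartbeatResponse) -> bool:
    """Apply a response (live or replayed); return True if it should be logged."""
    state_changed = be.alive != resp.ok
    be.alive = resp.ok
    be.last_update_ms = resp.hb_time_ms
    # fix 2: a healthy backend is still logged once per interval
    periodic = resp.ok and resp.hb_time_ms - be.last_logged_ms >= LOG_INTERVAL_MS
    if state_changed or periodic:
        be.last_logged_ms = resp.hb_time_ms
        return True
    return False


def run_master(events):
    """Return (backend, editlog) after processing (now_ms, ok) heartbeats."""
    be, editlog = Backend(), []
    for now_ms, ok in events:
        resp = make_response(be, ok, now_ms)
        if apply(be, resp):
            editlog.append(resp)
    return be, editlog


def replay(editlog):
    """Rebuild backend state from the editlog alone, as a replayer would."""
    be = Backend()
    for resp in editlog:
        apply(be, resp)
    return be
```

In this model, a backend that is healthy at 1 s, 61 s and 301 s and dead at 361 s produces three logged responses, and replaying them yields the same `last_update_ms` as on the master instead of 0.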

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@yujun777
Collaborator Author

run buildall

@yujun777 yujun777 force-pushed the fix-replay-lost-heartbeat-time branch from f4e7ed1 to 3090bd4 Compare October 28, 2024 14:05
@yujun777
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41945 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3090bd413fd28c567ecf3751466d2dd4443bc8db, data reload: false

------ Round 1 ----------------------------------
q1	17673	7672	7411	7411
q2	2048	161	155	155
q3	10625	1115	1211	1115
q4	10553	895	866	866
q5	7794	3213	3158	3158
q6	232	147	146	146
q7	1029	621	620	620
q8	9346	2026	2057	2026
q9	6655	6511	6557	6511
q10	7025	2420	2497	2420
q11	454	250	248	248
q12	412	218	220	218
q13	17765	3030	3038	3030
q14	239	214	209	209
q15	578	531	549	531
q16	635	590	594	590
q17	991	578	580	578
q18	7466	6702	6831	6702
q19	1341	1011	1047	1011
q20	465	177	186	177
q21	4100	3293	3202	3202
q22	1091	1021	1021	1021
Total cold run time: 108517 ms
Total hot run time: 41945 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7356	7271	7347	7271
q2	342	223	225	223
q3	3125	3013	3053	3013
q4	2145	1871	1830	1830
q5	5775	5828	5831	5828
q6	230	142	143	142
q7	2355	1817	1782	1782
q8	3509	3610	3512	3512
q9	9051	9027	8990	8990
q10	3659	3605	3628	3605
q11	602	485	498	485
q12	846	645	584	584
q13	9436	3191	3218	3191
q14	309	274	265	265
q15	611	546	544	544
q16	702	655	646	646
q17	1928	1649	1638	1638
q18	8548	7911	7636	7636
q19	1730	1552	1633	1552
q20	2171	1879	1843	1843
q21	5787	5446	5510	5446
q22	1184	1029	1011	1011
Total cold run time: 71401 ms
Total hot run time: 61037 ms

@doris-robot

TPC-DS: Total hot run time: 196161 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3090bd413fd28c567ecf3751466d2dd4443bc8db, data reload: false

query1	1271	992	1020	992
query2	6216	2091	1995	1995
query3	11453	4669	4713	4669
query4	33903	23686	23554	23554
query5	5180	447	452	447
query6	280	183	181	181
query7	3996	295	294	294
query8	294	228	224	224
query9	9441	2739	2700	2700
query10	502	265	242	242
query11	18264	15340	15160	15160
query12	163	100	99	99
query13	1576	431	411	411
query14	9810	6434	7308	6434
query15	256	178	186	178
query16	8002	505	486	486
query17	1518	612	574	574
query18	2171	310	305	305
query19	355	154	155	154
query20	124	114	113	113
query21	201	108	113	108
query22	4867	4648	4482	4482
query23	34877	34292	34087	34087
query24	11059	2804	2743	2743
query25	639	405	409	405
query26	1315	158	153	153
query27	2578	276	285	276
query28	7762	2425	2422	2422
query29	857	416	416	416
query30	263	162	169	162
query31	1012	806	822	806
query32	86	54	59	54
query33	764	258	275	258
query34	947	536	511	511
query35	1032	895	886	886
query36	1085	931	957	931
query37	136	78	68	68
query38	4472	4186	4309	4186
query39	1478	1421	1408	1408
query40	246	100	100	100
query41	48	45	45	45
query42	113	100	97	97
query43	522	493	484	484
query44	1229	801	800	800
query45	178	166	165	165
query46	1150	697	688	688
query47	1963	1857	1904	1857
query48	439	318	322	318
query49	937	413	398	398
query50	810	391	390	390
query51	7153	6999	7001	6999
query52	98	86	84	84
query53	254	178	174	174
query54	1061	382	399	382
query55	76	74	71	71
query56	257	238	224	224
query57	1293	1147	1162	1147
query58	227	201	200	200
query59	3335	3167	3222	3167
query60	277	242	237	237
query61	99	101	98	98
query62	829	668	685	668
query63	216	190	177	177
query64	4065	610	612	610
query65	3274	3183	3176	3176
query66	788	297	295	295
query67	15795	15668	15928	15668
query68	4645	563	564	563
query69	437	251	249	249
query70	1179	1150	1108	1108
query71	325	256	258	256
query72	6236	3981	3955	3955
query73	754	354	365	354
query74	10369	8855	9008	8855
query75	3414	2632	2661	2632
query76	2576	969	1023	969
query77	373	259	266	259
query78	10726	9684	9662	9662
query79	1141	618	616	616
query80	799	438	437	437
query81	556	245	241	241
query82	1349	108	119	108
query83	214	142	137	137
query84	236	69	70	69
query85	1061	293	275	275
query86	331	291	294	291
query87	4855	4644	4627	4627
query88	3234	2217	2153	2153
query89	417	287	291	287
query90	1778	180	181	180
query91	135	99	98	98
query92	59	46	49	46
query93	1090	548	528	528
query94	749	274	270	270
query95	333	243	241	241
query96	604	269	287	269
query97	2880	2647	2768	2647
query98	211	197	186	186
query99	1764	1300	1303	1300
Total cold run time: 301767 ms
Total hot run time: 196161 ms

@doris-robot

ClickBench: Total hot run time: 32.74 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3090bd413fd28c567ecf3751466d2dd4443bc8db, data reload: false

query1	0.04	0.04	0.03
query2	0.06	0.03	0.03
query3	0.23	0.07	0.07
query4	1.64	0.10	0.10
query5	0.40	0.38	0.38
query6	1.16	0.66	0.64
query7	0.02	0.01	0.01
query8	0.06	0.04	0.03
query9	0.56	0.50	0.50
query10	0.56	0.54	0.54
query11	0.13	0.10	0.11
query12	0.14	0.11	0.11
query13	0.61	0.60	0.59
query14	2.84	2.87	2.74
query15	0.88	0.82	0.83
query16	0.38	0.38	0.38
query17	1.06	1.05	1.06
query18	0.20	0.20	0.20
query19	1.94	1.88	1.95
query20	0.02	0.01	0.01
query21	15.36	0.59	0.58
query22	2.34	2.08	1.53
query23	17.12	0.87	0.92
query24	3.40	1.32	1.53
query25	0.32	0.20	0.18
query26	0.44	0.14	0.13
query27	0.04	0.04	0.05
query28	9.70	1.11	1.07
query29	12.55	3.27	3.30
query30	0.24	0.05	0.06
query31	2.88	0.38	0.36
query32	3.30	0.45	0.46
query33	3.00	3.03	3.01
query34	17.08	4.45	4.41
query35	4.50	4.47	4.48
query36	0.68	0.50	0.51
query37	0.08	0.06	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.03
query40	0.16	0.12	0.12
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.02	0.02
Total cold run time: 106.33 s
Total hot run time: 32.74 s

@yujun777
Collaborator Author

run feut

@yujun777
Collaborator Author

run p0

@yujun777
Collaborator Author

run feut

@yujun777
Collaborator Author

run p0

@yujun777 yujun777 marked this pull request as draft October 29, 2024 06:52
@yujun777 yujun777 force-pushed the fix-replay-lost-heartbeat-time branch from f267923 to 685260e Compare October 29, 2024 09:07
@yujun777 yujun777 changed the title from "[fix](replay) fix replay heartbeat lost update hb time" to "[fix](cloud rebalance) Add check for transfer primary be" Oct 29, 2024
@yujun777 yujun777 marked this pull request as ready for review October 29, 2024 10:18
@yujun777
Collaborator Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40846 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 685260e2f83559ea353dc17ab5fa79d1d2842e2e, data reload: false

------ Round 1 ----------------------------------
q1	17583	8099	7276	7276
q2	2042	153	150	150
q3	10592	1142	1128	1128
q4	10226	787	809	787
q5	7746	3110	3014	3014
q6	238	145	142	142
q7	993	590	591	590
q8	9354	1928	1985	1928
q9	6539	6366	6416	6366
q10	7052	2416	2426	2416
q11	437	240	248	240
q12	401	216	213	213
q13	17783	3010	3036	3010
q14	237	206	225	206
q15	578	517	519	517
q16	651	591	589	589
q17	960	568	517	517
q18	7356	6562	6770	6562
q19	1330	1039	1034	1034
q20	479	187	185	185
q21	4422	3253	2994	2994
q22	1117	982	997	982
Total cold run time: 108116 ms
Total hot run time: 40846 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7260	7227	7198	7198
q2	333	229	237	229
q3	2934	2798	2780	2780
q4	1935	1679	1683	1679
q5	5473	5575	5485	5485
q6	218	136	136	136
q7	2120	1707	1733	1707
q8	3223	3374	3427	3374
q9	8602	8543	8550	8543
q10	3492	3439	3427	3427
q11	571	476	486	476
q12	786	557	569	557
q13	10311	2988	2996	2988
q14	305	267	254	254
q15	549	510	521	510
q16	686	623	636	623
q17	1816	1584	1535	1535
q18	7856	7556	7477	7477
q19	1663	1505	1523	1505
q20	2056	1812	1813	1812
q21	5234	5285	5130	5130
q22	1069	1020	1014	1014
Total cold run time: 68492 ms
Total hot run time: 58439 ms

@doris-robot

TPC-DS: Total hot run time: 191146 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 685260e2f83559ea353dc17ab5fa79d1d2842e2e, data reload: false

query1	977	367	359	359
query2	6516	2102	2043	2043
query3	6784	208	212	208
query4	39718	23935	23549	23549
query5	5579	446	434	434
query6	248	166	162	162
query7	4050	300	294	294
query8	292	224	214	214
query9	6502	2717	2700	2700
query10	452	248	256	248
query11	15931	15268	15264	15264
query12	158	106	103	103
query13	1010	444	408	408
query14	9557	7303	6602	6602
query15	222	180	174	174
query16	7465	457	459	457
query17	1517	579	552	552
query18	1331	291	289	289
query19	368	150	149	149
query20	116	109	111	109
query21	210	103	100	100
query22	4558	4227	4605	4227
query23	34754	33951	34105	33951
query24	11086	2799	2831	2799
query25	619	401	408	401
query26	1176	160	160	160
query27	2461	276	288	276
query28	7657	2463	2447	2447
query29	720	435	424	424
query30	325	173	161	161
query31	1005	803	847	803
query32	76	56	59	56
query33	625	280	278	278
query34	912	512	508	508
query35	1063	880	885	880
query36	1056	949	948	948
query37	143	76	73	73
query38	4440	4336	4311	4311
query39	1470	1432	1449	1432
query40	206	99	101	99
query41	51	46	48	46
query42	111	101	98	98
query43	535	477	478	477
query44	1222	811	816	811
query45	190	163	172	163
query46	1124	687	714	687
query47	1953	1850	1884	1850
query48	422	332	330	330
query49	1174	411	410	410
query50	825	379	390	379
query51	7224	7042	6977	6977
query52	99	91	90	90
query53	252	184	177	177
query54	575	396	413	396
query55	78	79	79	79
query56	268	243	246	243
query57	1346	1171	1197	1171
query58	237	213	214	213
query59	3102	2955	3180	2955
query60	267	245	246	245
query61	104	106	97	97
query62	846	672	661	661
query63	212	184	180	180
query64	4740	623	605	605
query65	3298	3224	3204	3204
query66	1190	302	325	302
query67	16158	15607	15602	15602
query68	4594	577	551	551
query69	427	250	254	250
query70	1189	1101	1167	1101
query71	323	255	266	255
query72	6367	4046	3996	3996
query73	806	353	363	353
query74	10231	9077	9102	9077
query75	3394	2703	2663	2663
query76	2449	1007	1000	1000
query77	401	275	272	272
query78	10540	9561	9482	9482
query79	1370	583	602	583
query80	992	432	438	432
query81	553	238	245	238
query82	1258	122	110	110
query83	200	140	134	134
query84	238	68	67	67
query85	1232	321	301	301
query86	312	297	305	297
query87	5083	4809	4591	4591
query88	3245	2216	2185	2185
query89	401	291	293	291
query90	1662	182	182	182
query91	139	99	97	97
query92	57	46	49	46
query93	1088	543	537	537
query94	899	287	278	278
query95	339	248	243	243
query96	609	286	276	276
query97	2911	2656	2701	2656
query98	208	199	197	197
query99	1521	1305	1316	1305
Total cold run time: 295996 ms
Total hot run time: 191146 ms

@doris-robot

ClickBench: Total hot run time: 32.57 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 685260e2f83559ea353dc17ab5fa79d1d2842e2e, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.10
query5	0.41	0.41	0.40
query6	1.22	0.67	0.67
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.57	0.51	0.49
query10	0.55	0.54	0.54
query11	0.15	0.10	0.11
query12	0.13	0.11	0.11
query13	0.61	0.60	0.59
query14	2.83	2.84	2.85
query15	0.90	0.83	0.82
query16	0.39	0.37	0.39
query17	1.07	1.04	1.07
query18	0.23	0.21	0.22
query19	1.90	1.88	2.07
query20	0.01	0.01	0.01
query21	15.36	0.59	0.58
query22	2.42	2.08	1.91
query23	16.98	0.86	0.88
query24	3.06	1.53	0.66
query25	0.18	0.07	0.14
query26	0.54	0.14	0.13
query27	0.05	0.04	0.04
query28	10.68	1.09	1.08
query29	12.59	3.25	3.26
query30	0.24	0.06	0.06
query31	2.86	0.38	0.38
query32	3.27	0.46	0.45
query33	2.97	3.06	3.06
query34	16.90	4.44	4.46
query35	4.46	4.48	4.51
query36	0.64	0.49	0.52
query37	0.08	0.06	0.06
query38	0.04	0.03	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.13
query41	0.07	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.65 s
Total hot run time: 32.57 s

@yujun777 yujun777 changed the title from "[fix](cloud rebalance) Add check for transfer primary be" to "[fix](heartbeat) fix heartbeat editlog no persist hbTime" Oct 29, 2024
@yujun777
Collaborator Author

run buildall

@yujun777 yujun777 force-pushed the fix-replay-lost-heartbeat-time branch from db6664c to f64d942 Compare October 29, 2024 11:42
@yujun777
Collaborator Author

run buildall

@yujun777
Collaborator Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Oct 30, 2024
Contributor

@deardeng deardeng left a comment

LGTM

@doris-robot

TPC-H: Total hot run time: 41385 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3621967c7b9fb2bf221e549bb9b212c9bf05fdff, data reload: false

------ Round 1 ----------------------------------
q1	17586	7639	7320	7320
q2	2047	166	158	158
q3	10716	1178	1192	1178
q4	10562	834	763	763
q5	7759	3116	3046	3046
q6	235	145	151	145
q7	1006	618	617	617
q8	9581	2000	1975	1975
q9	7912	6454	6427	6427
q10	7067	2425	2446	2425
q11	442	247	243	243
q12	402	219	210	210
q13	17782	3030	3045	3030
q14	234	206	208	206
q15	574	527	514	514
q16	646	587	602	587
q17	986	599	567	567
q18	7303	6730	6731	6730
q19	1349	1045	988	988
q20	464	178	184	178
q21	4056	3094	3317	3094
q22	1120	1008	984	984
Total cold run time: 109829 ms
Total hot run time: 41385 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7308	7301	7316	7301
q2	325	223	225	223
q3	2907	2787	2862	2787
q4	1923	1746	1660	1660
q5	5484	5506	5517	5506
q6	215	135	138	135
q7	2139	1753	1738	1738
q8	3293	3399	3443	3399
q9	8637	8599	8583	8583
q10	3489	3453	3444	3444
q11	589	490	472	472
q12	786	564	630	564
q13	6815	3007	2987	2987
q14	289	266	259	259
q15	560	523	516	516
q16	678	634	631	631
q17	1839	1612	1574	1574
q18	7933	7715	7605	7605
q19	1672	1527	1567	1527
q20	2100	1796	1807	1796
q21	5564	5290	5326	5290
q22	1129	992	1029	992
Total cold run time: 65674 ms
Total hot run time: 58989 ms

@doris-robot

TPC-DS: Total hot run time: 192290 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3621967c7b9fb2bf221e549bb9b212c9bf05fdff, data reload: false

query1	993	377	357	357
query2	6509	2067	2046	2046
query3	6799	222	221	221
query4	34148	23708	23585	23585
query5	4323	461	428	428
query6	262	189	176	176
query7	4598	297	284	284
query8	281	221	222	221
query9	9519	2715	2730	2715
query10	467	251	252	251
query11	18111	15296	15369	15296
query12	148	100	99	99
query13	1692	424	426	424
query14	10483	6828	7392	6828
query15	290	171	173	171
query16	8138	449	466	449
query17	1718	573	546	546
query18	2139	308	299	299
query19	369	146	146	146
query20	122	108	109	108
query21	210	103	102	102
query22	4563	4470	4389	4389
query23	35286	34090	34227	34090
query24	11161	2841	2800	2800
query25	653	400	404	400
query26	1241	158	157	157
query27	2843	279	284	279
query28	7811	2479	2437	2437
query29	860	430	429	429
query30	334	169	173	169
query31	1012	811	836	811
query32	101	57	60	57
query33	795	270	271	270
query34	1004	514	517	514
query35	1110	899	897	897
query36	1117	947	968	947
query37	142	80	77	77
query38	4398	4291	4212	4212
query39	1473	1429	1423	1423
query40	205	102	102	102
query41	47	48	50	48
query42	106	102	98	98
query43	534	494	489	489
query44	1233	811	807	807
query45	186	163	165	163
query46	1141	706	696	696
query47	1933	1826	1851	1826
query48	432	320	321	320
query49	1174	408	415	408
query50	801	394	415	394
query51	7206	7173	7027	7027
query52	99	91	94	91
query53	257	185	182	182
query54	1346	421	413	413
query55	85	85	81	81
query56	267	262	262	262
query57	1311	1182	1204	1182
query58	252	212	212	212
query59	3154	2948	2999	2948
query60	300	248	244	244
query61	98	105	104	104
query62	869	669	679	669
query63	212	188	188	188
query64	5244	638	591	591
query65	3302	3224	3195	3195
query66	1444	303	304	303
query67	16227	15885	15960	15885
query68	4984	551	542	542
query69	440	275	274	274
query70	1202	1156	1103	1103
query71	338	247	254	247
query72	6426	3970	3993	3970
query73	768	356	358	356
query74	10202	8976	8991	8976
query75	3414	2699	2690	2690
query76	2931	1015	1147	1015
query77	400	270	270	270
query78	10547	9542	9553	9542
query79	1135	602	592	592
query80	769	431	443	431
query81	532	237	236	236
query82	1331	123	118	118
query83	236	142	137	137
query84	235	72	69	69
query85	1087	299	284	284
query86	313	309	293	293
query87	4999	4839	4756	4756
query88	3330	2194	2149	2149
query89	403	293	292	292
query90	2127	188	185	185
query91	134	99	101	99
query92	58	51	49	49
query93	1067	544	541	541
query94	802	301	301	301
query95	350	246	253	246
query96	617	277	282	277
query97	2846	2736	2701	2701
query98	213	200	195	195
query99	1537	1316	1326	1316
Total cold run time: 302461 ms
Total hot run time: 192290 ms

@doris-robot

ClickBench: Total hot run time: 32.82 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3621967c7b9fb2bf221e549bb9b212c9bf05fdff, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.06	0.06
query4	1.65	0.10	0.11
query5	0.42	0.38	0.42
query6	1.15	0.66	0.64
query7	0.02	0.02	0.02
query8	0.03	0.03	0.03
query9	0.55	0.49	0.49
query10	0.53	0.57	0.56
query11	0.15	0.10	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	2.73	2.76	2.76
query15	0.89	0.83	0.83
query16	0.38	0.39	0.38
query17	1.04	1.05	1.05
query18	0.23	0.21	0.21
query19	1.93	1.88	2.00
query20	0.02	0.01	0.01
query21	15.36	0.59	0.59
query22	2.60	2.31	2.22
query23	16.97	0.96	0.76
query24	2.71	0.76	1.19
query25	0.18	0.08	0.26
query26	0.39	0.13	0.14
query27	0.06	0.04	0.04
query28	11.05	1.08	1.06
query29	12.58	3.36	3.35
query30	0.25	0.06	0.06
query31	2.87	0.37	0.39
query32	3.26	0.46	0.45
query33	2.98	3.03	3.00
query34	16.85	4.46	4.40
query35	4.52	4.52	4.50
query36	0.64	0.48	0.48
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.03	0.02
query40	0.16	0.13	0.12
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.54 s
Total hot run time: 32.82 s

@yujun777 yujun777 force-pushed the fix-replay-lost-heartbeat-time branch from 3621967 to f9b4bfd Compare October 30, 2024 10:33
@yujun777
Collaborator Author

run buildall

@yujun777
Collaborator Author

run performance

2 similar comments
@yujun777
Collaborator Author

run performance

@yujun777
Collaborator Author

run performance

@yujun777
Collaborator Author

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 30, 2024
Contributor

PR approved by at least one committer and no changes requested.

@yujun777
Collaborator Author

run performance

@doris-robot

TPC-H: Total hot run time: 41543 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f9b4bfd4795dc4dc55678851e8f05df691f65e6b, data reload: false

------ Round 1 ----------------------------------
q1	17768	7599	7381	7381
q2	2049	166	193	166
q3	10783	1068	1210	1068
q4	10385	874	871	871
q5	7742	3117	3066	3066
q6	231	150	146	146
q7	1018	621	616	616
q8	9354	1969	2039	1969
q9	6631	6426	6490	6426
q10	7088	2387	2410	2387
q11	464	255	249	249
q12	407	214	211	211
q13	17799	2999	2996	2996
q14	242	219	214	214
q15	565	509	510	509
q16	685	600	583	583
q17	979	642	595	595
q18	7448	6726	6734	6726
q19	1337	1049	1081	1049
q20	473	182	190	182
q21	4013	3263	3138	3138
q22	1082	997	995	995
Total cold run time: 108543 ms
Total hot run time: 41543 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7315	7313	7301	7301
q2	317	231	224	224
q3	3009	2928	2965	2928
q4	2099	1827	1802	1802
q5	5743	5801	5776	5776
q6	232	143	140	140
q7	2208	1820	1782	1782
q8	3379	3572	3424	3424
q9	8905	8989	8890	8890
q10	3597	3580	3550	3550
q11	608	495	480	480
q12	839	677	624	624
q13	9568	3209	3252	3209
q14	311	278	287	278
q15	574	520	508	508
q16	694	628	659	628
q17	1846	1640	1633	1633
q18	8232	7829	7634	7634
q19	1742	1550	1598	1550
q20	2136	1868	1845	1845
q21	5581	5504	5428	5428
q22	1170	1065	1037	1037
Total cold run time: 70105 ms
Total hot run time: 60671 ms

@doris-robot

TPC-DS: Total hot run time: 196318 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f9b4bfd4795dc4dc55678851e8f05df691f65e6b, data reload: false

query1	1267	993	984	984
query2	6221	2035	2013	2013
query3	11437	4762	4765	4762
query4	33578	23537	23694	23537
query5	4793	464	452	452
query6	275	180	174	174
query7	4000	298	298	298
query8	290	232	225	225
query9	9544	2644	2638	2638
query10	492	249	241	241
query11	18439	15346	15539	15346
query12	155	100	98	98
query13	1571	409	415	409
query14	9226	6946	7286	6946
query15	263	192	179	179
query16	8066	462	504	462
query17	1433	601	592	592
query18	2149	299	301	299
query19	287	166	162	162
query20	118	113	120	113
query21	206	108	113	108
query22	4878	4489	4110	4110
query23	34877	34131	34230	34131
query24	11050	2826	2745	2745
query25	622	398	402	398
query26	1208	156	158	156
query27	2342	281	298	281
query28	7620	2415	2405	2405
query29	842	417	419	417
query30	252	171	165	165
query31	1017	786	809	786
query32	87	59	58	58
query33	761	265	280	265
query34	927	505	516	505
query35	1028	890	885	885
query36	1109	956	948	948
query37	138	70	70	70
query38	4427	4203	4292	4203
query39	1472	1430	1449	1430
query40	201	101	101	101
query41	48	46	46	46
query42	119	97	99	97
query43	531	489	487	487
query44	1247	814	808	808
query45	183	167	164	164
query46	1124	704	706	704
query47	1922	1848	1879	1848
query48	427	316	322	316
query49	931	399	383	383
query50	811	390	402	390
query51	7112	7088	6946	6946
query52	104	89	88	88
query53	256	182	182	182
query54	1153	404	398	398
query55	73	77	77	77
query56	245	236	231	231
query57	1304	1189	1147	1147
query58	217	197	197	197
query59	3127	2979	2767	2767
query60	260	244	242	242
query61	99	102	96	96
query62	849	684	682	682
query63	214	184	186	184
query64	3934	629	602	602
query65	3272	3195	3193	3193
query66	798	333	296	296
query67	16028	15698	15797	15698
query68	4398	543	530	530
query69	425	263	249	249
query70	1150	1154	1143	1143
query71	321	254	237	237
query72	6264	3967	4029	3967
query73	754	351	355	351
query74	10179	9008	8990	8990
query75	3405	2665	2673	2665
query76	2638	1022	1049	1022
query77	388	275	299	275
query78	10469	9620	9535	9535
query79	1306	594	576	576
query80	863	424	415	415
query81	563	241	243	241
query82	388	117	117	117
query83	224	133	135	133
query84	245	73	68	68
query85	1282	304	271	271
query86	399	299	294	294
query87	4856	4720	4609	4609
query88	3923	2179	2150	2150
query89	410	292	286	286
query90	2135	186	180	180
query91	132	100	96	96
query92	64	48	52	48
query93	2244	540	536	536
query94	989	269	291	269
query95	344	253	248	248
query96	617	273	287	273
query97	2893	2721	2698	2698
query98	221	197	198	197
query99	1584	1283	1303	1283
Total cold run time: 301129 ms
Total hot run time: 196318 ms

@doris-robot

ClickBench: Total hot run time: 32.78 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f9b4bfd4795dc4dc55678851e8f05df691f65e6b, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.03	0.04
query3	0.23	0.07	0.07
query4	1.65	0.11	0.11
query5	0.40	0.41	0.39
query6	1.13	0.65	0.64
query7	0.02	0.01	0.01
query8	0.04	0.03	0.03
query9	0.56	0.49	0.49
query10	0.55	0.54	0.54
query11	0.13	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.61	0.60
query14	2.74	2.75	2.86
query15	0.90	0.82	0.82
query16	0.39	0.35	0.36
query17	0.98	1.08	1.03
query18	0.19	0.20	0.20
query19	1.96	1.81	1.96
query20	0.01	0.01	0.01
query21	15.36	0.58	0.58
query22	2.58	2.18	1.89
query23	17.07	0.85	0.71
query24	2.93	1.32	1.76
query25	0.17	0.28	0.13
query26	0.48	0.14	0.15
query27	0.05	0.05	0.04
query28	10.28	1.09	1.07
query29	12.58	3.24	3.23
query30	0.24	0.06	0.07
query31	2.86	0.38	0.37
query32	3.27	0.45	0.45
query33	2.97	3.03	3.02
query34	16.89	4.44	4.50
query35	4.54	4.45	4.47
query36	0.66	0.48	0.47
query37	0.08	0.05	0.05
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 106.13 s
Total hot run time: 32.78 s

@dataroaring dataroaring merged commit ac6a868 into apache:master Oct 31, 2024
24 of 25 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 31, 2024
dataroaring pushed a commit that referenced this pull request Nov 7, 2024
…42986)

 Cherry-picked from #42653

Co-authored-by: yujun <[email protected]>
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.3-merged p0_b reviewed