(cloud-merge) Support shadow tablet to do cumulative compaction in cloud mode #37293

Merged
merged 15 commits into from
Aug 2, 2024

Contributor

@Lchangliang Lchangliang commented Jul 4, 2024

In cloud mode, during a schema change the shadow tablet hits error -235 under a large number of loads, because it cannot do cumulative compaction. This prevents the user from continuing to load data.
Implementation details:

  1. When a schema change starts, record the end version of the rowsets to convert, alter_version, in the SchemaChangeJob.
  2. The origin tablet can only do base compaction in [0, alter_version] and cumulative compaction in (alter_version, N]. It cannot compact across alter_version, for example a compaction over [a, alter_version + n].
  3. The shadow tablet cannot do base compaction; it can only do cumulative compaction in (alter_version, N].
  4. When the schema change fails because the FE or BE core dumps, it is retried. The retry gets alter_version from meta_service and continues from there.
  5. When the schema change job finishes or is canceled, the schema change job must be cleared. Before this PR, it was only overwritten by the next schema change.
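
The version-range rules in items 2 and 3 can be sketched as a single predicate. This is only an illustration of the rules as described: `TabletRole`, `CompactionKind`, and `compaction_allowed` are hypothetical names, not the Doris API, and the actual checks in this PR may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not the Doris API.
enum class TabletRole { Origin, Shadow };
enum class CompactionKind { Base, Cumulative };

// Returns true if compacting the closed version range [start, end] is allowed
// while a schema change with the given alter_version is in progress.
bool compaction_allowed(TabletRole role, CompactionKind kind,
                        int64_t start, int64_t end, int64_t alter_version) {
    assert(start <= end);
    const bool at_or_below = end <= alter_version; // range lies in [0, alter_version]
    const bool above = start > alter_version;      // range lies in (alter_version, N]
    switch (kind) {
    case CompactionKind::Base:
        // Only the origin tablet may do base compaction, and only up to alter_version.
        return role == TabletRole::Origin && at_or_below;
    case CompactionKind::Cumulative:
        // Both tablets may do cumulative compaction strictly above alter_version.
        return above;
    }
    return false;
}
```

A range such as [5, 12] with alter_version = 10 is neither at or below nor above the boundary, so it is rejected for both compaction kinds. That is the "compaction across alter_version" case from item 2.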

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Lchangliang
Contributor Author

run buildall

Contributor

github-actions bot commented Jul 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

}

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params) {
Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,

warning: function '_convert_historical_rowsets' exceeds recommended size/complexity thresholds [readability-function-size]

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^
Additional context

be/src/cloud/cloud_schema_change_job.cpp:216: 172 lines including whitespace and comments (threshold 80)

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^

@dataroaring
Contributor

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

}

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params) {
Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,

warning: function '_convert_historical_rowsets' exceeds recommended size/complexity thresholds [readability-function-size]

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^
Additional context

be/src/cloud/cloud_schema_change_job.cpp:214: 172 lines including whitespace and comments (threshold 80)

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^

@Lchangliang
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40097 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1a8892fb754339dfdc7ad382a9e3fc23ca69423e, data reload: false

------ Round 1 ----------------------------------
q1	18553	4697	4274	4274
q2	2030	198	190	190
q3	10548	1179	1200	1179
q4	10223	802	874	802
q5	7538	2651	2635	2635
q6	224	135	135	135
q7	969	603	619	603
q8	9226	2052	2107	2052
q9	8810	6473	6473	6473
q10	9009	3727	3726	3726
q11	455	239	240	239
q12	437	234	238	234
q13	17888	3011	3030	3011
q14	282	224	222	222
q15	534	481	494	481
q16	533	393	373	373
q17	988	701	699	699
q18	8089	7541	7522	7522
q19	5566	1515	1466	1466
q20	670	337	335	335
q21	4904	3107	3929	3107
q22	412	339	348	339
Total cold run time: 117888 ms
Total hot run time: 40097 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4402	4283	4243	4243
q2	387	286	269	269
q3	3021	2899	2945	2899
q4	1985	1664	1688	1664
q5	5530	5544	5397	5397
q6	230	145	131	131
q7	2235	1860	1854	1854
q8	3314	3426	3415	3415
q9	8716	8747	8760	8747
q10	4193	3632	3827	3632
q11	612	504	496	496
q12	803	631	646	631
q13	16433	3212	3188	3188
q14	317	280	285	280
q15	541	474	483	474
q16	507	425	433	425
q17	1796	1531	1535	1531
q18	8124	7810	7865	7810
q19	1794	1615	1582	1582
q20	2228	1885	1887	1885
q21	8614	4770	4925	4770
q22	605	552	556	552
Total cold run time: 76387 ms
Total hot run time: 55875 ms

@doris-robot

TPC-DS: Total hot run time: 173080 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1a8892fb754339dfdc7ad382a9e3fc23ca69423e, data reload: false

query1	930	392	376	376
query2	6462	2525	2493	2493
query3	6653	212	217	212
query4	20385	17490	17430	17430
query5	3624	477	476	476
query6	263	162	168	162
query7	4589	306	292	292
query8	337	303	294	294
query9	8583	2413	2395	2395
query10	558	308	284	284
query11	10458	10031	10061	10031
query12	121	84	85	84
query13	1654	370	378	370
query14	10145	6917	7616	6917
query15	241	190	186	186
query16	7743	311	313	311
query17	1792	565	537	537
query18	1915	282	277	277
query19	215	152	150	150
query20	93	82	85	82
query21	210	130	128	128
query22	4222	3937	4002	3937
query23	34038	33625	33395	33395
query24	10100	2900	2878	2878
query25	639	400	407	400
query26	707	165	161	161
query27	2307	323	325	323
query28	5992	2185	2162	2162
query29	905	667	675	667
query30	251	166	154	154
query31	992	773	754	754
query32	99	54	55	54
query33	680	310	314	310
query34	894	488	477	477
query35	729	644	650	644
query36	1072	995	945	945
query37	142	81	82	81
query38	2969	2903	2987	2903
query39	907	863	852	852
query40	213	135	132	132
query41	57	58	55	55
query42	120	96	101	96
query43	623	562	558	558
query44	1078	740	741	740
query45	198	166	165	165
query46	1088	735	716	716
query47	1878	1808	1794	1794
query48	374	300	305	300
query49	943	403	423	403
query50	763	380	382	380
query51	6938	6835	6729	6729
query52	105	91	91	91
query53	358	289	287	287
query54	882	440	459	440
query55	76	71	73	71
query56	279	262	268	262
query57	1136	1089	1055	1055
query58	246	244	250	244
query59	3595	3241	3297	3241
query60	316	279	279	279
query61	99	92	96	92
query62	599	447	428	428
query63	319	283	292	283
query64	8548	2252	1749	1749
query65	3223	3100	3116	3100
query66	761	360	364	360
query67	15825	14848	14829	14829
query68	8239	532	546	532
query69	713	430	348	348
query70	1223	1151	1120	1120
query71	503	285	282	282
query72	8675	5380	5268	5268
query73	1159	321	324	321
query74	5899	5518	5522	5518
query75	5014	2662	2665	2662
query76	4771	921	855	855
query77	789	292	291	291
query78	9722	8731	8822	8731
query79	9585	521	522	521
query80	1150	472	465	465
query81	556	218	220	218
query82	775	109	105	105
query83	337	169	167	167
query84	271	82	85	82
query85	1309	336	273	273
query86	406	308	318	308
query87	3349	3128	3151	3128
query88	4529	2375	2395	2375
query89	534	381	387	381
query90	1962	182	184	182
query91	131	104	99	99
query92	64	48	48	48
query93	6964	507	510	507
query94	1267	208	203	203
query95	414	322	315	315
query96	597	266	269	266
query97	3215	3027	3006	3006
query98	209	200	202	200
query99	1173	843	840	840
Total cold run time: 290760 ms
Total hot run time: 173080 ms

@doris-robot

ClickBench: Total hot run time: 30.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1a8892fb754339dfdc7ad382a9e3fc23ca69423e, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.03	0.04
query3	0.23	0.05	0.06
query4	1.65	0.09	0.09
query5	0.50	0.49	0.47
query6	1.12	0.72	0.71
query7	0.02	0.01	0.01
query8	0.06	0.04	0.05
query9	0.56	0.49	0.50
query10	0.54	0.55	0.55
query11	0.15	0.12	0.12
query12	0.15	0.12	0.12
query13	0.60	0.58	0.59
query14	0.77	0.76	0.77
query15	0.85	0.81	0.82
query16	0.36	0.35	0.35
query17	1.00	1.06	0.96
query18	0.21	0.22	0.26
query19	1.82	1.72	1.83
query20	0.02	0.01	0.01
query21	15.40	0.77	0.65
query22	4.22	7.42	1.66
query23	18.27	1.36	1.22
query24	2.09	0.22	0.24
query25	0.15	0.08	0.10
query26	0.29	0.21	0.21
query27	0.45	0.24	0.23
query28	13.25	1.02	0.99
query29	12.63	3.28	3.27
query30	0.26	0.06	0.05
query31	2.86	0.39	0.40
query32	3.24	0.47	0.46
query33	2.81	2.92	2.91
query34	17.01	4.32	4.33
query35	4.41	4.41	4.46
query36	0.66	0.46	0.47
query37	0.18	0.15	0.16
query38	0.14	0.14	0.15
query39	0.04	0.04	0.04
query40	0.15	0.12	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 109.44 s
Total hot run time: 30.24 s

@Lchangliang Lchangliang changed the title (cloud-merge) Support shadow tablet to do cumulative compaction in cloud mode [draft](cloud-merge) Support shadow tablet to do cumulative compaction in cloud mode Jul 5, 2024
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

}

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params) {
Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,

warning: function '_convert_historical_rowsets' exceeds recommended size/complexity thresholds [readability-function-size]

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^
Additional context

be/src/cloud/cloud_schema_change_job.cpp:224: 172 lines including whitespace and comments (threshold 80)

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

}

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params) {
Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,

warning: function '_convert_historical_rowsets' exceeds recommended size/complexity thresholds [readability-function-size]

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^
Additional context

be/src/cloud/cloud_schema_change_job.cpp:222: 173 lines including whitespace and comments (threshold 80)

Status CloudSchemaChangeJob::_convert_historical_rowsets(const SchemaChangeParams& sc_params,
                             ^

gensrc/proto/cloud.proto (outdated, resolved thread)
LOG.warn("tryTimes:{}, onCancel exception:", tryTimes, e);
}
sleepSeveralSeconds();
tryTimes++;

What if it tries to abort the same tablet job multiple times?

Contributor Author

The meta service (MS) checks the <origin_idx, shadow_idx, origin_tablet, shadow_tablet> in the message. If it does not match, the request is skipped.

((CloudInternalCatalog) Env.getCurrentInternalCatalog())
.removeSchemaChangeJob(dbId, tableId, baseIndexId, partitionId, baseTabletId);
}
}
Contributor

@gavinchou gavinchou Jul 28, 2024

Please log here:

  • which table/index has been processed (ID, name, etc.)
  • how many tablets have been processed here.

Also, what if it tries to abort the same tablet job multiple times?

Contributor Author

The meta service (MS) checks the <origin_idx, rollup_idx, origin_tablet, rollup_tablet> in the message. If it does not match, the request is skipped.
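
To make the reply above concrete, here is a minimal sketch of why a repeated abort is harmless when the receiver compares the request against the recorded job. `JobKey`, `AbortResult`, and `abort_job` are hypothetical stand-ins; the real meta-service code works on protobuf messages and a transactional KV store, which this sketch leaves out.

```cpp
#include <cstdint>
#include <optional>
#include <tuple>

// Hypothetical stand-in for the identity of a schema change job.
struct JobKey {
    int64_t origin_idx;
    int64_t shadow_idx;
    int64_t origin_tablet;
    int64_t shadow_tablet;

    bool operator==(const JobKey& o) const {
        return std::tie(origin_idx, shadow_idx, origin_tablet, shadow_tablet) ==
               std::tie(o.origin_idx, o.shadow_idx, o.origin_tablet, o.shadow_tablet);
    }
};

enum class AbortResult { Removed, Skipped };

// Removes the recorded job only when the request names exactly that job.
// A repeated or stale abort finds no matching record and is skipped.
AbortResult abort_job(std::optional<JobKey>& recorded, const JobKey& request) {
    if (!recorded.has_value() || !(*recorded == request)) {
        return AbortResult::Skipped; // nothing recorded, or a different job
    }
    recorded.reset();
    return AbortResult::Removed;
}
```

Under these assumptions the first matching abort removes the record, and any later abort with the same key returns `Skipped` without touching a job that a newer schema change may have recorded in the meantime.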

if (recorded_job.has_schema_change() && request->action() == FinishTabletJobRequest::COMMIT &&
!check_compaction_input_verions(compaction, recorded_job)) {
SS << "Check compaction input versions failed in schema change. input_version_start="
<< compaction.input_versions(0) << " input_version_end=" << compaction.input_versions(1)

Is SS a typo? It should be lowercase.

Contributor Author

#define SS (ss << &__FILE__[get_file_name_offset(__FILE__)] << ":" << __LINE__ << " ")
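
For readers unfamiliar with this pattern, the sketch below shows how a macro of this shape behaves: it writes a `file:line ` prefix into a local stream named `ss` and yields the stream so the message can be appended. The body of `get_file_name_offset` here is an assumption made for illustration (the thread does not show it), and `describe_failure` is a hypothetical caller.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Assumed implementation: index just past the last '/' in a path, so that
// &path[offset] points at the bare file name.
constexpr std::size_t get_file_name_offset(const char* path) {
    std::size_t offset = 0;
    for (std::size_t i = 0; path[i] != '\0'; ++i) {
        if (path[i] == '/') offset = i + 1;
    }
    return offset;
}

// Same shape as the macro quoted above; requires a stream named `ss` in scope.
#define SS (ss << &__FILE__[get_file_name_offset(__FILE__)] << ":" << __LINE__ << " ")

// Hypothetical caller: builds an error message prefixed with "file:line ".
std::string describe_failure(int start, int end) {
    std::stringstream ss;
    SS << "input_version_start=" << start << " input_version_end=" << end;
    return ss.str();
}
```

The uppercase name is therefore not a typo for the lowercase `ss`: `SS` is the macro, and `ss` is the stream it expands against.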

@Lchangliang
Contributor Author

run buildall

@Lchangliang
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41877 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 65ae146ba1653ba96593821faf3b237766abdf8e, data reload: false

------ Round 1 ----------------------------------
q1	17653	4993	4110	4110
q2	2029	202	198	198
q3	10575	1407	1361	1361
q4	10270	845	934	845
q5	7629	2924	2985	2924
q6	220	140	141	140
q7	1042	621	611	611
q8	9438	1936	1951	1936
q9	8408	6658	6612	6612
q10	8746	3865	3858	3858
q11	440	244	250	244
q12	416	233	233	233
q13	17774	2974	2953	2953
q14	267	247	252	247
q15	531	484	500	484
q16	524	401	390	390
q17	975	948	940	940
q18	8055	7215	7211	7211
q19	1420	1225	1223	1223
q20	558	332	344	332
q21	5299	4853	4738	4738
q22	358	287	287	287
Total cold run time: 112627 ms
Total hot run time: 41877 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4087	4036	4058	4036
q2	332	230	225	225
q3	3031	3059	3145	3059
q4	2023	2040	1988	1988
q5	5632	5456	5471	5456
q6	221	135	133	133
q7	2109	1773	1855	1773
q8	3308	3372	3350	3350
q9	8686	8667	8761	8667
q10	3922	4077	3922	3922
q11	549	458	470	458
q12	779	604	608	604
q13	14768	3130	3142	3130
q14	299	268	288	268
q15	538	495	500	495
q16	460	438	398	398
q17	1774	1740	1762	1740
q18	8209	7714	7643	7643
q19	1724	1774	1749	1749
q20	2096	1876	1835	1835
q21	5743	5501	5254	5254
q22	519	476	463	463
Total cold run time: 70809 ms
Total hot run time: 56646 ms

@doris-robot

TPC-DS: Total hot run time: 170147 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 65ae146ba1653ba96593821faf3b237766abdf8e, data reload: false

query1	920	372	367	367
query2	6474	1701	1660	1660
query3	6652	222	223	222
query4	20205	17526	17347	17347
query5	3666	517	539	517
query6	291	182	170	170
query7	4594	308	292	292
query8	249	198	206	198
query9	8536	2365	2347	2347
query10	444	285	269	269
query11	10495	10095	10039	10039
query12	120	86	83	83
query13	1637	369	370	369
query14	10007	7065	7667	7065
query15	205	162	168	162
query16	6864	432	445	432
query17	923	569	542	542
query18	1781	286	279	279
query19	197	140	145	140
query20	91	87	88	87
query21	196	104	108	104
query22	4232	4102	4187	4102
query23	33729	33634	33640	33634
query24	10267	3121	3112	3112
query25	664	416	407	407
query26	1742	158	160	158
query27	2795	289	277	277
query28	7423	2000	1995	1995
query29	1159	452	462	452
query30	236	157	161	157
query31	953	794	773	773
query32	109	61	58	58
query33	668	338	326	326
query34	941	497	513	497
query35	893	785	765	765
query36	1011	892	874	874
query37	262	78	78	78
query38	2899	2792	2784	2784
query39	868	819	854	819
query40	250	111	109	109
query41	47	45	46	45
query42	119	102	103	102
query43	479	423	415	415
query44	1186	719	716	716
query45	217	180	180	180
query46	1088	837	823	823
query47	1798	1727	1716	1716
query48	373	306	313	306
query49	936	420	438	420
query50	891	430	431	430
query51	6778	6588	6638	6588
query52	105	89	89	89
query53	263	179	183	179
query54	613	451	452	451
query55	79	75	77	75
query56	268	260	262	260
query57	1131	1025	1054	1025
query58	284	295	267	267
query59	2635	2388	2427	2388
query60	287	275	271	271
query61	95	95	97	95
query62	883	666	663	663
query63	218	188	176	176
query64	5719	1960	1876	1876
query65	3188	3160	3127	3127
query66	1302	327	326	326
query67	15180	14900	14952	14900
query68	4310	568	569	568
query69	467	308	311	308
query70	1107	1065	1044	1044
query71	439	280	274	274
query72	6977	2787	2509	2509
query73	763	327	330	327
query74	6011	5727	5678	5678
query75	3352	2763	2721	2721
query76	2750	1207	1282	1207
query77	416	303	315	303
query78	9394	8856	8892	8856
query79	1472	538	553	538
query80	1126	507	502	502
query81	566	227	222	222
query82	1027	135	128	128
query83	245	179	171	171
query84	276	80	77	77
query85	1337	320	297	297
query86	393	298	291	291
query87	3269	3115	3095	3095
query88	2910	2397	2394	2394
query89	397	297	288	288
query90	1747	189	184	184
query91	134	98	100	98
query92	60	49	49	49
query93	1409	633	632	632
query94	777	289	290	289
query95	382	262	264	262
query96	598	284	287	284
query97	3287	3099	3089	3089
query98	222	198	195	195
query99	1622	1290	1294	1290
Total cold run time: 261052 ms
Total hot run time: 170147 ms

@doris-robot

ClickBench: Total hot run time: 30.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 65ae146ba1653ba96593821faf3b237766abdf8e, data reload: false

query1	0.04	0.04	0.04
query2	0.07	0.04	0.03
query3	0.23	0.05	0.05
query4	1.67	0.07	0.06
query5	0.49	0.48	0.49
query6	1.13	0.72	0.72
query7	0.02	0.01	0.01
query8	0.06	0.05	0.05
query9	0.57	0.51	0.50
query10	0.57	0.56	0.58
query11	0.15	0.12	0.11
query12	0.14	0.12	0.12
query13	0.60	0.60	0.60
query14	0.77	0.80	0.78
query15	0.90	0.86	0.85
query16	0.35	0.36	0.37
query17	1.04	0.99	1.02
query18	0.23	0.21	0.21
query19	1.85	1.78	1.74
query20	0.01	0.01	0.01
query21	15.40	0.74	0.65
query22	4.01	6.74	1.94
query23	18.07	1.41	1.35
query24	2.25	0.22	0.22
query25	0.19	0.08	0.08
query26	0.32	0.21	0.22
query27	0.46	0.24	0.23
query28	13.16	1.01	0.96
query29	12.63	3.34	3.31
query30	0.26	0.06	0.06
query31	2.87	0.41	0.40
query32	3.25	0.50	0.48
query33	2.89	2.99	2.97
query34	15.44	4.29	4.25
query35	4.31	4.27	4.31
query36	0.69	0.50	0.49
query37	0.18	0.16	0.16
query38	0.17	0.14	0.15
query39	0.04	0.04	0.04
query40	0.16	0.13	0.13
query41	0.10	0.05	0.05
query42	0.05	0.05	0.05
query43	0.04	0.05	0.05
Total cold run time: 107.83 s
Total hot run time: 30.76 s

@Lchangliang
Contributor Author

run p0

Contributor

github-actions bot commented Aug 2, 2024

PR approved by anyone and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 2, 2024
Contributor

github-actions bot commented Aug 2, 2024

PR approved by at least one committer and no changes requested.

@gavinchou gavinchou merged commit b58b9e4 into apache:master Aug 2, 2024
29 of 30 checks passed
gavinchou added a commit to gavinchou/doris that referenced this pull request Aug 4, 2024
dataroaring pushed a commit that referenced this pull request Aug 4, 2024
…on in cloud mode (#37293)" (#38828)

We have to figure out why it causes a SEGV when running cloud_p0 later

```
#0 doris::cloud::TabletCompactionJobPB::_internal_input_versions(int) const
        /root/doris/cloud/../gensrc/build/gen_cpp/cloud.pb.h:48193:33
#1 doris::cloud::MetaServiceImpl::start_tablet_job(google::protobuf::RpcController*, doris::cloud::StartTabletJobRequest const*, doris::cloud::StartTabletJobResponse*, google::protobuf::Closure*)
        /root/doris/cloud/src/meta-service/meta_service_job.cpp:436:9
#2 void doris::cloud::MetaServiceProxy::call_impl<doris::cloud::StartTabletJobRequest, doris::cloud::StartTabletJobResponse>(void (doris::cloud::MetaService::*)(google::protobuf::RpcController*, doris::cloud::StartTabletJobRequest const*, doris::cloud::StartTabletJobResponse*, google::protobuf::Closure*), google::protobuf::RpcController*, doris::cloud::StartTabletJobRequest const*, doris::cloud::StartTabletJobResponse*, google::protobuf::Closure*)
        /root/doris/cloud/src/meta-service/meta_service.h:684:13
#3 doris::cloud::MetaServiceProxy::start_tablet_job(google::protobuf::RpcController*, doris::cloud::StartTabletJobRequest const*, doris::cloud::StartTabletJobResponse*, google::protobuf::Closure*)
        /root/doris/cloud/src/meta-service/meta_service.h:478:9
#4 doris::cloud::MetaService::CallMethod(google::protobuf::MethodDescriptor const*, google::protobuf::RpcController*, google::protobuf::Message const*, google::protobuf::Message*, google::protobuf::Closure*)
        /root/doris/gensrc/build/gen_cpp/cloud.pb.cc:0:7
```


This PR also adds some FE logging.
TangSiyang2001 pushed a commit to TangSiyang2001/doris that referenced this pull request Aug 19, 2024
…oud mode (apache#37293)

In cloud mode, during a schema change the shadow tablet hits error -235
under a large number of loads, because it cannot do cumulative
compaction. This prevents the user from continuing to load data.
Implementation details:
1. When a schema change starts, record the end version of the rowsets to
convert, `alter_version`, in the SchemaChangeJob.
2. The origin tablet can only do base compaction in [0,
`alter_version`] and cumulative compaction in (`alter_version`, N].
It cannot compact across `alter_version`, for example a compaction over
[a, `alter_version` + n].
3. The shadow tablet cannot do base compaction; it can only do
cumulative compaction in (`alter_version`, N].
4. When the schema change fails because the FE or BE core dumps, it is
retried. The retry gets the `alter_version` from meta_service and
continues from there.
5. When the schema change job finishes or is canceled, the schema change
job must be cleared. Before this PR, it was only overwritten by the next
schema change.
gavinchou pushed a commit that referenced this pull request Aug 27, 2024
…mpaction during schema change in cloud mode (#39558)

## Proposed changes

In cloud mode, during a schema change the shadow tablet hits error -235
under a large number of loads, because it cannot do cumulative
compaction. This prevents the user from continuing to load data.
Implementation details:
1. When a schema change starts, record the end version of the rowsets to
convert, `alter_version`, in the SchemaChangeJob.
2. The origin tablet can only do base compaction in [0,
`alter_version`] and cumulative compaction in (`alter_version`, N].
It cannot compact across `alter_version`, for example a compaction over
[a, `alter_version` + n].
3. The shadow tablet cannot do base compaction; it can only do
cumulative compaction in (`alter_version`, N].
4. When the schema change fails because the FE or BE core dumps, it is
retried. The retry gets the `alter_version` from meta_service and
continues from there.
5. When the schema change job finishes or is canceled, the schema change
job must be cleared. Before this PR, it was only overwritten by the next
schema change.

co-author(main author): @Lchangliang 
original PR: #37293

---------

Co-authored-by: Lightman <[email protected]>
gavinchou pushed a commit that referenced this pull request Sep 11, 2024
…mpaction during schema change in cloud mode (#39558)

Labels
approved Indicates a PR has been approved by one committer. doing meta-change reviewed
5 participants