[Opt](Serde) optimize serialization to string on variant type #43237

Merged
1 commit merged on Nov 6, 2024

@eldenmoon (Member) commented Nov 5, 2024

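  1. Avoid sanitizing the type each time a row is serialized.
  2. Compare types by type ID instead of by type name.

![image](https://github.com/user-attachments/assets/ad056c73-8a50-49c9-a670-4750b9609675)

select count(cast(payload["issue"] as string)) from gharchive

Before: 101 s
After: 15 s (about 6.7x faster)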
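
For readers who have not looked at the diff, here is a minimal sketch of the two ideas in the list above. It is not the Doris code: `TypeId`, `DataType`, `make_type`, `sanitize`, and both `count_string_rows_*` functions are hypothetical stand-ins, and "sanitize" is assumed here to mean stripping a Nullable wrapper, which this PR description does not actually state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; not the Doris classes.
enum class TypeId { Int64, String, Nullable };

struct DataType {
    TypeId id;
    std::shared_ptr<const DataType> nested;  // only set for Nullable

    // Building the name allocates and concatenates strings on every call.
    std::string name() const {
        switch (id) {
        case TypeId::Int64:
            return "Int64";
        case TypeId::String:
            return "String";
        case TypeId::Nullable:
            return "Nullable(" + nested->name() + ")";
        }
        return "Unknown";
    }
};
using DataTypePtr = std::shared_ptr<const DataType>;

DataTypePtr make_type(TypeId id, DataTypePtr nested = nullptr) {
    return std::make_shared<const DataType>(DataType{id, std::move(nested)});
}

// Assumption: "sanitize" just strips a Nullable wrapper in this sketch.
DataTypePtr sanitize(const DataTypePtr& t) {
    assert(t != nullptr);
    return t->id == TypeId::Nullable ? t->nested : t;
}

// Before: sanitize and compare by name once per row.
std::size_t count_string_rows_slow(const DataTypePtr& column_type, std::size_t rows) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        DataTypePtr t = sanitize(column_type);  // repeated for every row
        if (t->name() == "String") {            // string build + compare
            ++n;
        }
    }
    return n;
}

// After: sanitize once per column, then compare the type ID (an enum).
std::size_t count_string_rows_fast(const DataTypePtr& column_type, std::size_t rows) {
    const DataTypePtr t = sanitize(column_type);         // hoisted out of the loop
    const bool is_string = (t->id == TypeId::String);    // integer compare
    std::size_t n = 0;
    for (std::size_t i = 0; i < rows; ++i) {
        if (is_string) {
            ++n;
        }
    }
    return n;
}
```

Both functions return the same count for the same input; the second only moves the per-row work (the sanitize call and the name construction) out of the loop. How much of the 101 s to 15 s difference comes from each change is not broken down in this PR.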

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Check List (For Committer)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.
  • Release note

    None

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@eldenmoon (Member Author)

run buildall

github-actions bot (Contributor) commented Nov 5, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.80% (9822/25982)
Line Coverage: 28.98% (81700/281943)
Region Coverage: 28.22% (42111/149246)
Branch Coverage: 24.80% (21367/86168)
Coverage Report: http://coverage.selectdb-in.cc/coverage/fb58170e7ebb9c704c113c863ca45a545615eede_fb58170e7ebb9c704c113c863ca45a545615eede/report/index.html

@doris-robot

TPC-H: Total hot run time: 41446 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fb58170e7ebb9c704c113c863ca45a545615eede, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17585	7506	7322	7322
q2	2063	173	159	159
q3	10591	1068	1175	1068
q4	10593	874	879	874
q5	7766	3085	3121	3085
q6	234	147	143	143
q7	1023	613	620	613
q8	9363	1991	2051	1991
q9	6643	6493	6505	6493
q10	7131	2390	2445	2390
q11	468	249	251	249
q12	400	220	213	213
q13	17783	3010	2980	2980
q14	248	222	207	207
q15	584	534	502	502
q16	655	589	598	589
q17	982	548	586	548
q18	7444	6821	6753	6753
q19	1329	1089	1069	1069
q20	480	179	184	179
q21	4072	3199	3035	3035
q22	1106	984	1004	984
Total cold run time: 108543 ms
Total hot run time: 41446 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	7315	7306	7318	7306
q2	338	233	219	219
q3	3069	2973	2964	2964
q4	2107	1809	1794	1794
q5	5762	5824	5879	5824
q6	227	144	137	137
q7	2315	1834	1807	1807
q8	3419	3485	3529	3485
q9	8982	8982	8935	8935
q10	3601	3615	3594	3594
q11	598	502	506	502
q12	845	655	638	638
q13	9988	3224	3239	3224
q14	315	301	267	267
q15	585	529	532	529
q16	674	649	661	649
q17	1862	1668	1597	1597
q18	8313	7726	7782	7726
q19	1738	1538	1568	1538
q20	2095	1881	1861	1861
q21	5575	5479	5608	5479
q22	1142	1060	1047	1047
Total cold run time: 70865 ms
Total hot run time: 61122 ms

@doris-robot

TPC-DS: Total hot run time: 196848 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fb58170e7ebb9c704c113c863ca45a545615eede, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1219	746	743	743
query2	6228	2074	2024	2024
query3	10814	3997	4081	3997
query4	67767	29538	23723	23723
query5	4863	445	434	434
query6	407	167	165	165
query7	5623	296	286	286
query8	297	229	236	229
query9	9007	2670	2656	2656
query10	455	264	240	240
query11	17440	15315	15713	15315
query12	154	116	106	106
query13	1624	428	422	422
query14	10780	7452	7438	7438
query15	199	179	170	170
query16	7036	473	426	426
query17	1105	548	567	548
query18	1835	307	306	306
query19	196	154	146	146
query20	112	108	105	105
query21	199	98	100	98
query22	4619	4430	4542	4430
query23	34648	34150	34168	34150
query24	5967	2818	2748	2748
query25	507	406	405	405
query26	647	156	154	154
query27	1698	282	282	282
query28	4376	2462	2425	2425
query29	687	446	453	446
query30	229	164	155	155
query31	991	765	833	765
query32	66	56	58	56
query33	429	274	274	274
query34	912	520	515	515
query35	844	762	734	734
query36	1084	950	950	950
query37	114	71	81	71
query38	4387	4255	4360	4255
query39	1484	1422	1439	1422
query40	207	100	101	100
query41	49	46	47	46
query42	112	98	97	97
query43	531	493	504	493
query44	1191	828	822	822
query45	183	165	166	165
query46	1118	693	714	693
query47	1956	1867	1847	1847
query48	415	324	322	322
query49	745	397	423	397
query50	819	394	388	388
query51	7235	7237	6993	6993
query52	98	90	89	89
query53	250	179	178	178
query54	515	429	391	391
query55	77	76	73	73
query56	237	228	233	228
query57	1297	1200	1180	1180
query58	207	200	201	200
query59	3205	3112	3176	3112
query60	279	241	240	240
query61	102	108	107	107
query62	791	683	657	657
query63	213	183	179	179
query64	1392	666	626	626
query65	3273	3216	3200	3200
query66	720	325	309	309
query67	16012	15818	15709	15709
query68	3285	583	562	562
query69	419	261	257	257
query70	1207	1161	1145	1145
query71	355	249	249	249
query72	6215	4187	4021	4021
query73	756	364	361	361
query74	10096	9063	9038	9038
query75	3376	2668	2673	2668
query76	1803	1101	1072	1072
query77	486	277	264	264
query78	10454	9494	9404	9404
query79	1181	588	590	588
query80	819	414	420	414
query81	508	245	235	235
query82	1311	114	114	114
query83	204	138	145	138
query84	283	76	70	70
query85	873	308	306	306
query86	339	301	302	301
query87	4818	4707	4784	4707
query88	3416	2184	2142	2142
query89	418	281	285	281
query90	2011	185	183	183
query91	144	104	103	103
query92	65	49	50	49
query93	1248	539	553	539
query94	826	297	290	290
query95	349	241	252	241
query96	604	276	284	276
query97	2881	2725	2684	2684
query98	208	197	197	197
query99	1579	1294	1314	1294
Total cold run time: 316815 ms
Total hot run time: 196848 ms

@doris-robot

ClickBench: Total hot run time: 33.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fb58170e7ebb9c704c113c863ca45a545615eede, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.06	0.06
query4	1.64	0.10	0.11
query5	0.42	0.41	0.40
query6	1.17	0.65	0.65
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.56	0.48	0.50
query10	0.56	0.54	0.54
query11	0.13	0.10	0.11
query12	0.13	0.11	0.11
query13	0.61	0.59	0.60
query14	2.72	2.72	2.74
query15	0.92	0.83	0.84
query16	0.39	0.38	0.37
query17	1.05	1.05	1.03
query18	0.20	0.20	0.21
query19	1.87	1.86	1.82
query20	0.01	0.01	0.02
query21	15.36	0.59	0.57
query22	2.16	2.51	2.18
query23	17.10	1.01	0.85
query24	2.92	0.98	2.55
query25	0.25	0.11	0.11
query26	0.57	0.14	0.14
query27	0.03	0.05	0.04
query28	9.55	1.10	1.06
query29	12.55	3.32	3.30
query30	0.25	0.06	0.06
query31	2.86	0.37	0.37
query32	3.30	0.45	0.45
query33	3.00	3.06	3.04
query34	17.09	4.48	4.51
query35	4.60	4.51	4.50
query36	0.66	0.50	0.49
query37	0.09	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.03	0.02
query40	0.15	0.12	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 105.49 s
Total hot run time: 33.04 s

@xiaokang (Contributor) left a comment

LGTM

github-actions bot added the "approved" label (indicates a PR has been approved by one committer) on Nov 6, 2024

github-actions bot (Contributor) commented Nov 6, 2024

PR approved by at least one committer and no changes requested.

github-actions bot (Contributor) commented Nov 6, 2024

PR approved by anyone and no changes requested.

@amorynan (Contributor) left a comment

LGTM

@eldenmoon merged commit 158e6de into apache:master on Nov 6, 2024
33 of 36 checks passed
@eldenmoon deleted the op-var-to_string branch on November 6, 2024 05:25
github-actions bot pushed a commit that referenced this pull request Nov 6, 2024
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Nov 6, 2024
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Nov 6, 2024
eldenmoon pushed a commit that referenced this pull request Nov 7, 2024
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Nov 7, 2024
4 participants