[Function](exec) replace SipHash in function by XXHash #32919

Merged
merged 1 commit into from
Mar 28, 2024


HappenLee
Contributor

Proposed changes

replace SipHash in function by XXHash
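This page does not include the diff, so the exact xxHash variant and the call sites that were changed are not shown here. As a rough illustration of the XXHash family only, the following pure-Python sketch implements the 64-bit XXH64 variant as described in the public xxHash specification. The Doris BE itself is C++; the names `xxh64`, `_round`, and `_merge` are local to this sketch and are not taken from the PR.

```python
MASK64 = (1 << 64) - 1
P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5


def _rotl(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _round(acc: int, lane: int) -> int:
    """Mix one 8-byte lane into an accumulator."""
    acc = (acc + lane * P2) & MASK64
    return (_rotl(acc, 31) * P1) & MASK64


def _merge(h: int, acc: int) -> int:
    """Fold one accumulator into the running hash."""
    h ^= _round(0, acc)
    return (h * P1 + P4) & MASK64


def xxh64(data: bytes, seed: int = 0) -> int:
    n = len(data)
    pos = 0
    if n >= 32:
        # Four accumulators consume the input in 32-byte stripes.
        accs = [
            (seed + P1 + P2) & MASK64,
            (seed + P2) & MASK64,
            seed & MASK64,
            (seed - P1) & MASK64,
        ]
        while pos + 32 <= n:
            for i in range(4):
                lane = int.from_bytes(data[pos:pos + 8], "little")
                accs[i] = _round(accs[i], lane)
                pos += 8
        h = (_rotl(accs[0], 1) + _rotl(accs[1], 7)
             + _rotl(accs[2], 12) + _rotl(accs[3], 18)) & MASK64
        for acc in accs:
            h = _merge(h, acc)
    else:
        h = (seed + P5) & MASK64
    h = (h + n) & MASK64
    # Tail: remaining 8-byte lanes, then one 4-byte lane, then single bytes.
    while pos + 8 <= n:
        lane = int.from_bytes(data[pos:pos + 8], "little")
        h ^= _round(0, lane)
        h = (_rotl(h, 27) * P1 + P4) & MASK64
        pos += 8
    if pos + 4 <= n:
        h ^= (int.from_bytes(data[pos:pos + 4], "little") * P1) & MASK64
        h = (_rotl(h, 23) * P2 + P3) & MASK64
        pos += 4
    while pos < n:
        h ^= (data[pos] * P5) & MASK64
        h = (_rotl(h, 11) * P1) & MASK64
        pos += 1
    # Final avalanche so that every input bit affects every output bit.
    h ^= h >> 33
    h = (h * P2) & MASK64
    h ^= h >> 29
    h = (h * P3) & MASK64
    h ^= h >> 32
    return h


print(f"{xxh64(b''):016x}")
```

XXH64 is a non-cryptographic hash, whereas SipHash is a keyed hash designed to resist hash flooding, so a swap like this one trades that property for speed.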

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@HappenLee
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38353 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 43842beb13a883866a3a85d66910cfe8cfbc1e7d, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	18062	4458	4282	4282
q2	2523	166	162	162
q3	11328	1157	1240	1157
q4	11261	745	768	745
q5	7549	3074	3053	3053
q6	210	128	127	127
q7	1073	624	585	585
q8	9411	2067	2032	2032
q9	7354	6622	7003	6622
q10	8443	3416	3547	3416
q11	438	226	224	224
q12	371	200	195	195
q13	17797	2846	2852	2846
q14	227	204	202	202
q15	506	475	464	464
q16	496	378	375	375
q17	952	527	629	527
q18	7171	6505	6448	6448
q19	1571	1484	1408	1408
q20	544	264	248	248
q21	3705	3020	2952	2952
q22	362	283	300	283
Total cold run time: 111354 ms
Total hot run time: 38353 ms
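
The two totals above can be cross-checked from the Round 1 rows. In every row the last column equals the smaller of the two middle columns, the first numeric column sums to the cold total, and the last column sums to the hot total. A minimal sketch, assuming that reading of the columns (the helper name `summarize` is made up for this example):

```python
# Round 1 rows copied from the result above: query name, then four timings in ms.
ROUND1 = """
q1 18062 4458 4282 4282
q2 2523 166 162 162
q3 11328 1157 1240 1157
q4 11261 745 768 745
q5 7549 3074 3053 3053
q6 210 128 127 127
q7 1073 624 585 585
q8 9411 2067 2032 2032
q9 7354 6622 7003 6622
q10 8443 3416 3547 3416
q11 438 226 224 224
q12 371 200 195 195
q13 17797 2846 2852 2846
q14 227 204 202 202
q15 506 475 464 464
q16 496 378 375 375
q17 952 527 629 527
q18 7171 6505 6448 6448
q19 1571 1484 1408 1408
q20 544 264 248 248
q21 3705 3020 2952 2952
q22 362 283 300 283
"""


def summarize(report: str) -> tuple[int, int]:
    """Return (cold total, hot total), taking the faster hot run per query."""
    cold_total = 0
    hot_total = 0
    for line in report.strip().splitlines():
        _query, cold, hot1, hot2, _best = line.split()
        cold_total += int(cold)
        hot_total += min(int(hot1), int(hot2))
    return cold_total, hot_total


print(summarize(ROUND1))  # (111354, 38353)
```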

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4171	4066	4076	4066
q2	322	227	223	223
q3	3009	2853	2853	2853
q4	1841	1536	1581	1536
q5	5289	5318	5326	5318
q6	193	115	117	115
q7	2248	1825	1858	1825
q8	3162	3289	3297	3289
q9	8647	8703	8681	8681
q10	3808	3802	3748	3748
q11	554	458	444	444
q12	731	584	542	542
q13	15368	2828	2879	2828
q14	278	258	262	258
q15	500	457	459	457
q16	469	434	432	432
q17	1736	1486	1488	1486
q18	7587	7077	7227	7077
q19	1621	1463	1497	1463
q20	1911	1775	1709	1709
q21	4670	4634	4910	4634
q22	515	463	457	457
Total cold run time: 68630 ms
Total hot run time: 53441 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.25% (8741/24795)
Line Coverage: 27.04% (71556/264664)
Region Coverage: 26.28% (37136/141287)
Branch Coverage: 23.18% (18988/81916)
Coverage Report: http://coverage.selectdb-in.cc/coverage/43842beb13a883866a3a85d66910cfe8cfbc1e7d_43842beb13a883866a3a85d66910cfe8cfbc1e7d/report/index.html

@doris-robot

TPC-DS: Total hot run time: 181953 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 43842beb13a883866a3a85d66910cfe8cfbc1e7d, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	954	374	351	351
query2	6547	2011	1831	1831
query3	6702	210	209	209
query4	31847	21335	21342	21335
query5	4364	412	387	387
query6	274	185	185	185
query7	4637	296	293	293
query8	241	165	183	165
query9	9562	2324	2317	2317
query10	550	243	262	243
query11	17212	14205	14224	14205
query12	147	92	82	82
query13	1635	411	412	411
query14	10117	8158	7754	7754
query15	257	211	197	197
query16	8196	269	265	265
query17	1965	586	548	548
query18	2103	295	277	277
query19	344	153	160	153
query20	96	89	89	89
query21	203	128	136	128
query22	5025	4811	4740	4740
query23	33634	32888	32762	32762
query24	10743	2862	2907	2862
query25	621	394	384	384
query26	1420	157	155	155
query27	2997	337	352	337
query28	7461	1853	1879	1853
query29	895	620	626	620
query30	303	150	146	146
query31	948	709	738	709
query32	105	58	54	54
query33	764	244	247	244
query34	1087	495	487	487
query35	813	610	605	605
query36	1022	898	901	898
query37	128	63	65	63
query38	3563	3415	3406	3406
query39	1490	1410	1425	1410
query40	210	110	109	109
query41	48	46	45	45
query42	109	94	96	94
query43	485	441	465	441
query44	1207	729	728	728
query45	268	250	258	250
query46	1103	698	687	687
query47	1892	1844	1843	1843
query48	442	346	350	346
query49	1119	334	354	334
query50	747	368	361	361
query51	6905	6731	6750	6731
query52	108	91	88	88
query53	350	274	275	274
query54	293	240	238	238
query55	85	77	84	77
query56	243	221	223	221
query57	1218	1142	1132	1132
query58	227	212	214	212
query59	2784	2767	2807	2767
query60	263	246	240	240
query61	96	108	113	108
query62	669	466	448	448
query63	304	283	273	273
query64	5857	4150	3940	3940
query65	3127	3019	3038	3019
query66	1452	385	377	377
query67	15294	14887	14914	14887
query68	5319	518	519	518
query69	580	386	376	376
query70	1210	1135	1165	1135
query71	406	277	287	277
query72	6332	2861	2670	2670
query73	730	327	317	317
query74	7693	6425	6328	6328
query75	2998	2230	2193	2193
query76	3546	913	893	893
query77	386	266	263	263
query78	10973	10163	10134	10134
query79	8932	516	524	516
query80	1565	377	376	376
query81	545	215	221	215
query82	971	86	83	83
query83	232	145	141	141
query84	285	81	78	78
query85	1519	324	309	309
query86	469	290	287	287
query87	3774	3532	3560	3532
query88	5135	2314	2305	2305
query89	528	366	361	361
query90	2016	179	175	175
query91	166	136	136	136
query92	67	49	49	49
query93	7055	500	488	488
query94	1173	179	177	177
query95	424	330	330	330
query96	614	269	269	269
query97	2718	2502	2461	2461
query98	229	215	205	205
query99	1260	926	940	926
Total cold run time: 307727 ms
Total hot run time: 181953 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 43842beb13a883866a3a85d66910cfe8cfbc1e7d with default session variables
Stream load json:         20 seconds loaded 2358488459 Bytes, about 112 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       14.7 seconds inserted 10000000 Rows, about 680K ops/s
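
The throughput figures above are consistent with dividing bytes by seconds and by 2^20 and then truncating, so "MB/s" here appears to mean MiB/s. A small sketch of that arithmetic, with helper names invented for this example:

```python
MIB = 1 << 20  # bytes per mebibyte


def mib_per_second(total_bytes: int, seconds: int) -> int:
    """Truncated throughput in MiB/s."""
    return total_bytes // seconds // MIB


def k_rows_per_second(rows: int, seconds: float) -> int:
    """Truncated throughput in thousands of rows per second."""
    return int(rows / seconds / 1000)


print(mib_per_second(2358488459, 20))   # json:    112
print(mib_per_second(1101869774, 59))   # orc:     17
print(mib_per_second(861443392, 32))    # parquet: 25
print(k_rows_per_second(10000000, 14.7))  # insert into select: 680
```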

Contributor

PR approved by anyone and no changes requested.

@HappenLee HappenLee merged commit 0077b94 into apache:master Mar 28, 2024
27 of 31 checks passed
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 28, 2024
Contributor

PR approved by at least one committer and no changes requested.

Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <[email protected]>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This pattern of rewriting is supported for multi-table joins, and the supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for varint type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a Hive table on COSN, Doris returns an empty result even though the table has data.
Iceberg on COSN works correctly.
The cause is misuse of COSN's file system: according to the COSN documentation, fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the Hive metastore's `getCatalog()` method.
    But this method only exists in Hive 3+, so it fails when using Hive 2.x.

    I temporarily removed this logic because it is only used for Iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some P2 test cases were missing `order_qt`. And because the output format of the floating point
    type has changed, some results in the `out` files need to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol, so connections are usually not actively disconnected; when a bearer token is evicted from the cache, its ConnectContext is unregistered.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH
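
The fe.conf constraint described in this commit can be expressed as a simple check. This is only a sketch: the function name `check_arrow_flight_conf` is hypothetical, the numbers in the usage lines are illustrative rather than Doris defaults, and it is not the validation code in the FE.

```python
def check_arrow_flight_conf(conf: dict) -> None:
    """Raise ValueError unless arrow_flight_token_cache_size < qe_max_connection / 2."""
    cache_size = int(conf["arrow_flight_token_cache_size"])
    max_connection = int(conf["qe_max_connection"])
    if cache_size >= max_connection / 2:
        raise ValueError(
            f"arrow_flight_token_cache_size ({cache_size}) must be less than "
            f"qe_max_connection / 2 ({max_connection / 2})"
        )


# Illustrative values only, not taken from the PR.
check_arrow_flight_conf({"arrow_flight_token_cache_size": 511, "qe_max_connection": 1024})

try:
    check_arrow_flight_conf({"arrow_flight_token_cache_size": 512, "qe_max_connection": 1024})
except ValueError as err:
    print(err)
```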

* [bugfix](cloud) few variable not initialized (#32868)

../../cloud/src/recycler/meta_checker.cpp
can cause uninitialised memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
When Groovy uses a Flight SQL connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error: AGGREGATE clause must not contain analytic expressions. Executing the same query in Java with jdbc::arrow-flight-sql works.
Groovy does not support printing the Arrow array type and throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two-phase read.
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer maybe may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support to get tables in materialized view when collecting table in plan

The table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we collect the tables from the following plan, we get [table1, table3, table2]; mv1 is expanded to its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10
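
The expansion described in this commit can be sketched as a small recursive walk. This is a hypothetical illustration, not the Nereids implementation: the function `collect_base_tables` and the dictionary of materialized view definitions are assumptions made for this example.

```python
def collect_base_tables(relations: list[str], mv_definitions: dict[str, list[str]]) -> list[str]:
    """Collect base tables in first-seen order, expanding materialized views."""
    seen: list[str] = []

    def visit(name: str) -> None:
        if name in mv_definitions:
            # A materialized view contributes the tables it was built from.
            for base in mv_definitions[name]:
                visit(base)
        elif name not in seen:
            seen.append(name)

    for relation in relations:
        visit(relation)
    return seen


# mv1 is defined over table1 and table3; the query references mv1 and table2 (twice).
mvs = {"mv1": ["table1", "table3"]}
print(collect_base_tables(["mv1", "table2", "table2"], mvs))  # ['table1', 'table3', 'table2']
```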

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In Fe:
Get: If Fe is cloud mode, get table version from meta service.
Update: Op drop/replace temp partition, commit transaction.

In meta service:
Add: create Index. init value is 1.
Remove: by recycler.
Update: commit/drop partition rpc, commit txn rpc. Atomic++.
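
The lifecycle listed above (create at 1, atomic increment on update, removal by the recycler) can be sketched with an in-memory store. This is only an illustration under assumed names: `TableVersionStore` and its methods are hypothetical and do not reflect the meta service's storage or RPC interface.

```python
import threading


class TableVersionStore:
    """In-memory stand-in for the table version kept by the meta service."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: dict[int, int] = {}

    def create(self, table_id: int) -> None:
        # Add: the version starts at 1 when the index is created.
        with self._lock:
            self._versions[table_id] = 1

    def get(self, table_id: int) -> int:
        with self._lock:
            return self._versions[table_id]

    def bump(self, table_id: int) -> int:
        # Update: incremented atomically on partition commit/drop or txn commit.
        with self._lock:
            self._versions[table_id] += 1
            return self._versions[table_id]

    def remove(self, table_id: int) -> None:
        # Remove: done by the recycler.
        with self._lock:
            self._versions.pop(table_id, None)


store = TableVersionStore()
store.create(10001)
store.bump(10001)
print(store.get(10001))  # 2
```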

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == for type comparison.
2. The null bitmap is resized to the size of the ref column.

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy)fix row policy has been exist (#32880)

* [fix](pipeline) fix use error row desc when origin block clear (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support show of plsql procedure using select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` is thrown.

We need to write a separate Utils class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column name in different index. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
-  `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
after change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <[email protected]>
Co-authored-by: Gavin Chou <[email protected]>
Co-authored-by: xy720 <[email protected]>
Co-authored-by: yongjinhou <[email protected]>
Co-authored-by: Dongyang Li <[email protected]>
Co-authored-by: stephen <[email protected]>
Co-authored-by: morrySnow <[email protected]>
Co-authored-by: seawinde <[email protected]>
Co-authored-by: lihangyu <[email protected]>
Co-authored-by: Yulei-Yang <[email protected]>
Co-authored-by: starocean999 <[email protected]>
Co-authored-by: wangbo <[email protected]>
Co-authored-by: Mingyu Chen <[email protected]>
Co-authored-by: Jerry Hu <[email protected]>
Co-authored-by: zhiqiang <[email protected]>
Co-authored-by: Xinyi Zou <[email protected]>
Co-authored-by: Vallish Pai <[email protected]>
Co-authored-by: amory <[email protected]>
Co-authored-by: HappenLee <[email protected]>
Co-authored-by: Jensen <[email protected]>
Co-authored-by: zhangdong <[email protected]>
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: jakevin <[email protected]>
Co-authored-by: Mryange <[email protected]>
Co-authored-by: zclllyybb <[email protected]>
Co-authored-by: Tiewei Fang <[email protected]>
Co-authored-by: Xin Liao <[email protected]>
@yiguolei yiguolei mentioned this pull request Apr 26, 2024
1 task
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.x reviewed
5 participants