[fix](spill) SpillStream's writer may not have been finalized #32931

Merged 1 commit into apache:master on Mar 28, 2024

mrhhsg
Member

@mrhhsg mrhhsg commented Mar 27, 2024

Proposed changes

Issue Number: close #xxx


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@mrhhsg mrhhsg changed the title from "[fix](spill) SpillStream's writer maybe may not have been finalized" to "[fix](spill) SpillStream's writer may not have been finalized" on Mar 27, 2024
@mrhhsg
Member Author

mrhhsg commented Mar 27, 2024

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.24% (8738/24796)
Line Coverage: 27.02% (71530/264778)
Region Coverage: 26.27% (37117/141316)
Branch Coverage: 23.16% (18975/81936)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2bee6e6706ef4d25845d0f1d282c9959cb873b1c_2bee6e6706ef4d25845d0f1d282c9959cb873b1c/report/index.html

@doris-robot

TPC-H: Total hot run time: 37796 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2bee6e6706ef4d25845d0f1d282c9959cb873b1c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17611	4212	4094	4094
q2	2106	152	154	152
q3	10591	1089	1182	1089
q4	10227	809	760	760
q5	7457	3059	2994	2994
q6	199	124	119	119
q7	1026	563	562	562
q8	9344	1998	1987	1987
q9	7262	6568	6548	6548
q10	8426	3510	3549	3510
q11	433	224	221	221
q12	397	196	201	196
q13	17794	2835	2879	2835
q14	247	205	210	205
q15	521	472	467	467
q16	497	375	374	374
q17	949	512	617	512
q18	7130	6313	6338	6313
q19	4643	1440	1482	1440
q20	549	265	260	260
q21	3602	2931	2859	2859
q22	345	299	300	299
Total cold run time: 111356 ms
Total hot run time: 37796 ms
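
The two totals are consistent with a simple rule: the cold total is the sum of the first timing column, and the hot total is the sum, per query, of the faster of the two hot runs. A minimal C++ sketch of that arithmetic follows; the type and function names (`QueryTiming`, `Totals`, `summarize`) are hypothetical and are not taken from the tpch-tools scripts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One benchmark row: a cold run followed by two hot runs, in milliseconds.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Aggregated result for one benchmark round.
struct Totals {
    int64_t cold_ms = 0;
    int64_t hot_ms = 0;
};

// Sums the cold column and, for each query, the faster of the two hot runs.
inline Totals summarize(const std::vector<QueryTiming>& rows) {
    Totals totals;
    for (const auto& row : rows) {
        totals.cold_ms += row.cold_ms;
        totals.hot_ms += std::min(row.hot1_ms, row.hot2_ms);
    }
    return totals;
}
```

Feeding in only the q1, q2 and q3 rows of Round 1 gives a cold total of 30308 ms and a hot total of 5335 ms; feeding in all 22 rows reproduces the reported 111356 ms and 37796 ms.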

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4143	4075	4103	4075
q2	331	230	227	227
q3	2959	2841	2854	2841
q4	1766	1589	1567	1567
q5	5274	5325	5308	5308
q6	191	116	114	114
q7	2210	1845	1851	1845
q8	3132	3283	3285	3283
q9	8664	8659	8671	8659
q10	3838	3775	3761	3761
q11	537	438	442	438
q12	706	580	585	580
q13	16934	2838	2846	2838
q14	284	249	256	249
q15	492	458	468	458
q16	459	417	420	417
q17	1720	1508	1464	1464
q18	7397	7175	7064	7064
q19	1620	1503	1532	1503
q20	1896	1751	1710	1710
q21	4867	4636	4675	4636
q22	534	452	473	452
Total cold run time: 69954 ms
Total hot run time: 53489 ms

@doris-robot

TPC-DS: Total hot run time: 182356 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2bee6e6706ef4d25845d0f1d282c9959cb873b1c, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	945	367	363	363
query2	6562	2079	1913	1913
query3	6708	207	211	207
query4	31838	21326	21583	21326
query5	4266	393	387	387
query6	279	176	169	169
query7	4638	293	292	292
query8	230	177	168	168
query9	9177	2311	2293	2293
query10	564	246	260	246
query11	15430	14232	14350	14232
query12	139	90	89	89
query13	1623	414	409	409
query14	9715	7942	8012	7942
query15	322	197	203	197
query16	8248	272	265	265
query17	2099	572	542	542
query18	2118	291	284	284
query19	361	155	158	155
query20	93	89	89	89
query21	206	127	134	127
query22	5024	4808	4827	4808
query23	34232	32992	32849	32849
query24	11578	2862	2943	2862
query25	650	392	387	387
query26	1736	154	154	154
query27	2984	360	351	351
query28	7662	1901	1884	1884
query29	999	654	631	631
query30	312	149	150	149
query31	945	739	743	739
query32	95	57	56	56
query33	774	252	253	252
query34	1072	473	488	473
query35	840	618	620	618
query36	1026	870	892	870
query37	159	63	64	63
query38	3601	3489	3422	3422
query39	1488	1436	1437	1436
query40	287	115	121	115
query41	51	47	47	47
query42	100	98	96	96
query43	467	442	453	442
query44	1161	740	724	724
query45	285	250	268	250
query46	1113	687	707	687
query47	1913	1837	1852	1837
query48	441	358	364	358
query49	1230	346	337	337
query50	757	376	366	366
query51	6730	6662	6638	6638
query52	106	91	87	87
query53	346	274	278	274
query54	331	242	257	242
query55	85	79	82	79
query56	248	225	232	225
query57	1219	1133	1150	1133
query58	240	212	212	212
query59	2802	2650	2563	2563
query60	271	244	245	244
query61	132	116	122	116
query62	654	449	454	449
query63	304	275	275	275
query64	6417	3991	4054	3991
query65	3049	3005	3033	3005
query66	1432	381	357	357
query67	15435	15035	14902	14902
query68	8921	536	536	536
query69	625	385	369	369
query70	1313	1188	1134	1134
query71	525	267	258	258
query72	6339	2709	2548	2548
query73	836	313	309	309
query74	7850	6400	6441	6400
query75	3900	2217	2260	2217
query76	5463	985	942	942
query77	663	260	251	251
query78	11042	10218	10204	10204
query79	9181	524	516	516
query80	1503	378	375	375
query81	511	216	224	216
query82	251	89	88	88
query83	211	149	145	145
query84	287	75	78	75
query85	1137	345	318	318
query86	357	308	299	299
query87	3783	3597	3548	3548
query88	4633	2316	2304	2304
query89	474	382	365	365
query90	2201	176	175	175
query91	170	137	141	137
query92	65	49	47	47
query93	6212	501	484	484
query94	1363	178	173	173
query95	435	336	332	332
query96	615	276	270	270
query97	2664	2505	2494	2494
query98	237	219	205	205
query99	1178	911	872	872
Total cold run time: 313289 ms
Total hot run time: 182356 ms

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 2bee6e6706ef4d25845d0f1d282c9959cb873b1c with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.6 seconds inserted 10000000 Rows, about 735K ops/s
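
The reported rates match the raw figures if "MB/s" is read as bytes per second divided by 1024 × 1024 and truncated, and "K ops/s" as rows per second divided by 1000 and truncated. The sketch below only reproduces that arithmetic; the function names are hypothetical and the load-test script itself may compute the values differently.

```cpp
#include <cassert>
#include <cstdint>

// Throughput in units of 1024*1024 bytes per second, truncated toward zero.
inline int64_t mib_per_second(int64_t bytes, double seconds) {
    return static_cast<int64_t>(static_cast<double>(bytes) / seconds / (1024.0 * 1024.0));
}

// Thousands of rows per second, truncated toward zero.
inline int64_t krows_per_second(int64_t rows, double seconds) {
    return static_cast<int64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```

For example, `mib_per_second(2358488459, 19)` yields 118 for the JSON stream load, and `krows_per_second(10000000, 13.6)` yields 735 for the insert-into-select case.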

@@ -170,6 +170,9 @@ Status PartitionedAggSinkOperatorX::sink(doris::RuntimeState* state, vectorized:
        if (revocable_mem_size(state) > 0) {
            RETURN_IF_ERROR(revoke_memory(state));
        } else {
            for (auto& partition : local_state._shared_state->spill_partitions) {
Contributor


There are some problems:

  1. This API writes to disk, so it may block the pipeline thread.
  2. If eos is reached and the revocable mem size == 0, why do we need to finish the current spilling, since there is no data?
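
One way to address both points is to skip writers that never received data and to hand the remaining finalization work to a separate I/O queue instead of running it inline. The C++ sketch below illustrates that idea only. `SpillWriter`, `IoTaskQueue` and `schedule_finalize_at_eos` are hypothetical names, not the Doris `SpillStream` API, and this is not what the merged change does.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a spill writer; not the Doris SpillStream API.
class SpillWriter {
public:
    void write(int rows) { rows_written_ += rows; }
    bool has_data() const { return rows_written_ > 0; }
    bool finalized() const { return finalized_; }
    // Idempotent. A real writer would flush buffered blocks to disk here,
    // which is the potentially blocking part.
    void finalize() { finalized_ = true; }

private:
    int rows_written_ = 0;
    bool finalized_ = false;
};

// Hypothetical queue meant to be drained by a dedicated I/O thread, so the
// pipeline thread only enqueues work and never touches the disk itself.
class IoTaskQueue {
public:
    void submit(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    // Would be called from the I/O thread in a real system.
    void run_all() {
        for (auto& task : tasks_) task();
        tasks_.clear();
    }
    std::size_t pending() const { return tasks_.size(); }

private:
    std::vector<std::function<void()>> tasks_;
};

// At end-of-stream, schedule finalization only for writers that hold data
// and are not finalized yet. Returns the number of tasks scheduled.
// The writers must outlive the queued tasks.
inline int schedule_finalize_at_eos(std::vector<SpillWriter>& writers, IoTaskQueue& queue) {
    int scheduled = 0;
    for (auto& writer : writers) {
        if (!writer.has_data() || writer.finalized()) continue;  // nothing to flush
        queue.submit([&writer] { writer.finalize(); });
        ++scheduled;
    }
    return scheduled;
}
```

With three writers of which only two received rows, two tasks are scheduled, and no writer is finalized until the queue is drained. The queue here is single-threaded for clarity; a real implementation would need synchronization between the pipeline thread and the I/O thread.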

Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Mar 28, 2024
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit ceef19a into apache:master Mar 28, 2024
29 of 34 checks passed
Jibing-Li added a commit that referenced this pull request Mar 29, 2024
* [fix](merge cloud) Fix cloud be set be tag map (#32864)

* [chore] Add gavinchou to collaborators (#32881)

* [chore](show) support statement to show views from table (#32358)

MySQL [test]> show views;
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
| t2_view        |
+----------------+
2 rows in set (0.00 sec)

MySQL [test]> show views like '%t1%';
+----------------+
| Tables_in_test |
+----------------+
| t1_view        |
+----------------+
1 row in set (0.01 sec)

MySQL [test]> show views where create_time > '2024-03-18';
+----------------+
| Tables_in_test |
+----------------+
| t2_view        |
+----------------+
1 row in set (0.02 sec)

* [Enhancement](ranger) Disable some permission operations when Ranger or LDAP are enabled (#32538)

Disable some permission operations when Ranger or LDAP are enabled.

* [chore](ci) exclude unstable trino_connector case (#32892)

Co-authored-by: stephen <[email protected]>

* [fix](Nereids) NPE when create table with implicit index type (#32893)

* [improvement](mtmv) Support more join types for query rewriting by materialized view (#32685)

This rewriting pattern is supported for multi-table joins. The supported join types are as follows:

INNER JOIN
LEFT OUTER JOIN
RIGHT OUTER JOIN
FULL OUTER JOIN
LEFT SEMI JOIN
RIGHT SEMI JOIN
LEFT ANTI JOIN
RIGHT ANTI JOIN

* [Serde](Variant) support arrow serialization for varint type (#32780)

* [fix](multicatalog) fix no data error when read hive table on cosn (#32815)

Currently, when reading a hive table on cosn, doris returns an empty result even though the table has data.
iceberg on cosn is ok.
The reason is misuse of cosn's file system. According to cosn's doc, its fs.cosn.impl should be org.apache.hadoop.fs.CosFileSystem.

* [fix](nereids)EliminateGroupByConstant should replace agg's output after removing constant group by keys (#32878)

* [Fix](executor)Fix regression test for test_active_queries/test_backend_active_tasks #32899

* [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)

1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling the hive metastore's `getCatalog()` method.
    But this method only exists in hive 3+, so it fails when using hive 2.x.

    I have temporarily removed this logic, because it is only used by iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some of the P2 test cases were missing `order_qt`. Also, because the output format of the floating point
    type has changed, some results in the `out` files need to be regenerated.

* [revert](jni) revert part of #32455 (#32904)

* [fix](spill) Avoid releasing resources while spill tasks are executing (#32783)

* [chore](log) print query id before logging profile in be.INFO (#32922)

* [fix](grace-exit) Stop incorrectly of reportwork cause heap use after free #32929

* [improvement](decommission be) decommission check replica num (#32748)

* [fix](arrow-flight) Fix reach limit of connections error (#32911)

Fix the "Reach limit of connections" error.
In fe.conf, arrow_flight_token_cache_size must be less than qe_max_connection/2. Arrow Flight SQL is a stateless protocol, so connections are usually not actively disconnected; evicting a bearer token from the cache unregisters its ConnectContext.

Fix ConnectContext.command not being reset to COM_SLEEP in time, which resulted in connections being killed frequently after query timeout.

Fix the bearer token eviction log and exception.

TODO: use arrow flight session: https://mail.google.com/mail/u/0/#inbox/FMfcgzGxRdxBLQLTcvvtRpqsvmhrHpdH

* [bugfix](cloud) few variable not initialized (#32868)

../../cloud/src/recycler/meta_checker.cpp
can cause uninitialised memory read.

* [fix](arrow-flight) Fix arrow flight sql compatible with JDK 17 and upgrade arrow 15.0.2 (#32796)

--add-opens=java.base/java.nio=ALL-UNNAMED, see: https://arrow.apache.org/docs/java/install.html#java-compatibility
When groovy uses a flight sql connection to execute the query SUM(MAX(c1) OVER (PARTITION BY)), it reports the error: AGGREGATE clause must not contain analytic expressions. The same query runs without problems in Java with jdbc::arrow-flight-sql.
groovy does not support printing the arrow array type and throws IndexOutOfBoundsException.
"arrow_flight_sql" does not support two phase read.
./run-regression-test.sh --run --clean -g arrow_flight_sql

* [fix](spill) SpillStream's writer maybe may not have been finalized (#32931)

* [improvement](spill) Disable DistinctStreamingAgg when spill is enabled (#32932)

* [Improve](inverted_index) update clucene and improve array inverted index writer  (#32436)

* [Performance](exec) replace SipHash in function by XXHash (#32919)

* [feature](agg) add aggregate function sum0 (#32541)

* [improvement](mtmv) Support to get tables in materialized view when collecting table in plan (#32797)

Support to get tables in materialized view when collecting table in plan

The table schema is as follows:

create materialized view mv1
BUILD IMMEDIATE REFRESH COMPLETE ON MANUAL
DISTRIBUTED BY RANDOM BUCKETS 1 
PROPERTIES ('replication_num' = '1')
 as 
select 
  t1.c1, 
  t3.c2 
from 
  table1 t1 
  inner join table3 t3 on t1.c1 = t3.c2

If we collect the tables from the following plan, we get [table1, table3, table2]; mv1 is expanded to its base tables:

SELECT 
  mv1.*, 
  uuid() 
FROM 
  mv1 LEFT SEMI 
  JOIN table2 ON mv1.c1 = table2.c1 
WHERE 
  mv1.c1 IN (
    SELECT 
      c1 
    FROM 
      table2
  ) 
  OR mv1.c1 < 10

* [enhance](mtmv)support olap table partition column is null (#32698)

* [enhancement](cloud) add table version to cloud (#32738)

Add table version to cloud.

In Fe:
Get: If Fe is cloud mode, get table version from meta service.
Update: Op drop/replace temp partition, commit transaction.

In meta service:
Add: create Index. init value is 1.
Remove: by recycler.
Update: commit/drop partition rpc, commit txn rpc. Atomic++.

* [fix](cloud) schema change from not null to null (#32913)

1. Use equals instead of == for type comparing
2. The null bitmap is resized to the size of the ref column.

* [feature](Nereids): add ColumnPruningPostProcessor. (#32800)

* [case](rowpolicy)fix row policy has been exist (#32880)

* [fix](pipeline) fix use error row desc when origin block clear (#32803)

* [fix](Nereids) support variant column with index when create table (#32948)

* [opt](Nereids) support create table with variant type (#32953)

* [test](insert-overwrite) Add insert overwrite auto detect concurrency cases (#32935)

* [fix](compile) fe cannot compile in idea (#32955)

* [enhancement](plsql) Support select * from routines (#32866)

Support show of plsql procedure using select * from routines.

* [fix](trino-connector) fix `NoClassDefFoundError` of hudi `Utils` class (#32846)

Due to the change in PR #32455, the `trino-connector-scanner` package cannot access the `hudi_scanner` package, so a `NoClassDefFoundError` is thrown.

We need to write a separate Utils class.

* [exec](column) change some complex column move to noexcept (#32954)

* [Enhancement](data skew) extends show data skew (#32732)

* [chore](test) let suite compatible with Nereids (#32964)

* Support identical column name in different index. (#32792)

* Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470)

* [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961)

* [improvement](executor)Add tag property for workload group #32874

* [fix](auth)unified workload and resource permission logic (#32907)

- `Grant resource` can no longer grant global `usage_priv`
-  `grant resource %` instead of `grant resource *`

before change:
```
grant usage_priv on resource * to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: Usage_priv 
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: NULL
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 
```
after change:
```
grant usage_priv on resource '%' to f;
show grants for f\G
*************************** 1. row ***************************
      UserIdentity: 'f'@'%'
           Comment: 
          Password: No
             Roles: 
       GlobalPrivs: NULL
      CatalogPrivs: NULL
     DatabasePrivs: internal.information_schema: Select_priv ; internal.mysql: Select_priv 
        TablePrivs: NULL
          ColPrivs: NULL
     ResourcePrivs: %: Usage_priv 
 CloudClusterPrivs: NULL
WorkloadGroupPrivs: normal: Usage_priv 

```

---------

Co-authored-by: yujun <[email protected]>
Co-authored-by: Gavin Chou <[email protected]>
Co-authored-by: xy720 <[email protected]>
Co-authored-by: yongjinhou <[email protected]>
Co-authored-by: Dongyang Li <[email protected]>
Co-authored-by: stephen <[email protected]>
Co-authored-by: morrySnow <[email protected]>
Co-authored-by: seawinde <[email protected]>
Co-authored-by: lihangyu <[email protected]>
Co-authored-by: Yulei-Yang <[email protected]>
Co-authored-by: starocean999 <[email protected]>
Co-authored-by: wangbo <[email protected]>
Co-authored-by: Mingyu Chen <[email protected]>
Co-authored-by: Jerry Hu <[email protected]>
Co-authored-by: zhiqiang <[email protected]>
Co-authored-by: Xinyi Zou <[email protected]>
Co-authored-by: Vallish Pai <[email protected]>
Co-authored-by: amory <[email protected]>
Co-authored-by: HappenLee <[email protected]>
Co-authored-by: Jensen <[email protected]>
Co-authored-by: zhangdong <[email protected]>
Co-authored-by: Yongqiang YANG <[email protected]>
Co-authored-by: jakevin <[email protected]>
Co-authored-by: Mryange <[email protected]>
Co-authored-by: zclllyybb <[email protected]>
Co-authored-by: Tiewei Fang <[email protected]>
Co-authored-by: Xin Liao <[email protected]>