[Feature] Support DDL/DML on external catalog #31442
Labels

kind/feature: Categorizes issue or PR as related to a new feature.

Comments

morningman added the kind/feature label on Feb 27, 2024
I want to try.

I'd like to try.

We are working on the framework of this feature.
This was referenced Mar 15, 2024
morningman pushed a commit that referenced this issue on Mar 15, 2024
add hive table sink thrift; add partition type; add new compress type. Issue: #31442
This was referenced Mar 17, 2024
morningman pushed a commit to morningman/doris that referenced this issue on Mar 18, 2024

add hive table sink thrift; add partition type; add new compress type. Issue: apache#31442
morningman pushed a commit that referenced this issue on Mar 18, 2024
morningman pushed a commit that referenced this issue on Mar 20, 2024
…#32458) issue: #31442

1. Adapt the CREATE TABLE statement from Doris to Hive.
2. Fix INSERT OVERWRITE for the table sink.

The Doris CREATE TABLE statement:

```
mysql> CREATE TABLE buck2(
    ->   id int COMMENT 'col1',
    ->   name string COMMENT 'col2',
    ->   dt string COMMENT 'part1',
    ->   dtm string COMMENT 'part2'
    -> ) ENGINE=hive
    -> COMMENT "create tbl"
    -> PARTITION BY LIST (dt, dtm) ()
    -> DISTRIBUTED BY HASH (id) BUCKETS 16
    -> PROPERTIES(
    ->   "file_format" = "orc"
    -> );
```

The generated Hive CREATE TABLE statement:

```
CREATE TABLE `buck2`(
  `id` int COMMENT 'col1',
  `name` string COMMENT 'col2')
PARTITIONED BY (
  `dt` string,
  `dtm` string)
CLUSTERED BY (
  id)
INTO 16 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://HDFS8000871/usr/hive/warehouse/jz3.db/buck2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1710840747',
  'doris.file_format'='orc')
```
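The column split shown above (bucketed data columns stay in the column list, list-partition columns move into PARTITIONED BY) can be sketched as follows. This is an illustrative outline only, not Doris's actual translation code; the helper name and the tuple-based column representation are hypothetical.

```python
def doris_to_hive_ddl(table, columns, partition_cols, bucket_col, buckets, file_format):
    """Sketch: translate a Doris hive-engine table definition to a Hive DDL string.

    columns        -- list of (name, type) pairs in declaration order
    partition_cols -- set of column names that become Hive partition columns
    """
    # Partition columns are removed from the Hive column list and placed
    # into the PARTITIONED BY clause instead, as in the example above.
    data_cols = [(n, t) for n, t in columns if n not in partition_cols]
    part_cols = [(n, t) for n, t in columns if n in partition_cols]
    col_list = ",\n  ".join(f"`{n}` {t}" for n, t in data_cols)
    part_list = ",\n  ".join(f"`{n}` {t}" for n, t in part_cols)
    return (
        f"CREATE TABLE `{table}`(\n  {col_list})\n"
        f"PARTITIONED BY (\n  {part_list})\n"
        f"CLUSTERED BY ({bucket_col}) INTO {buckets} BUCKETS\n"
        f"STORED AS {file_format.upper()}"
    )

ddl = doris_to_hive_ddl(
    "buck2",
    [("id", "int"), ("name", "string"), ("dt", "string"), ("dtm", "string")],
    {"dt", "dtm"}, "id", 16, "orc")
```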
seawinde pushed a commit to seawinde/doris that referenced this issue on Mar 20, 2024

…ache#32441) issue: apache#31442

1. Get updated information from the coordinator and commit.
2. Refresh the table after commit.
This was referenced Mar 21, 2024
Merged
morningman pushed a commit that referenced this issue on Mar 22, 2024

issue: #31442 Fix the hive table sink write path to hdfs://${hdfs_root}/tmp/.doris_staging/${user}
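The staging layout in that fix can be sketched as a small path builder; the helper name is hypothetical, but the directory shape matches the commit message. Writers put files under a hidden per-user staging directory and move them to the final table location on commit.

```python
def staging_dir(hdfs_root: str, user: str) -> str:
    # Hidden per-user staging area; files are moved to the table
    # location only when the transaction commits.
    return f"hdfs://{hdfs_root}/tmp/.doris_staging/{user}"

path = staging_dir("HDFS8000871", "hive_user")
```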
morningman pushed a commit that referenced this issue on Mar 22, 2024

Support INSERT OVERWRITE for unpartitioned and partitioned tables. issue: #31442
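The two overwrite modes can be sketched as follows; this is an illustrative model, not Doris's actual code. Tables are modeled as a dict from partition value (or None for an unpartitioned table) to a list of rows.

```python
def insert_overwrite(existing, incoming, partitioned):
    """Sketch of INSERT OVERWRITE semantics for hive table sinks."""
    if not partitioned:
        # An unpartitioned table is replaced wholesale.
        return {None: incoming.get(None, [])}
    # Only partitions present in the incoming data are replaced;
    # untouched partitions keep their old rows.
    result = dict(existing)
    result.update(incoming)
    return result

table = {"dt=2024-01-01": [1, 2], "dt=2024-01-02": [3]}
out = insert_overwrite(table, {"dt=2024-01-02": [9]}, partitioned=True)
```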
yiguolei pushed a commit that referenced this issue on Mar 24, 2024
morningman pushed a commit that referenced this issue on Mar 25, 2024

…ed by refactoring and add hive writing regression test. (#32721) Issue Number: #31442

- Fix the writer not being initialized, caused by the refactoring in #31716.
- Fix a reference-lifetime issue with `TParquetVersion::type parquet_version` in `VParquetTransformer` when using a temporary object.
- Add hive writing regression tests.
morningman pushed a commit that referenced this issue on May 27, 2024

Issue: #31442

```
/home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F963FA9D090 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::VHivePartitionWriter::_build_partition_update() at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:215
5# doris::vectorized::VHivePartitionWriter::close(doris::Status const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:164
6# doris::vectorized::VHiveTableWriter::close(doris::Status) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_table_writer.cpp:209
7# doris::vectorized::AsyncResultWriter::process_block(doris::RuntimeState*, doris::RuntimeProfile*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/async_result_writer.cpp:184
8# doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0::operator()() const at
```
dataroaring pushed a commit that referenced this issue on May 27, 2024
yiguolei pushed a commit that referenced this issue on May 27, 2024
seawinde pushed a commit to seawinde/doris that referenced this issue on May 27, 2024
morningman pushed a commit that referenced this issue on May 28, 2024
wuwenchi added a commit to wuwenchi/doris_new that referenced this issue on May 29, 2024

apache#31442, tested in apache#34929. When a null value is used as the partition value, the BE returns the literal string "null", so this string needs special handling. (cherry picked from commit d86cd1b)
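A minimal sketch of that special handling. The helper name is hypothetical, and mapping the value to `__HIVE_DEFAULT_PARTITION__` (Hive's conventional directory name for NULL partition values) is an assumption about what the fix does, not confirmed by the commit message.

```python
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"  # Hive's usual NULL-partition dir name

def normalize_partition_value(value: str) -> str:
    # The BE reports a NULL partition value as the literal string "null";
    # translate it so it is not treated as a real partition value.
    return HIVE_DEFAULT_PARTITION if value == "null" else value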
yiguolei pushed a commit that referenced this issue on May 29, 2024
dataroaring pushed a commit that referenced this issue on May 31, 2024
yiguolei pushed a commit that referenced this issue on May 31, 2024
…es when writing to s3. (#35645)

## Proposed changes

Issue Number: close #31442

(Fix) [hive-writer] Fixed an issue with partition values that contain spaces when writing to S3.

### Error msg

```
org.apache.doris.common.UserException: errCode = 2, detailMessage = java.net.URISyntaxException: Illegal character in path at index 114: oss://xxxxxxxxxxx/hive/tpcds1000_partition_oss/call_center/cc_call_center_sk=1/cc_mkt_class=A bit narrow forms matter animals. Consist/cc_market_manager=Daniel Weller/cc_rec_end_date=2001-12-31/f6b5ff4253414b06-9fd365ef68e5ddc5_133f02fb-a7e0-4109-9100-fb748a28259e-0.zlib.orc
    at org.apache.doris.common.util.S3URI.validateUri(S3URI.java:134) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.S3URI.parseUri(S3URI.java:120) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.S3URI.<init>(S3URI.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.S3URI.create(S3URI.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.fs.obj.S3ObjStorage.deleteObject(S3ObjStorage.java:194) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.fs.remote.ObjFileSystem.delete(ObjFileSystem.java:150) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.fs.remote.SwitchingFileSystem.delete(SwitchingFileSystem.java:92) ~[doris-fe.jar:1.2-
```

### Root Cause

Hadoop encodes some special characters in partition names, but not spaces, which differs from URI encoding. Constructing the URI from the raw path therefore fails.

### Solution

Parse the URI with a regular expression, then pass each part to the multi-argument URI constructor, which encodes each part.
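The effect of that multi-argument-constructor fix, encoding each path segment rather than rejecting the raw string, can be sketched in Python. The helper is hypothetical; `=` is left unescaped on the assumption that Hive's `key=value` partition directory names must survive intact.

```python
from urllib.parse import quote

def encode_s3_path(path: str) -> str:
    # Encode each path segment separately: '/' separators survive the
    # split, '=' is kept for Hive key=value partition dirs, and spaces
    # (which Hadoop does not encode) become %20.
    return "/".join(quote(seg, safe="=") for seg in path.split("/"))

p = encode_s3_path("call_center/cc_mkt_class=A bit narrow/file.orc")
```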
yiguolei pushed a commit that referenced this issue on May 31, 2024
dataroaring pushed a commit that referenced this issue on Jun 4, 2024
seawinde pushed a commit to seawinde/doris that referenced this issue on Jun 5, 2024

…pache#35708)

## Proposed changes

Issue apache#31442

1. The seventh parameter of `ZonedDateTime.of` is in nanoseconds, so the microsecond value must be multiplied by 1000.
2. When writing to a non-partitioned iceberg table, the data path had an extra slash.
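Both fixes are simple enough to sketch; the helper names are hypothetical stand-ins for the actual code paths.

```python
def micros_to_nanos(micros: int) -> int:
    # ZonedDateTime.of takes nanosecond-of-second as its seventh argument,
    # so a microsecond fraction must be scaled by 1000.
    return micros * 1000

def normalize_data_path(path: str) -> str:
    # Collapse an accidental double slash in a non-partitioned table's
    # data path, leaving the scheme separator ("hdfs://") intact.
    scheme, sep, rest = path.partition("://")
    while "//" in rest:
        rest = rest.replace("//", "/")
    return scheme + sep + rest
```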
seawinde pushed a commit to seawinde/doris that referenced this issue on Jun 5, 2024
morningman pushed a commit that referenced this issue on Jun 22, 2024

…#36289) #31442

Added Iceberg operator support so Doris can write directly to the lake:

1. Support INSERT INTO for iceberg tables by appending HDFS files.
2. Implement iceberg partition routing through partitionTransform:
   2.1) Serialize spec and schema data into JSON on the FE side, then deserialize on the BE side to get the iceberg table's schema and partition information.
   2.2) Implement Iceberg's Identity, Bucket, Year/Month/Day and other partition strategies through partitionTransform and a template class.
3. Manage transactions through IcebergTransaction:
   3.1) After the BE-side file is written, report CommitData to the FE at partition granularity.
   3.2) After receiving CommitData, the FE commits metadata to iceberg in IcebergTransaction.

### Future work

- Add unit tests for the partition transform function.
- Implement the partition transform function with exchange sink turned on.
- The partition transform function omits handling of the bigint type.

---------

Co-authored-by: lik40 <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
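The partition strategies named in 2.2 can be sketched as pure functions over source values. This is illustrative only: Iceberg's real bucket transform hashes a typed binary encoding with 32-bit Murmur3, for which `crc32` here is merely a stand-in, and the epoch-relative year/month/day encodings follow the Iceberg spec's convention of counting from 1970.

```python
import zlib
from datetime import date

EPOCH = date(1970, 1, 1)

def identity(v):
    return v

def year(d: date) -> int:
    return d.year - 1970                       # years since 1970

def month(d: date) -> int:
    return (d.year - 1970) * 12 + d.month - 1  # months since 1970-01

def day(d: date) -> int:
    return (d - EPOCH).days                    # days since epoch

def bucket(n: int, v) -> int:
    # Stand-in hash; Iceberg actually uses Murmur3-32 over a typed encoding.
    return (zlib.crc32(str(v).encode()) & 0x7FFFFFFF) % n
```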
dataroaring pushed a commit that referenced this issue on Jun 26, 2024
morningman pushed a commit that referenced this issue on Jun 28, 2024

## Proposed changes

Issue Number: #31442

1. Support 'truncate table' on hive3.
2. Forbid 'truncate table' on hive2, because the HMS Client API does not support it on hive2.
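The version gate described above can be sketched as an up-front check (hypothetical helper, not Doris's actual code):

```python
def check_truncate_supported(hive_version: int) -> None:
    # The Hive 2 HMS client API has no truncate call, so reject the
    # statement before any metastore work is attempted.
    if hive_version < 3:
        raise ValueError(f"truncate table is not supported on hive{hive_version}")
```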
dataroaring pushed a commit that referenced this issue on Jun 30, 2024
kaka11chen pushed a commit to kaka11chen/doris that referenced this issue on Jul 12, 2024
Search before asking
Description
Before 2.0, Doris could only query data from external catalogs such as Hive and Hudi.
Now we would like to support DDL/DML on external catalogs.
Use case
No response
Are you willing to submit PR?
Code of Conduct