feat: upgrade storage crate to arrow and parquet official impl #738
I'm afraid that the change to the ParquetReader is incorrect and may cause unexpected output.
a36f399 to 6541670
LGTM
LGTM
* chore: kick off. change datafusion/arrow/parquet to target version Signed-off-by: Ruihang Xia <[email protected]>
* chore: replace one last datafusion dep
* feat: arrow_array switch to arrow
* chore: update dep of binary vector
* chore: fix wrong merge commit
* feat: Switch to datatypes2
* feat: Make recordbatch compile
* chore: sort Cargo.toml
* feat: Fix common::recordbatch compiler errors
* feat: Fix recordbatch test compiling issue
* fix: api crate (#708)
* fix: rename ConcreteDataType::timestamp_millis_type to ConcreteDataType::timestamp_millisecond_type. fix other warnings regarding timestamp
* fix: revert changes in datatypes2
* fix: helper
* chore: delete datatypes based on arrow2
* feat: Fix some compiler errors in common::query (#710)
* feat: Fix some compiler errors in common::query
* feat: test_collect use vectors api
* fix: common-query subcrate (#712)
* fix: record batch adapter
* fix error enum
* fix: Fix common::query compiler errors (#713)
* feat: Move conversion to ScalarValue to value.rs
* fix: Fix common::query compiler errors. This commit also makes InnerError pub(crate)
* feat: Implements diff accumulator using WrapperType (#715)
* feat: Remove usage of opaque error from common::recordbatch
* feat: Remove opaque error from common::query
* feat: Fix diff compiler errors. Now common_function just uses common_query's Error and Result. Adds a LargestType associated type to LogicalPrimitiveType to get the largest type a logical primitive type can cast to.
* feat: Remove LargestType from NativeType trait
* chore: Update comments
* feat: Restrict Scalar::RefType of WrapperType to itself. Add trait bound `for<'a> Scalar<RefType<'a> = Self>` to WrapperType
* chore: Address CR comments
* chore: Format codes
* fix: fix compile error for mean/polyval/pow/interp ops
* Revert "fix: fix compile error for mean/polyval/pow/interp ops". This reverts commit fb0b4eb.
* fix: Fix compiler errors in argmax/rate/median/norm_cdf (#716)
* fix: Fix compiler errors in argmax/rate/median/norm_cdf
* chore: Address CR comments
* fix: fix compile error for mean/polyval/pow/interp ops (#717)
* fix: fix compile error for mean/polyval/pow/interp ops
* simplify type bounds
* fix: fix argmin/percentile/clip/interp/scipy_stats_norm_pdf errors (#718). fix: fix argmin/percentile/clip/interp/scipy_stats_norm_pdf compiler errors
* fix: fix other compile error in common-function (#719)
* further fixing
* fix all compile errors in common function
* fix: Fix tests and clippy for common-function subcrate (#726)
* further fixing
* fix all compile errors in common function
* fix tests
* fix clippy
* revert test changes
* fix: row group pruning (#725)
* fix: row group pruning
* chore: use macro to simplify stats implementation
* fix: CR comments
* fix: row group metadata length mismatch
* fix: simplify code
* fix: Fix common::grpc compiler errors (#722)
* fix: Fix common::grpc compiler errors. This commit refactors RecordBatch and holds vectors in the RecordBatch struct, so we don't need to cast the array to vector when doing serialization or iterating the batch. Now we use the vector API instead of the arrow API in grpc crate.
* chore: Address CR comments
* fix common record batch
* fix: Fix compile error in server subcrate (#727)
* fix: Fix compile error in server subcrate
* remove unused type alias
* explicitly panic
* Update src/storage/src/sst/parquet.rs. Co-authored-by: Yingwen <[email protected]>
* fix: Fix common grpc expr (#730)
* fix compile errors
* rename fn names
* fix styles
* fix warnings in common-time
* fix: pre-cast to avoid tremendous match arms (#734)
* feat: upgrade storage crate to arrow and parquet official impl (#738)
* fix: compile errors
* fix: parquet reader and writer
* fix: parquet reader and writer
* fix: WriteBatch IPC encode/decode
* fix: clippy errors in storage subcrate
* chore: remove suspicious unwrap
* fix: some cr comments
* fix: CR comments
* fix: CR comments
* fix: Fix compiler errors in catalog and mito crates (#742)
* fix: Fix compiler errors in mito
* fix: Fix compiler errors in catalog crate
* style: Fix clippy
* chore: Fix use
* Merge pull request #745
* fix nyc-taxi and util
* Merge branch 'replace-arrow2' into fix-others
* fix substrait
* fix warnings and error in test
* fix: Fix imports in optimizer.rs
* fix: errors in optimizer
* fix: remove unwrap
* fix: Fix compiler errors in query crate (#746)
* fix: Fix compiler errors in state.rs
* fix: fix compiler errors in state
* feat: upgrade sqlparser to 0.26
* fix: fix datafusion engine compiler errors
* fix: Fix some tests in query crate
* fix: Fix all warnings in tests
* feat: Remove `Type` from timestamp's type name
* fix: fix query tests. Now datafusion already supports median, so this commit also removes the median function
* style: Fix clippy
* feat: Remove RecordBatch::pretty_print
* chore: Address CR comments
* Update src/query/src/query_engine/state.rs. Co-authored-by: Ruihang Xia <[email protected]>
* fix: frontend compile errors (#747). fix: fix compile errors in frontend
* fix: Fix compiler errors in script crate (#749)
* fix: Fix compiler errors in state.rs
* fix: fix compiler errors in state
* feat: upgrade sqlparser to 0.26
* fix: fix datafusion engine compiler errors
* fix: Fix some tests in query crate
* fix: Fix all warnings in tests
* feat: Remove `Type` from timestamp's type name
* fix: fix query tests. Now datafusion already supports median, so this commit also removes the median function
* style: Fix clippy
* feat: Remove RecordBatch::pretty_print
* chore: Address CR comments
* feat: Add column_by_name to RecordBatch
* feat: modify select_from_rb
* feat: Fix some compiler errors in vector.rs
* feat: Fix more compiler errors in vector.rs
* fix: fix table.rs
* fix: Fix compiler errors in coprocessor
* fix: Fix some compiler errors
* fix: Fix compiler errors in script
* chore: Remove unused imports and format code
* test: disable interval tests
* test: Fix test_compile_execute test
* style: Fix clippy
* feat: Support interval
* feat: Add RecordBatch::columns and fix clippy. Co-authored-by: Ruihang Xia <[email protected]>
* fix: Fix All The Tests! (#752)
* fix: Fix several tests compile errors
* fix: some compile errors in tests
* fix: compile errors in frontend tests
* fix: compile errors in frontend tests
* test: Fix tests in api and common-query
* test: Fix test in sql crate
* fix: resolve substrait error
* chore: add more test
* test: Fix tests in servers
* fix instance_test
* test: Fix tests in tests-integration. Co-authored-by: Lei, HUANG <[email protected]> Co-authored-by: evenyag <[email protected]>
* fix: clippy errors

Signed-off-by: Ruihang Xia <[email protected]>
Co-authored-by: Ruihang Xia <[email protected]>
Co-authored-by: evenyag <[email protected]>
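One commit in the list above adds the trait bound `for<'a> Scalar<RefType<'a> = Self>` to WrapperType. A minimal Rust sketch of what such a bound expresses is given below. The `Scalar` and `WrapperType` traits here are simplified, hypothetical stand-ins (the real definitions are not shown on this page and likely carry more bounds and methods), and `as_scalar_ref`, `into_native` and `sum_natives` are names assumed purely for illustration.

```rust
// Simplified, hypothetical stand-ins; not the crate's actual definitions.

/// A scalar value that can hand out a borrowed view of itself.
trait Scalar: 'static + Sized {
    /// Borrowed form of the scalar (a generic associated type).
    type RefType<'a>
    where
        Self: 'a;

    fn as_scalar_ref(&self) -> Self::RefType<'_>;
}

/// A primitive-like wrapper. The higher-ranked bound says that, for every
/// lifetime, the borrowed form of the scalar is the value type itself.
trait WrapperType: Copy + for<'a> Scalar<RefType<'a> = Self> {
    type Native;

    fn into_native(self) -> Self::Native;
}

impl Scalar for i64 {
    type RefType<'a> = i64;

    fn as_scalar_ref(&self) -> i64 {
        *self
    }
}

impl WrapperType for i64 {
    type Native = i64;

    fn into_native(self) -> i64 {
        self
    }
}

/// Generic code can treat `RefType<'_>` as `T` because of the bound above.
fn sum_natives<T: WrapperType<Native = i64>>(items: &[T]) -> i64 {
    items
        .iter()
        .map(|v| {
            // Type-checks only because RefType<'a> = T for every 'a.
            let owned: T = v.as_scalar_ref();
            owned.into_native()
        })
        .sum()
}
```

With this bound in place, a call such as `sum_natives(&[1i64, 2, 3])` can move freely between the owned and borrowed forms without a per-type conversion, which is presumably the simplification the commit was after.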
I hereby agree to the terms of the GreptimeDB CLA
What's changed and what's your intention?
parquet official crate
arrow official crate

Checklist
Refer to a related PR or issue link (optional)
#555