feat: add merge stream (apache#1595)
## Rationale

Part of Metric Engine.

## Detailed Changes
- Scan SSTs in parallel, one scan per segment.
- Sort SSTs using SortPreservingMergeExec, which merges already-sorted input
streams and is therefore more efficient than a full SortExec.
- Add MergeExec to dedup record batches, relying on the batches being sorted.
Currently only `overwrite` semantics are supported.
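
The merge-then-dedup idea in the list above can be illustrated with a minimal, std-only sketch. It is not the code from this commit and does not use DataFusion: the `Row` type, its `seq` field, and the functions `merge_sorted` and `dedup_overwrite` are hypothetical names chosen for illustration, and the sketch assumes each input run is already sorted by `(key, seq)` and that `overwrite` means the last row seen for a key wins.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical row layout (an assumption, not the commit's schema):
/// `key` stands in for the primary key, `seq` for a write sequence.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Row {
    pub key: u64,
    pub seq: u64,
    pub value: i64,
}

/// K-way merge of runs that are each already sorted by (key, seq).
/// Only the head of every run sits in the heap, so no full re-sort is needed.
pub fn merge_sorted(runs: Vec<Vec<Row>>) -> Vec<Row> {
    let mut iters: Vec<_> = runs.into_iter().map(|r| r.into_iter()).collect();
    let mut heap = BinaryHeap::new();
    // Seed the min-heap with the first row of each run.
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(row) = it.next() {
            heap.push(Reverse((row.key, row.seq, idx, row.value)));
        }
    }
    let mut out = Vec::new();
    // Pop the smallest head, then refill from the run it came from.
    while let Some(Reverse((key, seq, idx, value))) = heap.pop() {
        out.push(Row { key, seq, value });
        if let Some(row) = iters[idx].next() {
            heap.push(Reverse((row.key, row.seq, idx, row.value)));
        }
    }
    out
}

/// Dedup with overwrite semantics: for rows sharing a key, keep the last one.
/// Requires the input to be sorted so that equal keys are adjacent.
pub fn dedup_overwrite(sorted: Vec<Row>) -> Vec<Row> {
    let mut out: Vec<Row> = Vec::new();
    for row in sorted {
        if let Some(last) = out.last_mut() {
            if last.key == row.key {
                // Same key as the previous row: the later row replaces it.
                *last = row;
                continue;
            }
        }
        out.push(row);
    }
    out
}
```

Because equal keys are adjacent after the merge, the dedup step only ever compares a row with the previous one, which is why it can run in a single streaming pass over sorted input.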

## Test Plan

Add two new unit tests.
jiacai2050 authored Nov 20, 2024
1 parent b5025d9 commit 1d7c549
Showing 9 changed files with 740 additions and 153 deletions.
120 changes: 72 additions & 48 deletions horaedb/Cargo.lock

4 changes: 3 additions & 1 deletion horaedb/Cargo.toml
@@ -30,17 +30,19 @@ anyhow = { version = "1.0" }
metric_engine = { path = "metric_engine" }
thiserror = "1"
bytes = "1"
-datafusion = "42"
+datafusion = "43"
parquet = { version = "53" }
object_store = { version = "0.11" }
macros = { path = "../src/components/macros" }
pb_types = { path = "pb_types" }
prost = { version = "0.13" }
arrow = { version = "53", features = ["prettyprint"] }
arrow-schema = "53"
tokio = { version = "1", features = ["full"] }
async-trait = "0.1"
async-stream = "0.3"
futures = "0.3"
temp-dir = "0.1"
itertools = "0.3"
lazy_static = "1"
tracing = "0.1"