Releases: kwai/blaze
v4.0.0
New features
- supports spark3.0/3.1/3.2/3.3/3.4/3.5.
- supports integrating with Apache Celeborn.
- supports native ORC input format.
- supports bloom filter join introduced in spark 3.5.
- supports forceShuffledHashJoin for running tpch/tpcds benchmarks.
- newly supported native expressions/functions: year, month, day, md5.
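The bloom filter join works roughly like this. Keys from the smaller build side are inserted into a compact bit array, and probe-side rows whose key is definitely absent are dropped before the expensive shuffle and join. The sketch below is a minimal, self-contained illustration of that idea over `i64` keys. It is not Spark's or Blaze's implementation. The `BloomFilter` type, its double-hashing scheme and the sizes used are assumptions chosen for the example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical minimal bloom filter over i64 join keys (illustration only).
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: usize,
    num_hashes: u64,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u64) -> Self {
        // Round up to whole 64-bit words; at least one word so modulo never divides by zero.
        let words = (num_bits.max(1) + 63) / 64;
        BloomFilter { bits: vec![0; words], num_bits: words * 64, num_hashes }
    }

    /// Bit positions for a key via double hashing: h1 + i * h2 (mod m).
    fn positions(&self, key: i64) -> Vec<usize> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        let h1 = h.finish();
        let h2 = h1.rotate_left(32) | 1; // force odd so successive probes differ
        let m = self.num_bits as u64;
        (0..self.num_hashes)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % m) as usize)
            .collect()
    }

    fn insert(&mut self, key: i64) {
        for p in self.positions(key) {
            self.bits[p / 64] |= 1u64 << (p % 64);
        }
    }

    /// false means "definitely absent"; true means "possibly present".
    fn might_contain(&self, key: i64) -> bool {
        self.positions(key)
            .into_iter()
            .all(|p| self.bits[p / 64] & (1u64 << (p % 64)) != 0)
    }
}

fn main() {
    // Build side: keys from the smaller (filtered) table.
    let build_keys = [3_i64, 17, 42, 99];
    let mut bf = BloomFilter::new(1024, 3);
    for &k in &build_keys {
        bf.insert(k);
    }

    // Probe side: drop rows whose key cannot match before shuffling/joining.
    let survivors: Vec<i64> = (0..200_i64).filter(|&k| bf.might_contain(k)).collect();

    // Bloom filters have no false negatives, so every build key survives.
    for k in build_keys {
        assert!(survivors.contains(&k));
    }
    println!("{} of 200 probe rows kept", survivors.len());
}
```

A few false positives may survive the filter, which is harmless because the real join still checks keys exactly. The filter only has to be cheap to build and broadcast.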
Bug fixes
- add missing UDTF.terminate() invocations.
- fix NPE while executing some native spark physical plans.
Performance
- use a custom hash table implementation for faster joins, with SIMD, bulk searching, memory prefetching, etc.
- improve shuffle write performance.
- reuse FSDataInputStream for the same input file.
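A hash join has two phases: build a table from one side's keys, then probe it with the other side's keys, ideally a whole batch at a time. The SIMD, bulk-search and prefetch optimizations listed above speed up exactly this build/probe loop. The sketch below shows only the loop's shape, using `std::collections::HashMap` instead of Blaze's custom table. The function names `build` and `probe_batch` are hypothetical.

```rust
use std::collections::HashMap;

/// Build phase: map each build-side key to every row index that carries it.
fn build(keys: &[i64]) -> HashMap<i64, Vec<usize>> {
    let mut table: HashMap<i64, Vec<usize>> = HashMap::with_capacity(keys.len());
    for (row, &k) in keys.iter().enumerate() {
        table.entry(k).or_default().push(row);
    }
    table
}

/// Probe phase over a batch of keys, emitting (probe_row, build_row)
/// index pairs for an inner join.
fn probe_batch(table: &HashMap<i64, Vec<usize>>, probe_keys: &[i64]) -> Vec<(usize, usize)> {
    let mut out = Vec::new();
    for (probe_row, k) in probe_keys.iter().enumerate() {
        if let Some(build_rows) = table.get(k) {
            for &b in build_rows {
                out.push((probe_row, b));
            }
        }
    }
    out
}

fn main() {
    let build_keys = [10_i64, 20, 10, 30];
    let probe_keys = [20_i64, 10, 40];
    let table = build(&build_keys);
    let pairs = probe_batch(&table, &probe_keys);
    // Probe row 0 (key 20) hits build row 1; probe row 1 (key 10) hits build rows 0 and 2.
    assert_eq!(pairs, vec![(0, 1), (1, 0), (1, 2)]);
    println!("{:?}", pairs);
}
```

Emitting index pairs instead of materialized rows keeps the probe loop tight. The output columns can then be gathered from both sides in one pass.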
What's Changed
- spark351 + supports bloom filter join by @richox in #532
- Support Spark 3.2.0 by @XorSum in #538
- Bump sonic-rs from 0.3.8 to 0.3.9 by @dependabot in #533
- Bump tonic-build from 0.12.0 to 0.12.1 by @dependabot in #525
- Bump prost from 0.13.0 to 0.13.1 by @dependabot in #521
- supports UDTF.terminate by @richox in #536
- Bump bytes from 1.6.1 to 1.7.0 by @dependabot in #539
- Bump tokio from 1.38.0 to 1.39.2 by @dependabot in #534
- Bump tempfile from 3.10.1 to 3.12.0 by @dependabot in #542
- Bump bytes from 1.7.0 to 1.7.1 by @dependabot in #540
- supports forceShuffledHashJoin. by @richox in #545
- init RSS framework by @richox in #551
- Bump radsort from 0.1.0 to 0.1.1 by @dependabot in #550
- Bump postcard from 1.0.8 to 1.0.10 by @dependabot in #552
- Bump tonic-build from 0.12.1 to 0.12.2 by @dependabot in #554
- Bump sonic-rs from 0.3.9 to 0.3.12 by @dependabot in #555
- improve shuffle performance by @richox in #559
- Bump prost from 0.13.1 to 0.13.2 by @dependabot in #556
- Bump async-trait from 0.1.81 to 0.1.82 by @dependabot in #557
- Bump tokio from 1.39.2 to 1.40.0 by @dependabot in #558
- delete walkaround_sliced_boolean_array_issue by @DDDominik in #560
- fix CI by @richox in #562
- optimize hash joins by @richox in #563
- support year/month/day functions by @richox in #565
- Support Spark 3.1.3 by @harveyyue in #561
- Bump sonic-rs from 0.3.12 to 0.3.13 by @dependabot in #566
- WIP: improve shuffle write performance by @richox in #564
- remove variables from artifacts in pom.xml by @richox in #569
- add --allow-no-vcs to cargo fix command by @richox in #570
- use memory prefetch in hash map building by @richox in #571
- support native scan orc format by @harveyyue in #544
- add spark prefix to tpcds.gen.defaultParallel to take effect by @TJX2014 in #572
- support md5 command by @KnightChess in #581
- Bump bytes from 1.7.1 to 1.7.2 by @dependabot in #578
- update doc to add protobuf installation guides by @richox in #583
- Bump prost from 0.13.2 to 0.13.3 by @dependabot in #586
- make RUSTFLAGS configurable by @XorSum in #585
- [BLAZE-587] Replace spark pattern with spark- for maven profile and shim name by @SteNicholas in #588
- use hadoop fs positioned-reading by @richox in #582
- [BLAZE-573] Support Spark 3.4 version by @SteNicholas in #589
- docs/fix: fix error message by @caicancai in #591
- reuse FSDataInputStream for same input file by @richox in #592
- make RUSTFLAGS effective before running cargo commands by @harveyyue in #594
- [BLAZE-287] Integrate Blaze with Celeborn by @RexXiong in #596
- fix spark3.1 compile error by @richox in #598
- fix timer in AggExec by @richox in #597
- update docs (celeborn related) by @richox in #599
- Change the output qualifier from var to def to avoid triggering an NPE by @harveyyue in #608
- OrcExec should read the specified file range data by @harveyyue in #605
- Bump tempfile from 3.12.0 to 3.13.0 by @dependabot in #604
- Bump once_cell from 1.20.0 to 1.20.1 by @dependabot in #603
- Bump tonic-build from 0.12.2 to 0.12.3 by @dependabot in #595
- Bump async-trait from 0.1.82 to 0.1.83 by @dependabot in #590
- Bump futures-util from 0.3.30 to 0.3.31 by @dependabot in #609
- Bump futures from 0.3.30 to 0.3.31 by @dependabot in #610
- Bump once_cell from 1.20.1 to 1.20.2 by @dependabot in #611
New Contributors
- @XorSum made their first contribution in #538
- @DDDominik made their first contribution in #560
- @TJX2014 made their first contribution in #572
- @KnightChess made their first contribution in #581
- @SteNicholas made their first contribution in #588
- @caicancai made their first contribution in #591
- @RexXiong made their first contribution in #596
Full Changelog: v3.0.1...v4.0.0
v3.0.1
v3.0.0 [yanked]
blaze-v3.0.0 [yanked]
Features
- Supports using spark.io.compression.codec for shuffle/broadcast compression
- Supports date type casting
- Refactor join implementations to support existence joins and to let broadcast hash joins (BHJ) build the hash map on the driver side
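An existence join, as Spark uses it for `EXISTS` subqueries that appear inside a larger predicate, keeps every left row and appends a boolean column saying whether any right row has the same key. The sketch below illustrates those semantics with a `HashSet`. It assumes non-null `i64` keys, and the `existence_join` function is hypothetical rather than Blaze's API.

```rust
use std::collections::HashSet;

/// Existence join: keep every left row and pair it with a boolean
/// "exists" flag that is true if any right row has the same key.
/// Assumes non-null keys (hypothetical simplification).
fn existence_join(left_keys: &[i64], right_keys: &[i64]) -> Vec<(i64, bool)> {
    // Duplicates on the right do not matter: only presence is recorded.
    let right: HashSet<i64> = right_keys.iter().copied().collect();
    left_keys.iter().map(|&k| (k, right.contains(&k))).collect()
}

fn main() {
    let out = existence_join(&[1, 2, 3], &[2, 3, 3, 5]);
    // Unlike an inner join, left row 3 appears once even though it matches twice.
    assert_eq!(out, vec![(1, false), (2, true), (3, true)]);
    println!("{:?}", out);
}
```

Because only presence matters, the build side can be a set instead of a key-to-rows map. The output also always has exactly as many rows as the left input.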
Performance
- Fixed performance issues when running on spark3 with default configurations
- Use cached parquet metadata
- Refactor native broadcast to avoid duplicated broadcast jobs
- Supports spark333 batch shuffle reading
Bugfix
- Fix in_list conversion in from_proto.rs
v2.0.9.1
release version 2.0.9.1 (#470) Co-authored-by: zhangli20
v2.0.9
v2.0.8
v2.0.7
update blaze version 2.0.7-SNAPSHOT (#312) Co-authored-by: zhangli20