support custom partitioner for nebula when generate sst files #49

Nicole00 · 2022-01-05T03:25:37Z

Whether to use the custom partitioner is configurable.
Using the custom partitioner ensures that the keys in different SST files do not overlap.
When SST files generated with the custom partitioner are ingested, almost all of them land on L6 (the space is empty before ingest).
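
To illustrate why routing keys by partition keeps file key ranges disjoint, here is a minimal sketch in plain Scala. It is not the code from this PR. The key layout (a 4-byte big-endian, 1-based partition id followed by a payload) and the names `KeyBucketing`, `makeKey`, `bucketOf`, and `bucketRanges` are assumptions made up for the example. Keys are compared byte-wise as unsigned values, which is how RocksDB's default comparator orders them.

```scala
import java.nio.ByteBuffer

object KeyBucketing {
  // ASSUMPTION (simplified, hypothetical layout): every key starts with a
  // 4-byte big-endian, 1-based partition id, followed by an arbitrary payload.
  def makeKey(partId: Int, payload: Array[Byte]): Array[Byte] =
    ByteBuffer.allocate(4 + payload.length).putInt(partId).put(payload).array()

  // Read the assumed 4-byte partition id back from the key header.
  def partitionId(key: Array[Byte]): Int =
    ByteBuffer.wrap(key, 0, 4).getInt()

  // One bucket (one output file) per partition id: ids 1..numParts map to 0..numParts-1.
  def bucketOf(key: Array[Byte], numParts: Int): Int = {
    val id = partitionId(key)
    require(id >= 1 && id <= numParts, s"partition id $id outside 1..$numParts")
    id - 1
  }

  // Byte-wise unsigned comparison of two keys.
  val unsignedLex: Ordering[Array[Byte]] = new Ordering[Array[Byte]] {
    def compare(a: Array[Byte], b: Array[Byte]): Int = {
      val n = math.min(a.length, b.length)
      var i = 0
      while (i < n) {
        val c = (a(i) & 0xff) - (b(i) & 0xff)
        if (c != 0) return c
        i += 1
      }
      a.length - b.length
    }
  }

  // Smallest and largest key of each non-empty bucket, ordered by bucket index.
  def bucketRanges(keys: Seq[Array[Byte]], numParts: Int): Seq[(Array[Byte], Array[Byte])] =
    keys.groupBy(bucketOf(_, numParts)).toSeq.sortBy(_._1).map {
      case (_, ks) => (ks.min(unsignedLex), ks.max(unsignedLex))
    }
}
```

Under this assumed layout, every key in bucket `i` sorts before every key in bucket `i + 1`, so files written one per bucket cover non-overlapping key ranges. With a hash-style repartition, keys of the same partition can be scattered across files, and the ranges may overlap.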

Add a config option for each tag or edge in the config file:
repartitionWithNebula: true/false (default: false)

Nicole00 · 2022-01-05T03:28:56Z

close #46

codecov-commenter · 2022-01-05T06:57:03Z

Codecov Report

Merging #49 (ebec49c) into master (d31546b) will increase coverage by 4.40%.
The diff coverage is 69.44%.

@@             Coverage Diff              @@
##             master      #49      +/-   ##
============================================
+ Coverage     50.19%   54.60%   +4.40%     
- Complexity       74       76       +2     
============================================
  Files            16       17       +1     
  Lines          1291     1315      +24     
  Branches        246      249       +3     
============================================
+ Hits            648      718      +70     
+ Misses          525      472      -53     
- Partials        118      125       +7

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...soft/exchange/common/utils/NebulaPartitioner.scala | `11.11% <11.11%> (ø)` | |
| ...om/vesoft/exchange/common/config/SinkConfigs.scala | `73.33% <66.66%> (-3.59%)` | ⬇️ |
| ...m/vesoft/exchange/common/processor/Processor.scala | `67.42% <71.42%> (+0.22%)` | ⬆️ |
| ...la/com/vesoft/exchange/common/config/Configs.scala | `66.15% <100.00%> (+0.45%)` | ⬆️ |
| .../vesoft/exchange/common/config/SchemaConfigs.scala | `71.87% <100.00%> (+0.90%)` | ⬆️ |
| ...vesoft/exchange/common/writer/FileBaseWriter.scala | `85.71% <100.00%> (+85.71%)` | ⬆️ |
| ...a/com/vesoft/exchange/common/utils/HDFSUtils.scala | `27.58% <0.00%> (+27.58%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

critical27

Well done

darionyaphet · 2022-01-05T08:48:12Z

exchange-common/src/main/scala/com/vesoft/exchange/common/processor/Processor.scala

+                        data: Dataset[(Array[Byte], Array[Byte])],
+                        partitionNum: Int): Dataset[(Array[Byte], Array[Byte])] = {
+    import spark.implicits._
+    data.rdd


Why not use repartition directly?

Nicole00 · 2022-01-18

DataFrame has no repartition function that takes a custom partitioner; that is an RDD function.
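
For readers following this exchange, the sketch below shows the general pattern: `Dataset.repartition` accepts only a partition count and/or column expressions, while `partitionBy` on a pair RDD accepts an arbitrary `org.apache.spark.Partitioner`. This is an illustration, not the code merged here. `FirstBytePartitioner` and `CustomRepartition.repartitionByKey` are hypothetical names, and the rule "the first byte of the key selects the partition" is an assumption chosen only to keep the example short.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical partitioner for illustration; it is NOT the PR's NebulaPartitioner.
// ASSUMPTION: keys are non-empty and their first byte selects the target partition.
class FirstBytePartitioner(parts: Int) extends Partitioner {
  require(parts > 0, s"number of partitions ($parts) must be positive")

  override def numPartitions: Int = parts

  override def getPartition(key: Any): Int = {
    val bytes = key.asInstanceOf[Array[Byte]]
    (bytes(0) & 0xff) % parts
  }
}

object CustomRepartition {
  // Repartition key/value pairs with a custom partitioner by going through the RDD API.
  def repartitionByKey(spark: SparkSession,
                       data: Dataset[(Array[Byte], Array[Byte])],
                       partitionNum: Int): Dataset[(Array[Byte], Array[Byte])] = {
    import spark.implicits._
    data.rdd                                               // drop down to the pair RDD
      .partitionBy(new FirstBytePartitioner(partitionNum)) // available on pair RDDs only
      .toDS()                                              // convert back to a Dataset
  }
}
```

Note that Spark's built-in `HashPartitioner` is not an option for this shape of data: `partitionBy` rejects it when the keys are arrays, which is one more reason a custom `Partitioner` is needed for `Array[Byte]` keys.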

Nicole00 added 2 commits January 4, 2022 19:40

support custom partitioner for nebula when generate sst files

512e003

support custom partitioner for nebula when generate sst files

7809fcf

Nicole00 added 2 commits January 5, 2022 15:01

exclude jackson-core

4012976

add test

ebec49c

Nicole00 force-pushed the partitioner branch from b48a117 to ebec49c Compare January 5, 2022 07:22

critical27 approved these changes Jan 5, 2022


critical27 merged commit 78fb290 into vesoft-inc:master Jan 5, 2022

Nicole00 added the "doc affected" label (PR: improvements or additions to documentation) Jan 5, 2022

darionyaphet reviewed Jan 5, 2022


jamieliu1023 mentioned this pull request Jan 8, 2022

Weekly Report 2022-01-07 vesoft-inc/nebula-community#84

Closed

wey-gu mentioned this pull request Mar 8, 2022

Whether to forcibly enable repartitioning when the number of nebula space partitions is greater than 1 #71

Closed

wey-gu mentioned this pull request May 7, 2022

highlight known issue on exchange sst vesoft-inc/nebula-docs-cn#1776

Closed
