Support all insert/update/delete operations for dynamic table sink #81

linhr · 2022-12-05T10:52:05Z

What type of PR is this?

bug
feature
enhancement

What problem(s) does this PR solve?

Issue(s) number: #77

Description:

This code change enables the dynamic table sink to process the upstream changelog and perform vertex/edge insert/update/delete operations for each individual row.

How do you solve it?

The solution buffers the rows (and deduplicates them by primary keys) and delegates the execution to three executors based on the row kind (for insert, update, and delete, respectively) when committing the batch.

The primary key for vertices is the vertex ID, and the primary key for edges is the combination of source vertex ID, destination vertex ID, and the rank.
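The buffering step described above can be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: `BufferReducedSketch`, `Kind`, `EdgeKey`, `Change`, and `Buffer` are hypothetical names, rows are reduced to plain strings, and "latest change wins" is an assumption about how duplicates are resolved.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch; none of these types are the connector's real classes. */
class BufferReducedSketch {

    /** Simplified change kinds, one per executor. */
    enum Kind { INSERT, UPDATE, DELETE }

    /** Edge primary key: source vertex ID, destination vertex ID, and rank. */
    static final class EdgeKey {
        final String src;
        final String dst;
        final long rank;

        EdgeKey(String src, String dst, long rank) {
            this.src = src;
            this.dst = dst;
            this.rank = rank;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EdgeKey)) return false;
            EdgeKey other = (EdgeKey) o;
            return rank == other.rank && src.equals(other.src) && dst.equals(other.dst);
        }

        @Override
        public int hashCode() {
            return Objects.hash(src, dst, rank);
        }
    }

    /** A buffered change: its kind plus an opaque row payload. */
    static final class Change {
        final Kind kind;
        final String row;

        Change(Kind kind, String row) {
            this.kind = kind;
            this.row = row;
        }
    }

    /** Buffers changes and keeps only the latest change per primary key (assumption). */
    static final class Buffer<K> {
        private final Map<K, Change> latest = new LinkedHashMap<>();

        void add(K key, Kind kind, String row) {
            // Remove first so a re-added key moves to the end of the iteration order.
            latest.remove(key);
            latest.put(key, new Change(kind, row));
        }

        /** Groups the surviving rows by kind (one batch per executor) and clears the buffer. */
        Map<Kind, List<String>> commit() {
            Map<Kind, List<String>> batches = new EnumMap<>(Kind.class);
            for (Kind kind : Kind.values()) {
                batches.put(kind, new ArrayList<>());
            }
            for (Change change : latest.values()) {
                batches.get(change.kind).add(change.row);
            }
            latest.clear();
            return batches;
        }
    }

    public static void main(String[] args) {
        Buffer<EdgeKey> buffer = new Buffer<>();
        buffer.add(new EdgeKey("v1", "v2", 0), Kind.INSERT, "v1->v2 insert");
        buffer.add(new EdgeKey("v1", "v2", 0), Kind.DELETE, "v1->v2 delete");
        System.out.println(buffer.commit());
    }
}
```

For vertices, the same `Buffer` could be keyed by the vertex ID alone. Keying on the primary key means an insert followed by a delete of the same edge within one batch results in a single delete.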

New and existing test cases show that the solution works.

Special notes for your reviewer, ex. impact of this fix, design document, etc:

To improve code readability, there are some minor (but backward-incompatible) changes to the public interface. The changes can be seen from the diff in README.md and the Java code in the example directory. Specifically, the following has changed:

The .builder() method of the various execution option builder classes has been corrected to .build(); the old .builder() remains for compatibility.
The generic typing of some classes, along with some method signatures, has changed (e.g. NebulaBatchOutputFormat and NebulaSinkFunction).
The "batch" option has been renamed to "batch size" in the DataStream API; the old "batch" option remains for compatibility.
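Keeping the old names for compatibility could look something like the sketch below. The class `ExecutionOptionsSketch` and its default batch size are hypothetical; only the method names `.setBatch()`, `.setBatchSize()`, `.builder()`, and `.build()` come from this PR, and the real builder classes carry many more options.

```java
/** Hypothetical sketch of an options class whose builder keeps deprecated aliases. */
class ExecutionOptionsSketch {
    private final int batchSize;

    private ExecutionOptionsSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    int getBatchSize() {
        return batchSize;
    }

    static final class Builder {
        private int batchSize = 1; // hypothetical default, for illustration only

        /** New, preferred name. */
        Builder setBatchSize(int batchSize) {
            this.batchSize = batchSize;
            return this;
        }

        /** @deprecated use {@link #setBatchSize(int)} instead. */
        @Deprecated
        Builder setBatch(int batch) {
            return setBatchSize(batch);
        }

        /** New, preferred name. */
        ExecutionOptionsSketch build() {
            return new ExecutionOptionsSketch(batchSize);
        }

        /** @deprecated use {@link #build()} instead. */
        @Deprecated
        ExecutionOptionsSketch builder() {
            return build();
        }
    }

    public static void main(String[] args) {
        // Old call chain still compiles (with a deprecation warning) and behaves the same.
        ExecutionOptionsSketch oldStyle = new Builder().setBatch(2).builder();
        ExecutionOptionsSketch newStyle = new Builder().setBatchSize(2).build();
        System.out.println(oldStyle.getBatchSize() == newStyle.getBatchSize());
    }
}
```

Because each deprecated method only delegates to its replacement, both spellings stay behaviorally identical while compilers and IDEs nudge callers toward the new names.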

codecov-commenter · 2022-12-10T04:05:44Z

Codecov Report

Base: 61.59% // Head: 65.18% // Increases project coverage by +3.59% 🎉

Coverage data is based on head (e3541c2) compared to base (29a0db2).
Patch coverage: 90.50% of modified lines in pull request are covered.

Additional details and impacted files

@@             Coverage Diff              @@
##             master      #81      +/-   ##
============================================
+ Coverage     61.59%   65.18%   +3.59%     
- Complexity      291      308      +17     
============================================
  Files            52       53       +1     
  Lines          1786     1873      +87     
  Branches        166      167       +1     
============================================
+ Hits           1100     1221     +121     
+ Misses          596      566      -30     
+ Partials         90       86       -4

Impacted Files	Coverage Δ
...ebula/connection/NebulaMetaConnectionProvider.java	`59.25% <0.00%> (+3.70%)`	⬆️
...ector/nebula/sink/NebulaEdgeBatchOutputFormat.java	`0.00% <0.00%> (ø)`
...e.flink/connector/nebula/utils/NebulaConstant.java	`95.00% <ø> (ø)`
...e.flink/connector/nebula/utils/PartitionUtils.java	`85.71% <ø> (ø)`
...connector/nebula/table/NebulaDynamicTableSink.java	`84.84% <71.42%> (-5.16%)`	⬇️
...connector/nebula/sink/NebulaBatchOutputFormat.java	`47.14% <83.33%> (-2.27%)`	⬇️
...link/connector/nebula/sink/NebulaSinkFunction.java	`66.66% <83.33%> (ø)`
.../nebula/sink/NebulaTableBufferReducedExecutor.java	`90.00% <90.00%> (ø)`
...ector/nebula/statement/VertexExecutionOptions.java	`92.53% <95.00%> (+25.87%)`	⬆️
...nnector/nebula/statement/EdgeExecutionOptions.java	`93.75% <95.45%> (+26.00%)`	⬆️
... and 14 more


Nicole00 · 2022-12-26T10:01:45Z

README.md

                .setGraphSpace("flinkSink")
                .setTag("player")
                .setIdIndex(0)
                .setFields(Arrays.asList("name", "age"))
                .setPositions(Arrays.asList(1, 2))
-                .setBatch(2)
-                .builder();
+                .setBatchSize(2)


We suggest not changing the user interfaces in the current version; we may update them in the next major version.
vesoft-inc/nebula-java#486 (comment)

Thanks for the suggestion! Yeah I think it's a good idea to ensure compatibility of user-facing interfaces within the same major version.

I've added back the methods such as .setBatch() and .builder() and marked them as @Deprecated, while in README.md I use the new methods. Hopefully this can encourage users to move to the new methods, without breaking existing code. Does this approach look fine to you? @Nicole00

Hopefully this can encourage

Great approach, thanks for changing it.

Nicole00 · 2022-12-26T10:02:23Z

...src/main/java/org.apache.flink/connector/nebula/connection/NebulaMetaConnectionProvider.java

@@ -81,7 +81,7 @@ public VidTypeEnum getVidType(MetaClient metaClient, String space) {
            spaceItem = metaClient.getSpace(space);
        } catch (TException | ExecuteFailedException e) {
            LOG.error("get space info error, ", e);
-            return null;
+            throw new RuntimeException(e);


good change, thanks~

Nicole00 · 2022-12-26T10:06:47Z

...r/src/main/java/org.apache.flink/connector/nebula/sink/NebulaEdgeBatchTableOutputFormat.java

+        EdgeExecutionOptions deleteOptions = executionOptions.toBuilder()
+                .setWriteMode(WriteModeEnum.DELETE)
+                .build();
+        return new NebulaTableBufferReducedExecutor(dataStructureConverter,


The insert mode is not upsert, and there is no update mode?

Maybe we should rename the upsert variable to insert, to avoid confusion with Nebula's UPSERT.

Yes, we use Nebula INSERT statements for both insert and update events in the data stream. What I found is that NebulaGraph overwrites existing vertices/edges during insert, so it works for update events as well. (UPSERT can be expensive in NebulaGraph, so I tried to avoid it.)

maybe we should change the upsert variable to insert

Sounds good! I've renamed the variables.
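The mapping discussed in this thread might be sketched as below. The `RowKind` enum here mirrors the four values of Flink's `RowKind`; `WriteMode` and `toWriteMode` are hypothetical names, and skipping `UPDATE_BEFORE` rows is an assumption for illustration, since the thread does not spell out how they are handled.

```java
import java.util.Optional;

/** Hypothetical sketch of mapping changelog row kinds to Nebula write modes. */
class RowKindMappingSketch {

    /** Mirrors the four values of Flink's RowKind enum. */
    enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** Only the write modes mentioned in the discussion; UPSERT is deliberately not used. */
    enum WriteMode { INSERT, DELETE }

    /** Returns the write mode for a row kind, or empty if the row is skipped. */
    static Optional<WriteMode> toWriteMode(RowKind kind) {
        switch (kind) {
            case INSERT:
            case UPDATE_AFTER:
                // INSERT overwrites an existing vertex/edge, so it also covers updates.
                return Optional.of(WriteMode.INSERT);
            case DELETE:
                return Optional.of(WriteMode.DELETE);
            case UPDATE_BEFORE:
            default:
                // Assumption: the old image of an updated row needs no statement,
                // because the following UPDATE_AFTER row overwrites it.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (RowKind kind : RowKind.values()) {
            System.out.println(kind + " -> " + toWriteMode(kind));
        }
    }
}
```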

Nicole00 · 2022-12-27T02:44:53Z

It's an excellent PR, thanks very much for your contribution @linhr

Nicole00 · 2022-12-27T03:15:08Z

connector/src/main/java/org.apache.flink/connector/nebula/table/NebulaDynamicTableFactory.java

+    public static final ConfigOption<Integer> ID_INDEX = ConfigOptions
+            .key("id-index")
+            .intType()
+            .defaultValue(0)


It's better and more consistent to define the default value in NebulaConstant.

Good suggestion. Fixed. Thanks!

Nicole00 · 2022-12-27T06:44:31Z

connector/pom.xml

+            <version>${flink.version}</version>
+            <type>test-jar</type>
+            <scope>test</scope>
+        </dependency>


Repeated dependency?

I added this test-jar as a test dependency so that we can use the 'values' table connector when writing Flink integration tests. This connector allows us to build a table from test data in the code. I've seen this used in the official JDBC connector as well.

I see, I've learned a new approach for testing.

linhr · 2023-01-09T09:01:42Z

@Nicole00 Thanks for your review! I've made some changes to the code according to your feedback. Let me know if they look good.

linhr · 2023-02-10T09:57:16Z

@Nicole00 Do you think this PR can make it into the 3.4.0 release? Let me know what you think about my changes after your initial review. Thanks a lot!

linhr added 3 commits December 5, 2022 18:05

Support configuring batching options

2c7b4fa

Support insert/update/delete for dynamic table sink

8cb77c3

Add tests

cf9db14

linhr mentioned this pull request Dec 5, 2022

Full insert/update/delete support for dynamic table sinks #77

Closed

linhr added 2 commits December 5, 2022 19:17

Fix time zone issue

faeb8eb

Fix 'UPDATE_BEFORE' issue

e3541c2

Nicole00 reviewed Dec 27, 2022

Address PR comments

351af36

linhr requested a review from Nicole00 January 9, 2023 09:08

Nicole00 approved these changes Feb 15, 2023

Nicole00 merged commit 91e8132 into vesoft-inc:master Feb 15, 2023

wey-gu mentioned this pull request Feb 18, 2023

Weekly Report 2023-02-17 vesoft-inc/nebula-community#326

Closed

linhr deleted the dynamic-table-sink-full-support branch February 23, 2023 00:11
