Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sink): add date, time, interval, bytea and jsonb data types for PG/MySQL sink of stream_chunk/json payload #9957

Merged
merged 31 commits into from
May 30, 2023

Conversation

StrikeW
Copy link
Contributor

@StrikeW StrikeW commented May 23, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • Add date, time, interval, bytea and jsonb data types for postgres/mysql sink in stream_chunk/json payload
  • Since interval and jsonb are not standard jdbc types, we use PGInterval and PGObject from postgres driver to fulfill the prepare statement
  • An e2e sink test was added for both mysql and postgres, for types date, time, timestamp, jsonb and bytea

I tested it with the jdbc sink in the local environment.

part of #9918, #9919

Checklist For Contributors

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
  • All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Release note

@StrikeW StrikeW force-pushed the siyuan/jdbc-sink-data-types branch from 77dfa61 to aa2977b Compare May 23, 2023 07:49
@codecov
Copy link

codecov bot commented May 23, 2023

Codecov Report

Merging #9957 (2035ae8) into main (2b04ed5) will decrease coverage by 0.06%.
The diff coverage is 21.00%.

@@            Coverage Diff             @@
##             main    #9957      +/-   ##
==========================================
- Coverage   71.03%   70.98%   -0.06%     
==========================================
  Files        1231     1231              
  Lines      210684   210883     +199     
==========================================
+ Hits       149655   149690      +35     
- Misses      61029    61193     +164     
Flag Coverage Δ
rust 70.98% <21.00%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/java_binding/src/lib.rs 0.00% <0.00%> (ø)
src/connector/src/sink/kafka.rs 38.42% <14.28%> (-0.94%) ⬇️
src/connector/src/sink/remote.rs 57.07% <83.33%> (+0.43%) ⬆️
src/connector/src/sink/mod.rs 73.28% <85.71%> (+1.96%) ⬆️

... and 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@StrikeW StrikeW changed the title feat(sink): add date, time data types for streamchunk payload feat(sink): add date, time, interval and jsonb data types for streamchunk payload May 23, 2023
@StrikeW StrikeW changed the title feat(sink): add date, time, interval and jsonb data types for streamchunk payload feat(sink): add date, time, interval and jsonb data types for PG sink of streamchunk payload May 23, 2023
@StrikeW StrikeW changed the title feat(sink): add date, time, interval and jsonb data types for PG sink of streamchunk payload feat(sink): add date, time, interval and jsonb data types for PG sink of stream_chunk payload May 23, 2023
@neverchanje
Copy link
Contributor

Good work @StrikeW ! But I want to know if pginterval and pgobject is only used for pgsink. What is the behavior of sinking intervals and jsonbs to a mysql?

@StrikeW
Copy link
Contributor Author

StrikeW commented May 24, 2023

Good work @StrikeW ! But I want to know if pginterval and pgobject is only used for pgsink. What is the behavior of sinking intervals and jsonbs to a mysql?

You are right. We should check whether the downstream is PG since those classes are specific to the PG driver.

Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

let string_value = env.new_string(value)?;
let (class_ref, constructor) =
pointer.as_ref().class_cache.date_ctor.get_or_try_init(|| {
let cls = env.find_class("java/sql/Date")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is possible to create a Data and Time object from a integer timestamp instead of from string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're worried about the performance issue. The string representation is non-ambiguous while the integer representation may encounter precision issues, for example ms vs us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for Date and Time, their minimal required precision is millisecond, and therefore there is no need to care about the nano second level precision. And I think it should be safe to use the constructor from a timestamp of long type. It takes more time to convert to a string first and then parse the string.

cc @WillyKidd

https://docs.oracle.com/javase/8/docs/api/java/sql/Time.html#Time-long-
https://docs.oracle.com/javase/8/docs/api/java/sql/Date.html#Date-long-

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can do this in the next pr :)

@StrikeW StrikeW marked this pull request as draft May 24, 2023 10:54
@WillyKidd WillyKidd force-pushed the siyuan/jdbc-sink-data-types branch from 0edb940 to 7247041 Compare May 26, 2023 03:56
@WillyKidd WillyKidd changed the title feat(sink): add date, time, interval and jsonb data types for PG sink of stream_chunk payload feat(sink): add date, time, interval, bytea and jsonb data types for PG sink of stream_chunk payload May 26, 2023
@WillyKidd WillyKidd marked this pull request as ready for review May 26, 2023 07:04
@WillyKidd WillyKidd marked this pull request as draft May 29, 2023 09:08
@WillyKidd WillyKidd marked this pull request as ready for review May 29, 2023 09:54
let value = datum_to_json_object(
&inner_field,
sub_datum_ref,
TimestampHandlingMode::String,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why you don't use the passing argument timestamp_handling_mode for list and struct? @WillyKidd

@StrikeW StrikeW added this pull request to the merge queue May 30, 2023
Merged via the queue into main with commit 5ea2204 May 30, 2023
@StrikeW StrikeW deleted the siyuan/jdbc-sink-data-types branch May 30, 2023 15:42
lmatz pushed a commit that referenced this pull request Jun 1, 2023
…PG/MySQL sink of stream_chunk/json payload (#9957)

Co-authored-by: weili <[email protected]>
Co-authored-by: WillyKidd <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants