snowpipe: exactly once semantics #3060

Open · wants to merge 23 commits into main from snow-once
Conversation

@rockwotj (Collaborator) commented Dec 5, 2024

Support two new properties in snowflake_streaming:

  1. offset_token: A new property to support exactly-once delivery: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview#offset-tokens
  2. channel_name: The ability to explicitly assign a batch to a channel. The current channel_prefix option doesn't support picking a specific channel; channel_name enables exactly-once delivery from Kafka.
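As a rough illustration of how the two new properties might be wired together, here is a hedged config sketch: the connection fields are abbreviated, and the metadata interpolation names (`@kafka_partition`, `@kafka_offset`) and exact field spellings are assumptions based on the PR description, not copied from the PR itself.

```yaml
output:
  snowflake_streaming:
    # ... connection fields (account, user, database, schema, etc.) elided ...
    table: my_table
    # New in this PR (hypothetical usage): pin each Kafka partition to its
    # own channel so ordering and dedup are per-partition.
    channel_name: "partition-${! @kafka_partition }"
    # New in this PR (hypothetical usage): a monotonic token Snowflake uses
    # to discard replayed batches after a restart.
    offset_token: "${! @kafka_offset }"
    max_in_flight: 1
```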

@rockwotj rockwotj force-pushed the snow-once branch 3 times, most recently from e5423ce to 10a350f Compare December 9, 2024 21:38
@rockwotj rockwotj marked this pull request as ready for review December 10, 2024 02:53
Commit notes from the branch:

  - This is what is required for exactly once; we don't yet use it. Note that since we can't currently ensure certain messages go to a specific channel, this only really works with max_in_flight=1, which is probably fine for Postgres. A later commit supports channel_name properly, so one can explicitly specify the mapping from data to channel.
  - This will help reuse all this logic when we create the new output that specifies channel names explicitly.
  - Moved to a separate function so it can be used between different outputs.
  - To clarify it instead of spreading it out all over; this also means the schema migration function can now be a free function.
  - One that is responsible for coordinating schema evolution and other small pieces (like custom mappings).
  - The purpose of this is to allow another kind of inner output that lets a user set the channel name explicitly (instead of using a pool). I'm not sure if this is 100% correct, but it will work for most cases. See the examples for what this enables with a Redpanda/Kafka input (but not kafka_franz!).
  - This seems a bit clearer and has nice duality with the indexed pool.
  - By holding a lock when doing this during WriteBatch, and not having the framework call Connect outside of pipeline creation, we just handle it internally.
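The offset-token mechanism the PR builds on can be sketched roughly as follows. This is a minimal, hypothetical model of the dedup rule (writer skips any batch whose token is at or below the channel's last committed token), not the PR's actual Go API or Snowflake's client library.

```go
package main

import "fmt"

// channel models the per-channel state Snowflake tracks: the last
// offset token that was durably committed. (Illustrative only.)
type channel struct {
	committed int64
}

// writeBatch commits a batch only if its offset token advances past the
// last committed one; replays after a crash or retry are dropped, which
// is what gives exactly-once delivery.
func (c *channel) writeBatch(offsetToken int64) bool {
	if offsetToken <= c.committed {
		return false // duplicate replay, skip
	}
	c.committed = offsetToken
	return true
}

func main() {
	ch := &channel{committed: 41}
	fmt.Println(ch.writeBatch(41)) // already committed, skipped
	fmt.Println(ch.writeBatch(42)) // new batch, committed
	fmt.Println(ch.writeBatch(42)) // replay after restart, skipped
}
```

On reopen, a real client would first query the channel's committed token and resume the source from just past it; this is why pinning a partition to a fixed channel_name matters, since the token comparison is only meaningful within one channel.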