feat: Add end of pipe clean up hook to Sinks #750
Thanks, @z3z1ma! One thing I realized afterwards is that the relationship of streams and sinks is not always obvious - for instance a new sink may be created if a schema declaration is modified after another sink for that stream is already in flight. So, the Do you think that is okay in this context? I wonder if a uuid per sink, or a dedicated temp dir would help reduce chance of sinks accidentally stepping on each other during clean_up. (Not saying we need to add those to this PR, but just wanted to raise this for discussion.) What do you think?
I think a sink can store arbitrary sink-specific data structures and objects in a subclass of Sink, since we all inherit from it when using the SDK. So the user-implemented clean up can inherently be sink-specific. Still, it's a good callout. A sink uuid prop on init might be universally OK, but it's not make-or-break since it's easy to do in user land too.
So if each sink writes to a specific bucket, file, or whatever based on the input stream name and we want to clean that, they won't ever step on each other's toes. That sink itself would have explicitly implemented a method to resolve the bucket path, and that same path would be used in the clean up tied to that specific sink. It would be developer best practice to do a clean up that is NOT destructive to concurrent sinks (i.e. not clearing a generic staging area instead of a stream-specific one). This would imply a uuid or some uniqueness on sink instantiation. 🤔 While it feels more like developer best practice, or I guess common sense, I am game for reducing their cognitive load. A sink-specific uuid would remind them: oh, I should make sure my sink is safe for concurrency. That's my 2 cents.
Agree 👍
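The uuid and dedicated temp dir idea discussed above could look roughly like the sketch below. This is not the SDK's `Sink` class: `StagingSink`, `sink_id`, `staging_dir`, and `write` are hypothetical names used only for illustration, and the sketch assumes each sink stages files on local disk.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


class StagingSink:
    """Toy sink (not the SDK class) that stages data in its own directory."""

    def __init__(self, stream_name: str, staging_root: Path) -> None:
        self.stream_name = stream_name
        # Hypothetical per-instance identifier: two sinks created for the
        # same stream (e.g. after a schema change) get different directories.
        self.sink_id = uuid.uuid4().hex
        self.staging_dir = staging_root / f"{stream_name}-{self.sink_id}"
        self.staging_dir.mkdir(parents=True)

    def write(self, filename: str, payload: str) -> None:
        """Stage a payload inside this sink's private directory."""
        (self.staging_dir / filename).write_text(payload)

    def clean_up(self) -> None:
        """Remove only what this sink instance created."""
        shutil.rmtree(self.staging_dir, ignore_errors=True)


staging_root = Path(tempfile.mkdtemp())
old_sink = StagingSink("users", staging_root)
new_sink = StagingSink("users", staging_root)  # same stream, new sink

old_sink.write("batch.jsonl", '{"id": 1}\n')
new_sink.write("batch.jsonl", '{"id": 2}\n')

# Cleaning up the old sink leaves the in-flight sink's files alone.
old_sink.clean_up()
```

Because the clean up path is derived from the instance rather than from the stream name alone, a sink cannot clear files that belong to a concurrent sink for the same stream.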
@z3z1ma I left a comment re not making clean_up an abstract method.
PS: can you address the flake8 errors? https://results.pre-commit.ci/run/github/379008980/1656003341.A3LTOHxoRfK1mp3bSNBLSw
Codecov Report
@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.32%   85.35%   +0.03%
==========================================
  Files          34       34
  Lines        3400     3408       +8
==========================================
+ Hits         2901     2909       +8
  Misses        499      499
Double-agreed 👍 Thanks.
* add clean up hook to sinks called at end of pipe drain
* make message more clear for devs and add decorator
* unwrap the comprehension so mypy doesnt get mad
* remove abstract method
* fix flake8 errors

Co-authored-by: Edgar R. M <[email protected]>
This adds a hook to the drain-all method which is called at the end-of-pipe drain. I can go on to list the ways I think this is useful: ensuring futures are completed, tearing down resources, clearing buckets or stages used in a load, and cleanup in general.
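A minimal, self-contained sketch of the pattern described here is shown below. It is a toy model rather than the SDK's actual code: the `Target`, `Sink`, and `BucketSink` classes, the `drain` method, and the `is_endofpipe` parameter name are assumptions made for illustration. It also reflects the review feedback that `clean_up` should be a default no-op rather than an abstract method, so existing sinks keep working without changes.

```python
from typing import Iterable, List


class Sink:
    """Toy base sink; clean_up is a no-op by default, not abstract."""

    def __init__(self, stream_name: str) -> None:
        self.stream_name = stream_name
        self.drained_batches = 0

    def drain(self) -> None:
        """Stand-in for flushing buffered records to the destination."""
        self.drained_batches += 1

    def clean_up(self) -> None:
        """Hook run once at end of pipe; subclasses override when needed."""


class BucketSink(Sink):
    """Hypothetical sink that stages files and clears them at end of pipe."""

    def __init__(self, stream_name: str) -> None:
        super().__init__(stream_name)
        self.staged_files: List[str] = [f"{stream_name}/part-0.jsonl"]

    def clean_up(self) -> None:
        # Tear down only the resources this sink created.
        self.staged_files.clear()


class Target:
    """Toy target that owns sinks and drains them."""

    def __init__(self, sinks: Iterable[Sink]) -> None:
        self.sinks = list(sinks)

    def drain_all(self, is_endofpipe: bool = False) -> None:
        for sink in self.sinks:
            sink.drain()
        if is_endofpipe:
            # Plain loop instead of a comprehension used for side effects.
            for sink in self.sinks:
                sink.clean_up()


plain = Sink("orders")
bucket = BucketSink("users")
target = Target([plain, bucket])

target.drain_all()                   # mid-stream drain: no clean up
files_after_regular_drain = list(bucket.staged_files)

target.drain_all(is_endofpipe=True)  # final drain: clean_up runs on every sink
```

In this sketch the staged files survive the regular drain and are cleared only by the final one, while the plain `Sink` is unaffected because it inherits the no-op hook.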