Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add end of pipe clean up hook to Sinks #750

Merged
merged 8 commits into from
Jun 24, 2022

Conversation

z3z1ma
Copy link
Contributor

@z3z1ma z3z1ma commented Jun 22, 2022

This adds a hook to the drain all method which is flagged at endofpipe drain. I can go on to list the ways I think this is useful. From ensuring futures are completed, to tearing down resources, clearing buckets or stages used in a load, to cleanup in general.

@aaronsteers
Copy link
Contributor

aaronsteers commented Jun 22, 2022

Thanks, @z3z1ma !

One thing I realized afterwards is that the relationship of streams and sinks is not always obvious - for instance a new sink may be created if a schema declaration is modified after another sink for that stream is already in flight.

So, the clean_up() operation should just be sure to not modify or clean up the other sinks' resources, even for sinks that might share the same stream name.

Do you think that is okay in this context? I wonder if a uuid per sink, or a dedicated temp dir would help reduce chance of sinks accidentally stepping on each other during clean_up. (Not saying we need to add those to this PR, but just wanted to raise this for discussion.)

What do you think?

cc @edgarrmondragon

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jun 22, 2022

I think a sink can store arbitrary sink specific data structures and objects in a subclass of Sink since we all inherit it when using the SDK. So the user implemented clean up can inherently be sink specific. Still it's a good call out. A sink uuid prop on init might be universally ok. But not the make or break since it's easy in user land too.

@z3z1ma
Copy link
Contributor Author

z3z1ma commented Jun 22, 2022

So if each sink writes to a specific bucket, file, or whatevs based on input stream name and we want to clean that, they won't ever step on each others feet. That sink itself would've explicitly implemented a method to resolve the bucket path and that same path would be used in the clean up tied to that specific sink. It would be developer best-practice to do a clean up that is NOT caustic to concurrent sinks (ie clearing a generic staging area instead of stream specific). This would imply a uuid or some uniqueness on sink instantiation. 🤔 While it feels more like developer best practices or I guess common sense, I am game for reducing their cognitive load. A sink specific uuid would remind them oh I should make sure my sink is safe for concurrency.

That's my 2 cents.

@edgarrmondragon edgarrmondragon changed the title [feat] Add end of pipe clean up hook to Sinks feat: Add end of pipe clean up hook to Sinks Jun 23, 2022
@edgarrmondragon
Copy link
Collaborator

I think a sink can store arbitrary sink specific data structures and objects in a subclass of Sink since we all inherit it when using the SDK. So the user implemented clean up can inherently be sink specific. Still it's a good call out.

Agree 👍

Copy link
Collaborator

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@z3z1ma I left a comment re not making clean_up an abstract method.

PS: can you address the flake8 errors? https://results.pre-commit.ci/run/github/379008980/1656003341.A3LTOHxoRfK1mp3bSNBLSw

singer_sdk/sinks/core.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jun 23, 2022

Codecov Report

Merging #750 (442ccc7) into main (67455a1) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #750      +/-   ##
==========================================
+ Coverage   85.32%   85.35%   +0.03%     
==========================================
  Files          34       34              
  Lines        3400     3408       +8     
==========================================
+ Hits         2901     2909       +8     
  Misses        499      499              
Impacted Files Coverage Δ
singer_sdk/sinks/core.py 81.06% <100.00%> (+0.29%) ⬆️
singer_sdk/target_base.py 91.30% <100.00%> (+0.25%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 67455a1...442ccc7. Read the comment docs.

@aaronsteers
Copy link
Contributor

I think a sink can store arbitrary sink specific data structures and objects in a subclass of Sink since we all inherit it when using the SDK. So the user implemented clean up can inherently be sink specific. Still it's a good call out.

Agree 👍

Double-agreed 👍

Thanks.

@z3z1ma z3z1ma requested a review from edgarrmondragon June 24, 2022 15:01
Copy link
Collaborator

@edgarrmondragon edgarrmondragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All checks 🟢, docs look good. Merging!

Thank you so much for your work, @z3z1ma!

@edgarrmondragon edgarrmondragon merged commit 27df0eb into meltano:main Jun 24, 2022
edgarrmondragon added a commit that referenced this pull request Jun 30, 2022
* add clean up hook to sinks called at end of pipe drain

* make message more clear for devs and add decorator

* unwrap the comprehension so mypy doesnt get mad

* remove abstract method

* fix flake8 errors

Co-authored-by: Edgar R. M <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants