-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't commit if commit has transaction identifier that has been committed already #2580
Comments
If the commits are performed in series then the behavior you are experiencing is expected. See the below quote from the Delta protocol.
I don't think the current changes in your branch should be merged into delta-rs since it's application specific. An alternative is to generalize your changes in |
Thanks for your answer @Blajda!
I see! Given what the protocol document says, I understand that we can't discard versions with lower versions. What I'm worried about with a wrapper is that it's likely a race condition, since it's outside the write and thus outside the commit retry logic. But I guess that's hard to avoid!
That sounds like a good idea! I see you mention it here as well #2130. Is there any scaffolding for something like that already? Would it be similar to the post commit hooks, but as a pre commit hook? To clarify: PR #2539 which was merged, what that does is all the work of inserting and keeping track of the identifiers, but there is no reconciliation yet, correct? I was a bit confused by "fix: only keep latest txn per app" |
I found the delta-rs/crates/core/src/operations/transaction/conflict_checker.rs Lines 575 to 594 in 0a44a0d
But looks like it would only kick in if there's a commit conflict, i.e. not if they are subsequent, and commit 2 only already has read commit 1. delta-rs/crates/core/src/operations/transaction/mod.rs Lines 570 to 580 in 0a44a0d
delta-rs/crates/core/src/operations/transaction/application.rs Lines 85 to 86 in 0a44a0d
In summary, my understanding of how this behaves:
Those two scenarios should cover it, though! |
I'm not sure if this is a bug or a feature request. It might be that I'm using the library incorrectly or misunderstanding how this should work!
I want to do idempotent commits, using transaction identifiers. I have seen that there has been added some support for this recently, e.g. 88ea110, 54cfa06.
What I'm seeing is that if I make multiple commits with the same app_transaction (same ID and version), they will all be written. And same with any other combination. Is that the correct behavior?
This commit seems to me to indicate that delta-rs should be able to only keep the latest version per txn app ID, so maybe I'm doing something wrong? 54cfa06
For comparison, Databricks SQL will only commit the above scenario once.
In case I am onto something, here is a branch I've pushed with a test that fails on main, but which passes with my modification: It writes two commits where both have the same app ID + version. We verify that the second write did not cause a new active file. vegarsti@4e52f4f
There's some nuance here, though: it's allowed for a commit to have multiple app transactions. I have no idea how to handle that, please advise 😅
Also, if the commit fails because of a concurrent update, I would also need to do the same check after the snapshot update.
The text was updated successfully, but these errors were encountered: