add a clear warning in the documentation that optimize is not concurrently safe in Cloudflare R2 #1348
Comments
Did you experience a specific bug, or can you describe the specific scenario that worries you? In principle, we are using the new commit function, which includes conflict resolution, so it should be safe to use concurrently. Optimize in particular is not too critical, since it does not change data; it may, however, fail for concurrent deletes. That case should be caught by the conflict resolution, though.
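A minimal sketch of the kind of check described above, assuming a commit can be reduced to the sets of file paths it adds and removes. `Commit` and `optimize_conflicts` are hypothetical names, not the delta-rs API, and the real conflict checker covers many more cases. The point it shows: optimize only rewrites files it already read, so a concurrent append is harmless, while a concurrent delete of one of those files must make the optimize commit fail.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical, simplified log entry: the data files it adds and removes."""
    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)


def optimize_conflicts(files_read: set[str], concurrent: list[Commit]) -> bool:
    """Return True if any commit that landed after optimize took its snapshot
    removed a file that optimize compacted. Appends only add new files, so
    they never trigger this check."""
    return any(files_read & c.removed for c in concurrent)


read = {"part-0.parquet", "part-1.parquet"}
append = Commit(added={"part-2.parquet"})
delete = Commit(removed={"part-1.parquet"})
print(optimize_conflicts(read, [append]))          # False: a pure append is safe
print(optimize_conflicts(read, [append, delete]))  # True: part-1 was deleted underneath
```

This check only helps if every writer reliably sees every commit that landed before its own, which is exactly what the object store has to guarantee.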
I ran this code with Cloudflare R2 and got duplicated data.
Well... safety is specifically disabled by "AWS_S3_ALLOW_UNSAFE_RENAME": "true" :). Without safe rename, we cannot even properly detect that a version was previously created.
Yes, but it will not work otherwise?
What do you mean? To support concurrent writes, we (or rather Delta itself, which places requirements on the file system) need a safe rename mechanism in the object store. If that is not available, all guarantees break down. If it is, things should work, at least in principle. The repo even contains formal proofs for certain aspects of the implementation.
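To illustrate why the guarantees break down: a Delta writer commits by creating the next log entry only if that version does not exist yet. The toy below contrasts an atomic create-if-absent with a check-then-write, which is roughly what a non-atomic rename amounts to. `ToyLogStore`, `unsafe_race`, and `safe_race` are hypothetical names; this is a deterministic simulation of one bad interleaving, not how delta-rs, S3, or R2 are implemented.

```python
from typing import Dict, Tuple


class ToyLogStore:
    """Hypothetical in-memory stand-in for _delta_log: maps version -> commit entry."""

    def __init__(self) -> None:
        self.log: Dict[int, str] = {}

    def create_if_absent(self, version: int, entry: str) -> bool:
        # Models an atomic put-if-absent / safe rename: in a real store the
        # existence check and the write are one server-side operation.
        if version in self.log:
            return False
        self.log[version] = entry
        return True

    def exists(self, version: int) -> bool:
        return version in self.log

    def overwrite(self, version: int, entry: str) -> None:
        # Models a non-atomic rename: it silently replaces whatever is there.
        self.log[version] = entry


def unsafe_race(store: ToyLogStore) -> Tuple[bool, bool]:
    # Both writers saw version 0 as the latest and target version 1. Each
    # checks before either writes: the interleaving a non-atomic rename allows.
    append_ok = not store.exists(1)
    optimize_ok = not store.exists(1)
    if append_ok:
        store.overwrite(1, "append")
    if optimize_ok:
        store.overwrite(1, "optimize")
    return append_ok, optimize_ok


def safe_race(store: ToyLogStore) -> Tuple[bool, bool]:
    # With create-if-absent, the second writer learns that version 1 is taken
    # and must re-read the log, check for conflicts, and retry at version 2.
    return store.create_if_absent(1, "append"), store.create_if_absent(1, "optimize")
```

On a fresh store, `unsafe_race` returns `(True, True)` and leaves only `{1: 'optimize'}`: the append commit disappears while both writers believe they succeeded. `safe_race` returns `(True, False)` and keeps `{1: 'append'}`, so the conflict resolution gets a chance to run. The thread does not establish exactly how the reported duplicates arose; the sketch only shows the detection failure described above.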
Sorry for not being clear. I think it is the same issue here: if I don't add the option "AWS_S3_ALLOW_UNSAFE_RENAME": "true", neither optimize nor writing to Delta works at all.
Yes, this is what should happen:
I do hear you, though, that with optimize implemented, we need clear warnings in the documentation about what
@wjones127 By the way, thank you very much for your work; since 0.9, Delta Lake Python has become very useful for me.
While we should improve the docs, atomic rename may soon be available for R2 storage: apache/arrow-rs#4194. We will have to do some extra work on the delta-rs end so that R2 no longer requires the lock client, but at least the basics are there. Should we update this issue or create a new one to track these changes?
Environment
0.9
Binding: Python
Environment:
Bug
Optimize is not a safe operation when run concurrently with another Delta writer, so it would be nice to make this clear in the documentation. It is not a big problem, since a user can always suspend writer 1 and do maintenance in that window, but it should be documented.
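The workaround above (pause the writer, then run maintenance) can be enforced inside a single process with a shared lock. A minimal sketch: `MaintenanceWindow`, `demo`, and the `step` jobs are hypothetical placeholders for the real append and optimize calls. A lock like this does nothing for writers in other processes or on other machines; those still depend on the store's guarantees or an external lock.

```python
import threading
import time
from typing import Callable, List


class MaintenanceWindow:
    """Lets a writer and a maintenance job (e.g. optimize) take turns instead
    of racing. Only coordinates code running in this one process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def run(self, job: Callable[[], None]) -> None:
        # Whoever holds the lock has the table to itself until the job ends.
        with self._lock:
            job()


def demo() -> List[str]:
    window = MaintenanceWindow()
    events: List[str] = []

    def step(name: str) -> Callable[[], None]:
        def job() -> None:
            events.append(f"{name}-start")
            time.sleep(0.01)  # stands in for an append or an optimize call
            events.append(f"{name}-end")
        return job

    def writer() -> None:
        for i in range(3):
            window.run(step(f"write{i}"))

    t = threading.Thread(target=writer)
    t.start()
    window.run(step("optimize"))  # maintenance waits for the current write
    t.join()
    return events
```

Whatever order the threads run in, every `-start` event in `demo()` is immediately followed by its own `-end`, so optimize never overlaps a write.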