-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CELEBORN-1567] Support throw FetchFailedException when Data corruption detected #2691
Conversation
client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Btw, support for checksum/validation of data would a good feature to add IMO ... there were corner cases where this helped catch issues in spark (instead of relying on compression/deserialization failing ... which need not always happen).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I have a question: should we retry fetching another replication before throwing a FetchFailedException when the conf |
This is not necessarily safe, because the Task may have read part of the data, so it is safer to retry the Task. This is how Spark handles it. |
It looks like we've already done this. celeborn/client/src/main/java/org/apache/celeborn/client/compress/ZstdDecompressor.java Lines 69 to 72 in 0ee3c3a
celeborn/client/src/main/java/org/apache/celeborn/client/compress/Lz4Decompressor.java Lines 82 to 85 in 0ee3c3a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thank you, merging to main(v0.6.0)/branch-0.5(v0.5.2)/branch-0.4(v0.4.3). |
…on detected ### What changes were proposed in this pull request? ### Why are the changes needed? #2655 (review) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? GA Closes #2691 from cxzl25/CELEBORN-1567. Authored-by: sychen <[email protected]> Signed-off-by: Shaoyun Chen <[email protected]> (cherry picked from commit b8f275d) Signed-off-by: Shaoyun Chen <[email protected]>
…on detected ### What changes were proposed in this pull request? ### Why are the changes needed? #2655 (review) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? GA Closes #2691 from cxzl25/CELEBORN-1567. Authored-by: sychen <[email protected]> Signed-off-by: Shaoyun Chen <[email protected]> (cherry picked from commit b8f275d)
…on detected ### What changes were proposed in this pull request? ### Why are the changes needed? apache#2655 (review) ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? GA Closes apache#2691 from cxzl25/CELEBORN-1567. Authored-by: sychen <[email protected]> Signed-off-by: Shaoyun Chen <[email protected]>
What changes were proposed in this pull request?
Why are the changes needed?
#2655 (review)
Does this PR introduce any user-facing change?
No
How was this patch tested?
GA