Support FSCK REPAIR TABLE Operation #1092

Closed
Blajda opened this issue Jan 23, 2023 · 0 comments · Fixed by #1103
Labels
enhancement New feature or request

Comments

@Blajda
Collaborator

Blajda commented Jan 23, 2023

Description

Databricks provides an operation called FSCK REPAIR TABLE that removes from the transaction log the entries for active files that can no longer be found in the underlying file system.

Use Case
Due to a hardware issue, some Parquet files were corrupted when written. This data is non-critical, and I would simply like to delete it from the underlying storage and then use this operation to reconcile the log.

This operation also supports a dry run, which can be used to check whether files are missing due to an external issue.

Related Issue(s)

@Blajda Blajda added the enhancement New feature or request label Jan 23, 2023
@Blajda Blajda changed the title from "Support FSCK REAPIR TABLE Operation" to "Support FSCK REPAIR TABLE Operation" Jan 23, 2023
wjones127 added a commit that referenced this issue Feb 3, 2023
# Description
Implementation of the filesystem check operation.

The implementation is fairly straightforward: a HEAD call is made for
each active file to check whether it exists.
A remove action is then made for each orphaned file, that is, each
active file that is missing from storage.
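
As a rough illustration of the per-file check described above, here is a minimal Python sketch. It is not the code from the commit: `find_orphaned`, `filesystem_check`, and the `exists` callback (standing in for a HEAD request against the object store) are hypothetical names, and the remove action is reduced to a bare path.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def find_orphaned(active_files: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Probe storage once per active file and keep the paths that are missing."""
    return [path for path in active_files if not exists(path)]


def filesystem_check(
    active_files: Iterable[str],
    exists: Callable[[str], bool],
    dry_run: bool = False,
) -> Tuple[List[str], List[Dict]]:
    """Return the orphaned paths and the remove actions that would be committed.

    With dry_run=True the orphaned paths are reported but no actions are built.
    The action shape here is simplified for illustration only.
    """
    orphaned = find_orphaned(active_files, exists)
    actions = [] if dry_run else [{"remove": {"path": path}} for path in orphaned]
    return orphaned, actions


# Hypothetical usage: a set stands in for the object store.
store = {"part-0.parquet", "part-2.parquet"}
active = ["part-0.parquet", "part-1.parquet", "part-2.parquet"]
orphaned, actions = filesystem_check(active, store.__contains__, dry_run=True)
```

The cost of this approach is one existence probe per active file, regardless of how many other files sit in the table directory.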

An alternative solution is to build a hash set of all active files and
then recursively list all files in storage. Each listed file is removed
from the set. All files remaining in the set are then considered
orphaned.
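
The listing-based alternative can be sketched the same way. Again this is only an illustration under assumptions: `find_orphaned_by_listing` and the `list_all_files` callback (standing in for a recursive listing of the table root) are hypothetical names, not part of the commit.

```python
from typing import Callable, Iterable, Set


def find_orphaned_by_listing(
    active_files: Iterable[str],
    list_all_files: Callable[[], Iterable[str]],
) -> Set[str]:
    """Start from every active file and discard each path the listing returns.

    Whatever is left in the set was never seen in storage, so it is orphaned.
    Listed paths that are not active files are ignored by discard().
    """
    remaining = set(active_files)
    for path in list_all_files():
        remaining.discard(path)
    return remaining


# Hypothetical usage: the listing includes a stray file that the log does not track.
active = ["part-0.parquet", "part-1.parquet", "part-2.parquet"]
listing = lambda: iter(["part-0.parquet", "part-2.parquet", "stray.tmp"])
orphaned = find_orphaned_by_listing(active, listing)
```

This trades many per-file probes for a single recursive listing, but the listing walks every file under the table root, including files the log no longer references.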
 
Looking for feedback; if the second approach is preferred, I can make
the changes.

# Related Issue(s)
- closes #1092

---------

Co-authored-by: Will Jones <[email protected]>
chitralverma pushed a commit to chitralverma/delta-rs that referenced this issue Mar 17, 2023