When calling table.create_checkpoint() in Python, there is currently no way to call cleanup_metadata(). As a result, log entries keep accumulating in _delta_log. In Spark, the cleanup runs as a step at the end of the checkpoint method, but the Python API does not expose it.
The Rust implementation has both cleanup_metadata and create_checkpoint_from_table_uri_and_cleanup, but the cleanup is not called at the end of create_checkpoint.
Another pull request adds checkpoint creation before and after the vacuum operation. However, because cleanup_metadata is not called during checkpoint creation, the metadata directory is never cleaned up.
Use Case
Cleaning up the Metadata Directory After Checkpoint Creation or Optimization
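To make the use case concrete, here is a rough sketch of what a log cleanup pass conceptually does. It is not the delta-rs implementation: the function name find_expired_commits and its parameters are made up for illustration, the selection rule is simplified (commits older than a retention window that also precede the latest checkpoint), and it only lists candidates without deleting anything.

```python
import re
import time
from pathlib import Path

# Delta commit files are named with a zero-padded 20-digit version,
# e.g. 00000000000000000012.json; checkpoints end in .checkpoint.parquet.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d+\.\d+)?\.parquet$")


def find_expired_commits(log_dir, retention_seconds, now=None):
    """Hypothetical helper: list commit files that are older than the
    retention window AND precede the latest checkpoint. Dry run only."""
    now = time.time() if now is None else now
    cutoff = now - retention_seconds
    entries = list(Path(log_dir).iterdir())

    checkpoint_versions = [
        int(m.group(1)) for p in entries if (m := CHECKPOINT_RE.match(p.name))
    ]
    if not checkpoint_versions:
        # Without a checkpoint, every commit is still needed to rebuild state.
        return []
    latest_checkpoint = max(checkpoint_versions)

    expired = []
    for p in entries:
        m = COMMIT_RE.match(p.name)
        if m and int(m.group(1)) < latest_checkpoint and p.stat().st_mtime < cutoff:
            expired.append(p)
    return sorted(expired)
```

Calling find_expired_commits("path/to/table/_delta_log", 30 * 24 * 3600) would return the commit files such a pass might remove. Anything inside the retention window stays, so recent versions remain reachable.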
I agree that the cleanup_metadata method should be made available in Python, as it can help reduce file listing time as the number of files in _delta_log grows. However, I would definitely not add it to the create_checkpoint function. You generally want to keep the metadata, so you have the ability to roll back to a previous version.
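A minimal sketch of that separation, assuming a table object that exposes both create_checkpoint() and cleanup_metadata(). The CheckpointableTable protocol and the checkpoint() wrapper are hypothetical names used only for illustration, not part of the deltalake API. Cleanup is opt-in, so the default keeps the log history needed for rollbacks.

```python
from typing import Protocol


class CheckpointableTable(Protocol):
    """Hypothetical interface: any table object with these two methods."""

    def create_checkpoint(self) -> None: ...

    def cleanup_metadata(self) -> None: ...


def checkpoint(table: CheckpointableTable, cleanup: bool = False) -> None:
    """Write a checkpoint; remove old log files only when explicitly asked."""
    table.create_checkpoint()
    if cleanup:
        # Opt-in: deleting old log entries limits how far back you can roll back.
        table.cleanup_metadata()
```

Callers who want Spark-like behaviour would pass cleanup=True; everyone else keeps their full log.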
Related Issue(s)
#1728