-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#868] Register VACUUM operation in the _delta_log #1552
[#868] Register VACUUM operation in the _delta_log #1552
Conversation
- update VacuumCommand.scala to add pre-post vacuum logs - update DeltaOperations.scala to add the actual operations committed to the log - update DeltaVacuumSuite with tests
core/src/main/scala/org/apache/spark/sql/delta/DeltaOperations.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/delta/DeltaOperations.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/delta/sources/DeltaSQLConf.scala
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/delta/commands/VacuumCommand.scala
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/delta/commands/VacuumCommand.scala
Show resolved
Hide resolved
core/src/main/scala/org/apache/spark/sql/delta/commands/VacuumCommand.scala
Show resolved
Hide resolved
* @param specifiedRetentionMillis - specified retention interval | ||
* @param defaultRetentionMillis - default retention period for the table | ||
*/ | ||
case class VacuumStart( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we combine VacuumStart
and Vacuum
to one? I think this is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean VacuumStart
and VacuumEnd
? It's better to have two events so that people can quickly check whether the vacuum command completes or not.
@@ -289,6 +292,42 @@ object VacuumCommand extends VacuumCommandImpl with Serializable { | |||
|
|||
trait VacuumCommandImpl extends DeltaCommand { | |||
|
|||
private val supportedFsForLogging = Seq( | |||
"wasbs", "wasbss", "abfs", "abfss", "adl", "gs", "file" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question: what is the relationship between whether log vacuum and the file system used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to make this a config. The list is also missing hdfs.
Before this change, vacuum doesn't make a commit, so people can run vacuum concurrently with any other queries. After this change, this could be dangerous, for example on S3, if people still run vacuum concurrently with any other queries without using a right LogStore. Hence, we don't enable it by default.
Hi @YannByron can you please provide me with your github email address so I can include you as a co-author here? Thanks for making the initial PR (inspiration for this PR) and for reviewing this PR! |
@scottsand-db thank you and i will send you by Slack. |
This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a `CommitInfo` file which contains the delta operation info. `VacuumStart` operation contains metrics: `numFilesToDelete` and `sizeOfDataToDelete` `VacuumEnd` operation contains metrics: `numDeletedFiles` and `numVacuumedDirectories` New UTs. Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations. Closes delta-io#1552. Resolves delta-io#868. Co-authored-by: Yann Byron <[email protected]> GitOrigin-RevId: 94805531d022bac4afafd0b672d17b8828d8aa2c
This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a `CommitInfo` file which contains the delta operation info. `VacuumStart` operation contains metrics: `numFilesToDelete` and `sizeOfDataToDelete` `VacuumEnd` operation contains metrics: `numDeletedFiles` and `numVacuumedDirectories` New UTs. Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations. Closes delta-io#1552. Resolves delta-io#868. Co-authored-by: Yann Byron <[email protected]> GitOrigin-RevId: 94805531d022bac4afafd0b672d17b8828d8aa2c
This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a `CommitInfo` file which contains the delta operation info. `VacuumStart` operation contains metrics: `numFilesToDelete` and `sizeOfDataToDelete` `VacuumEnd` operation contains metrics: `numDeletedFiles` and `numVacuumedDirectories` New UTs. Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations. Closes #1552. Resolves #868. Co-authored-by: Yann Byron <[email protected]> GitOrigin-RevId: 94805531d022bac4afafd0b672d17b8828d8aa2c
Description
This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a
CommitInfo
file which contains the delta operation info.VacuumStart
operation contains metrics:numFilesToDelete
andsizeOfDataToDelete
VacuumEnd
operation contains metrics:numDeletedFiles
andnumVacuumedDirectories
Closes #868.
How was this patch tested?
New UTs.
Does this PR introduce any user-facing changes?
Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations.