Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#868] Register VACUUM operation in the _delta_log #1552

Conversation

scottsand-db
Copy link
Collaborator

@scottsand-db scottsand-db commented Jan 9, 2023

Description

This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a CommitInfo file which contains the delta operation info.

VacuumStart operation contains metrics: numFilesToDelete and sizeOfDataToDelete

VacuumEnd operation contains metrics: numDeletedFiles and numVacuumedDirectories

Closes #868.

How was this patch tested?

New UTs.

Does this PR introduce any user-facing changes?

Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations.

- update VacuumCommand.scala to add pre-post vacuum logs
- update DeltaOperations.scala to add the actual operations committed to the log
- update DeltaVacuumSuite with tests
@scottsand-db scottsand-db self-assigned this Jan 9, 2023
@scottsand-db scottsand-db added this to the 2.3.0 milestone Jan 9, 2023
* @param specifiedRetentionMillis - specified retention interval
* @param defaultRetentionMillis - default retention period for the table
*/
case class VacuumStart(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we combine VacuumStart and Vacuum to one? I think this is enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean VacuumStart and VacuumEnd? It's better to have two events so that people can quickly check whether the vacuum command completes or not.

@@ -289,6 +292,42 @@ object VacuumCommand extends VacuumCommandImpl with Serializable {

trait VacuumCommandImpl extends DeltaCommand {

private val supportedFsForLogging = Seq(
"wasbs", "wasbss", "abfs", "abfss", "adl", "gs", "file"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question: what is the relationship between whether log vacuum and the file system used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to make this a config. The list is also missing hdfs.

Before this change, vacuum doesn't make a commit, so people can run vacuum concurrently with any other queries. After this change, this could be dangerous, for example on S3, if people still run vacuum concurrently with any other queries without using a right LogStore. Hence, we don't enable it by default.

@scottsand-db
Copy link
Collaborator Author

scottsand-db commented Jan 11, 2023

Hi @YannByron can you please provide me with your github email address so I can include you as a co-author here? Thanks for making the initial PR (inspiration for this PR) and for reviewing this PR!

@YannByron
Copy link
Contributor

@scottsand-db thank you and i will send you by Slack.

scottsand-db added a commit to scottsand-db/delta that referenced this pull request Jan 12, 2023
This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a `CommitInfo` file which contains the delta operation info.

`VacuumStart` operation contains metrics: `numFilesToDelete` and `sizeOfDataToDelete`

`VacuumEnd` operation contains metrics: `numDeletedFiles` and `numVacuumedDirectories`

New UTs.

Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations.

Closes delta-io#1552.
Resolves delta-io#868.

Co-authored-by: Yann Byron <[email protected]>
GitOrigin-RevId: 94805531d022bac4afafd0b672d17b8828d8aa2c
scottsand-db added a commit to scottsand-db/delta that referenced this pull request Jan 12, 2023
This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a `CommitInfo` file which contains the delta operation info.

`VacuumStart` operation contains metrics: `numFilesToDelete` and `sizeOfDataToDelete`

`VacuumEnd` operation contains metrics: `numDeletedFiles` and `numVacuumedDirectories`

New UTs.

Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations.

Closes delta-io#1552.
Resolves delta-io#868.

Co-authored-by: Yann Byron <[email protected]>
GitOrigin-RevId: 94805531d022bac4afafd0b672d17b8828d8aa2c
scottsand-db added a commit that referenced this pull request Jan 13, 2023
This PR registers the start and end of VACUUM operations in the delta log. This means that we commit a commit with no Add/Remove files, and only a `CommitInfo` file which contains the delta operation info.

`VacuumStart` operation contains metrics: `numFilesToDelete` and `sizeOfDataToDelete`

`VacuumEnd` operation contains metrics: `numDeletedFiles` and `numVacuumedDirectories`

New UTs.

Expose additional metrics and history in the _delta_log for the start and end of VACUUM operations.

Closes #1552.
Resolves #868.

Co-authored-by: Yann Byron <[email protected]>
GitOrigin-RevId: 94805531d022bac4afafd0b672d17b8828d8aa2c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Register Vacuum in the table history
3 participants