Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add document hard deletion functionality to housekeeping tasks. #718

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

fourjae
Copy link
Contributor

@fourjae fourjae commented Dec 10, 2023

What this PR does / why we need it:
A housekeeping task was added to automatically physically delete documents at certain intervals.

Which issue(s) this PR fixes:

Fixes # #706

Although it does not directly solve the above issue, I think it is somewhat related to the need to improve the housekeeping structure.
I think the housekeeping structure could be improved if necessary.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Didn't break anything

Summary by CodeRabbit

  • New Features

    • Enhanced housekeeping functionality for managing document deletions and client deactivations with more granular control.
    • New configuration parameters for housekeeping intervals and limits, including document deletion grace periods.
    • Introduction of methods for identifying and deleting documents eligible for hard deletion based on project specifics.
  • Bug Fixes

    • Improved error handling during document deletion processes.
  • Documentation

    • Updated configuration files and sample files to reflect new housekeeping parameters and their usage.
    • Expanded test coverage to validate new configuration structures and parameters.

@fourjae fourjae marked this pull request as ready for review January 28, 2024 15:56
@krapie krapie self-requested a review January 28, 2024 16:01
Copy link

codecov bot commented Jan 29, 2024

Codecov Report

Attention: Patch coverage is 37.55459% with 143 lines in your changes missing coverage. Please review.

Project coverage is 50.45%. Comparing base (0cdb329) to head (d45f2ab).
Report is 58 commits behind head on main.

Files Patch % Lines
server/backend/housekeeping/housekeeping.go 0.00% 101 Missing ⚠️
server/backend/database/mongo/client.go 45.61% 27 Missing and 4 partials ⚠️
server/backend/database/memory/database.go 73.17% 7 Missing and 4 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #718      +/-   ##
==========================================
- Coverage   50.71%   50.45%   -0.27%     
==========================================
  Files          70       70              
  Lines       10213    10424     +211     
==========================================
+ Hits         5180     5259      +79     
- Misses       4512     4636     +124     
- Partials      521      529       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@krapie krapie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution.
I have left some comments on naming convention and few more.

About Housekeeping Task Naming Convention

Since we now newly introducing document hard deletion, it would be better to rename existing housekeeping task "Deactivate Candidate" (deactivateCandidates) to "Client Deactivate Candidate" (ClientDeactivateCandidates) for more clarification and naming consistency with "Document Hard Deletion" (DocumentHardDeletion).

cmd/yorkie/server.go Outdated Show resolved Hide resolved
cmd/yorkie/server.go Outdated Show resolved Hide resolved
server/backend/database/database.go Outdated Show resolved Hide resolved
server/backend/database/database.go Outdated Show resolved Hide resolved
server/backend/database/mongo/client.go Outdated Show resolved Hide resolved
server/backend/housekeeping/config.go Outdated Show resolved Hide resolved
server/backend/housekeeping/config_test.go Outdated Show resolved Hide resolved
server/backend/housekeeping/housekeeping.go Outdated Show resolved Hide resolved
server/backend/housekeeping/housekeeping.go Outdated Show resolved Hide resolved
server/backend/database/mongo/client.go Outdated Show resolved Hide resolved
@fourjae
Copy link
Contributor Author

fourjae commented Feb 13, 2024

Thank you. All comments have been reflected, and some feature changes and comment typos have been corrected for the update.

However, I think there is one thing that needs to be changed.
If an error occurs while working with the h.database.DeleteDocument() function, there are the following differences.

Client: Returns without rollback when err occurs during operation

Memdb: If err occurs during operation, rollback and return without committing

Which of the two methods should you proceed with?

@hackerwins
Copy link
Member

In #932, we improved housekeeping so that multiple tasks can be added. It might be good to try document deletion again.

@krapie
Copy link
Member

krapie commented Jul 20, 2024

@hackerwins Should we reopen this issue?
@fourjae Please notify me if you are still working on this PR.

@fourjae
Copy link
Contributor Author

fourjae commented Jul 22, 2024

@krapie sorry for forgetting. I will modify the work to fit the changed structure.

@hackerwins
Copy link
Member

@hackerwins Should we reopen this issue?
@fourjae Please notify me if you are still working on this PR.

If it's okay for @fourjae, It would be good to continue in this PR. 🙏

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (4)
cmd/yorkie/server.go (1)

213-216: Command-line flag description needs improvement.

The description for the documentHardDeletionGracefulPeriod flag should be more consistent with other descriptions.

- "Document deletion over time after a single housekeeping run"
+ "Graceful period for document deletion after a single housekeeping run"
server/backend/database/testcases/testcases.go (3)

Line range hint 218-218:
Follow the convention for function comments.

It would be better to follow the convention of other test cases.

// RunDocumentHardDeletionTest runs the RunDocumentHardDeletion tests for the given db.
func RunDocumentHardDeletionTest(t *testing.T, ...)

Line range hint 245-245:
Use a descriptive variable name.

Consider renaming GracePeriod to gracePeriod to follow Go naming conventions and improve readability.

- GracePeriod := "0s"
+ gracePeriod := "0s"

Line range hint 258-260:
Ensure proper error handling for deletion count.

The assertion should check if the number of deleted documents matches the expected count. Consider adding a more descriptive error message.

- assert.Equal(t, int(deletedDocumentsCount), len(candidates))
+ assert.Equal(t, len(candidates), int(deletedDocumentsCount), "Mismatch in the number of deleted documents")
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 379ee54 and d45f2ab.

Files selected for processing (16)
  • cmd/yorkie/server.go (3 hunks)
  • server/backend/database/database.go (2 hunks)
  • server/backend/database/memory/database.go (2 hunks)
  • server/backend/database/memory/database_test.go (1 hunks)
  • server/backend/database/memory/indexes.go (1 hunks)
  • server/backend/database/mongo/client.go (3 hunks)
  • server/backend/database/mongo/client_test.go (1 hunks)
  • server/backend/database/testcases/testcases.go (1 hunks)
  • server/backend/housekeeping/config.go (1 hunks)
  • server/backend/housekeeping/config_test.go (1 hunks)
  • server/backend/housekeeping/housekeeping.go (9 hunks)
  • server/config.go (2 hunks)
  • server/config.sample.yml (1 hunks)
  • server/rpc/server_test.go (1 hunks)
  • test/helper/helper.go (2 hunks)
  • test/sharding/server_test.go (1 hunks)
Additional comments not posted (48)
server/backend/housekeeping/config_test.go (4)

30-35: Ensure consistency in configuration field naming and validation.

The new fields IntervalDeactivateCandidates, IntervalDeleteDocuments, DocumentHardDeletionGracefulPeriod, ClientDeactivationCandidateLimitPerProject, and DocumentHardDeletionCandidateLimitPerProject are consistent with the previous naming conventions. The validation logic should ensure these fields are correctly parsed and validated.


40-48: Validate interval fields with proper time units.

The interval fields IntervalDeactivateCandidates, IntervalDeleteDocuments, and DocumentHardDeletionGracefulPeriod should be validated to ensure they use proper time units like "m", "h", "s".


52-57: Validate candidate limit fields for non-negative values.

The candidate limit fields ClientDeactivationCandidateLimitPerProject and DocumentHardDeletionCandidateLimitPerProject should be validated to ensure they are non-negative values.


60-61: Ensure project fetch size is non-negative.

The ProjectFetchSize field should be validated to ensure it is a non-negative value.

server/backend/database/memory/database_test.go (1)

107-109: Ensure comprehensive test coverage for document hard deletion.

The new test case DocumentHardDeletion test uses the RunDocumentHardDeletionTest function from the testcases package to test the document hard deletion functionality. Ensure that this test case covers all relevant scenarios for document hard deletion.

server/backend/housekeeping/config.go (4)

26-39: Ensure clarity and consistency in configuration field definitions.

The new fields IntervalDeactivateCandidates, IntervalDeleteDocuments, DocumentHardDeletionGracefulPeriod, ClientDeactivationCandidateLimitPerProject, and DocumentHardDeletionCandidateLimitPerProject are clearly defined and consistent with the existing configuration structure.


47-61: Ensure proper validation of interval fields.

The interval fields IntervalDeactivateCandidates, IntervalDeleteDocuments, and DocumentHardDeletionGracefulPeriod are validated using time.ParseDuration to ensure they use proper time units.


71-83: Ensure proper validation of candidate limit fields.

The candidate limit fields ClientDeactivationCandidateLimitPerProject and DocumentHardDeletionCandidateLimitPerProject are validated to ensure they are non-negative values.


Line range hint 84-89:
Ensure proper validation of project fetch size.

The ProjectFetchSize field is validated to ensure it is a non-negative value.

server/backend/database/mongo/client_test.go (1)

124-126: LGTM!

The new test case for document hard deletion is correctly added and integrates well with the existing tests.

server/config.sample.yml (6)

33-34: LGTM!

The new parameter IntervalDeactivateCandidates is correctly defined and provides necessary configuration for candidate deactivation.


36-37: LGTM!

The new parameter IntervalDeleteDocuments is correctly defined and provides necessary configuration for document deletion.


39-40: LGTM!

The new parameter HousekeepingDocumentHardDeletionGracefulPeriod is correctly defined and provides necessary configuration for managing document retention.


42-43: LGTM!

The new parameter ClientDeactivationCandidateLimitPerProject is correctly defined and provides necessary configuration for limiting client deactivation candidates.


45-46: LGTM!

The new parameter DocumentHardDeletionCandidateLimitPerProject is correctly defined and provides necessary configuration for limiting document hard deletion candidates.


48-49: LGTM!

The new parameter ProjectFetchSize is correctly defined and provides necessary configuration for limiting the number of projects.

server/backend/database/memory/indexes.go (1)

120-124: LGTM!

The new field project_id is correctly defined and enhances the schema by allowing for the storage and retrieval of data associated with project identifiers.

test/sharding/server_test.go (1)

80-85: Ensure consistency in configuration parameter naming.

The new configuration parameters are well-integrated. Verify that the naming conventions are consistent across the codebase.

Verification successful

Consistency in configuration parameter naming verified.

The new configuration parameters are consistently named across the codebase.

  • IntervalDeactivateCandidates
  • IntervalDeleteDocuments
  • DocumentHardDeletionGracefulPeriod
  • ClientDeactivationCandidateLimitPerProject
  • DocumentHardDeletionCandidateLimitPerProject
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistency in configuration parameter naming.

# Test: Search for the new configuration parameters. Expect: Consistent naming conventions.
rg --type go $'IntervalDeactivateCandidates|IntervalDeleteDocuments|DocumentHardDeletionGracefulPeriod|ClientDeactivationCandidateLimitPerProject|DocumentHardDeletionCandidateLimitPerProject'

Length of output: 9614

server/rpc/server_test.go (1)

85-90: Ensure consistency in configuration parameter naming.

The new configuration parameters are well-integrated. Verify that the naming conventions are consistent across the codebase.

Verification successful

The naming conventions for the new configuration parameters are consistent across the codebase.

The parameters are well-integrated and consistently named in various files.

  • server/rpc/server_test.go
  • server/config.go
  • test/helper/helper.go
  • server/backend/housekeeping/config.go
  • cmd/yorkie/server.go
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistency in configuration parameter naming.

# Test: Search for the new configuration parameters. Expect: Consistent naming conventions.
rg --type go $'IntervalDeactivateCandidates|IntervalDeleteDocuments|DocumentHardDeletionGracefulPeriod|ClientDeactivationCandidateLimitPerProject|DocumentHardDeletionCandidateLimitPerProject'

Length of output: 9614

server/config.go (2)

234-239: Ensure correct integration of new housekeeping constants.

The new constants are correctly integrated into the newConfig function. Verify that the integration is consistent across the codebase.


43-48: Ensure consistency in constant naming and usage.

The new constants are well-defined. Verify that the naming conventions are consistent across the codebase.

Verification successful

The new constants are well-defined and consistently named in the files server/config.go and cmd/yorkie/server.go. To ensure thorough verification, let's check if these constants are used in any other files within the codebase.


Ensure consistency in constant naming and usage.

The new constants are well-defined and consistently named across the codebase. They are used appropriately in the files server/config.go and cmd/yorkie/server.go.

  • server/config.go
  • cmd/yorkie/server.go
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistency in constant naming and usage.

# Test: Search for the new constants. Expect: Consistent naming conventions.
rg --type go $'DefaultHousekeepingIntervalDeactivateCandidates|DefaultHousekeepingIntervalDeleteDocuments|DefaultHousekeepingDocumentHardDeletionGracefulPeriod|DefaultHousekeepingClientDeactivationCandidateLimitPerProject|DefaultHousekeepingDocumentHardDeletionCandidateLimitPerProject'

Length of output: 1823


Script:

#!/bin/bash
# Description: Verify consistency in constant naming and usage across the entire codebase.

# Test: Search for the new constants in all Go files. Expect: Consistent naming conventions.
rg --type go 'DefaultHousekeepingIntervalDeactivateCandidates|DefaultHousekeepingIntervalDeleteDocuments|DefaultHousekeepingDocumentHardDeletionGracefulPeriod|DefaultHousekeepingClientDeactivationCandidateLimitPerProject|DefaultHousekeepingDocumentHardDeletionCandidateLimitPerProject'

Length of output: 1823

server/backend/database/database.go (3)

161-167: Method looks good.

The FindDocumentHardDeletionCandidatesPerProject method is well-defined and its parameters are appropriate for the intended functionality.


169-173: Method looks good.

The DeleteDocument method is well-defined and its parameters are appropriate for the intended functionality.


174-177: Method looks good.

The FindDocInfoByKey method is unchanged and its parameters are appropriate for the intended functionality.

server/backend/housekeeping/housekeeping.go (8)

35-36: Constants look good.

The constants deactivateCandidatesKey and documentHardDeletionLockKey are well-defined and appropriate for the locking mechanisms.


46-51: Fields look good.

The new fields added to the Housekeeping struct are well-defined and appropriate for managing intervals and limits for housekeeping tasks.


Line range hint 80-108:
Constructor function looks good.

The New constructor function is well-defined and appropriately parses the additional configuration parameters.


117-118: Method looks good.

The Start method is well-defined and appropriately starts the new housekeeping tasks.


Line range hint 128-148:
Method looks good.

The AttachDeactivateCandidates method is well-defined and appropriately handles the housekeeping loop for deactivating candidates.


150-170: Method looks good.

The AttachDocumentHardDeletion method is well-defined and appropriately handles the housekeeping loop for document hard deletion.


172-220: Method looks good.

The DeleteDocument method is well-defined and appropriately handles the deletion of documents.


311-331: Method looks good.

The FindDocumentHardDeletionCandidates method is well-defined and appropriately handles finding document hard deletion candidates.

cmd/yorkie/server.go (7)

41-45: Variables look good.

The new variables for housekeeping intervals and limits are well-defined and appropriate for managing different housekeeping tasks.


74-76: Function looks good.

The newServerCmd function is well-defined and appropriately reflects the new variables for housekeeping intervals and limits.


201-211: Command-line flags look good.

The new command-line flags for housekeeping intervals and limits are well-defined and appropriate for managing different housekeeping tasks.


219-221: Command-line flag looks good.

The command-line flag for ClientDeactivationCandidateLimitPerProject is well-defined and appropriately reflects the new variable name.


225-229: Command-line flag looks good.

The command-line flag for DocumentHardDeletionCandidateLimitPerProject is well-defined and appropriate for managing document deletion limits.


229-231: Command-line flag looks good.

The command-line flag for ProjectFetchSize is well-defined and appropriate for managing project fetch size.


201-231: Function looks good.

The init function is well-defined and appropriately includes the new command-line flags.

test/helper/helper.go (3)

65-66: Ensure consistency in interval naming.

The intervals for deactivating candidates and deleting documents are both set to 10 * gotime.Second. Verify that these intervals are appropriate and consistent with the rest of the system.


67-69: Check the new housekeeping limits.

The new variables HousekeepingDocumentHardDeletionGracefulPeriod and HousekeepingDocumentHardDeletionCandidateLimitPerProject are set to 0 * gotime.Second and 10 respectively. Ensure these values are appropriate for your use case.


216-221: Verify the updated housekeeping configuration.

The configuration now includes the new intervals and limits for housekeeping tasks. Ensure these changes are correctly integrated and tested.

server/backend/database/memory/database.go (2)

622-657: Ensure correct handling of document fetching.

The method FindDocumentHardDeletionCandidatesPerProject correctly fetches documents marked for hard deletion. Ensure the logic for checking the RemovedAt timestamp and the limit on candidates is appropriate.


704-726: Handle potential errors in document deletion.

The method DeleteDocument correctly deletes documents and handles errors. Ensure that the transaction is appropriately committed or rolled back in case of errors.

server/backend/database/mongo/client.go (3)

230-257: Handle potential errors in document deletion.

The method DeleteDocument correctly deletes documents and handles errors. Ensure that the deletion operation is appropriately handled and any errors are logged or managed.


651-677: Ensure correct handling of document fetching.

The method FindDocumentHardDeletionCandidatesPerProject correctly fetches documents marked for hard deletion. Ensure the logic for checking the RemovedAt timestamp and the limit on candidates is appropriate.


710-739: Ensure correct handling of client fetching.

The method FindDeactivateCandidates correctly fetches clients needing deactivation. Ensure the logic for cycling through projects and aggregating candidates is appropriate.

server/backend/database/testcases/testcases.go (1)

Line range hint 262-263:
Verify error handling for document not found.

Ensure that the error handling correctly identifies the document not found scenario.

Verification successful

Error handling for document not found is correctly implemented.

The code properly identifies and handles the ErrDocumentNotFound scenario using assert.ErrorIs(t, err, database.ErrDocumentNotFound).

  • Instances found in server/backend/database/testcases/testcases.go
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify error handling for document not found after deletion.

# Test: Search for the error handling code. Expect: Proper handling of ErrDocumentNotFound.
rg --type go -A 5 $'assert.ErrorIs(t, err, database.ErrDocumentNotFound)'

Length of output: 74


Script:

#!/bin/bash
# Description: Search for any references to ErrDocumentNotFound in the testcases.go file to verify error handling.

# Test: Search for the error handling code. Expect: Proper handling of ErrDocumentNotFound.
rg 'ErrDocumentNotFound' server/backend/database/testcases/testcases.go

Length of output: 236

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Outside diff range, codebase verification and nitpick comments (3)
server/backend/database/mongo/client.go (3)

704-729: Add logging for better traceability.

Adding logging can help trace the execution flow and identify issues more easily.

+  logging.DefaultLogger().Infof("Finding hard deletion candidates for project: %s", project.ID)

715-715: Rename variable to follow camelCase convention.

The variable name DocInfos should be docInfos to follow camelCase convention.

-  var DocInfos []*database.DocInfo
+  var docInfos []*database.DocInfo

763-791: Add logging for better traceability.

Adding logging can help trace the execution flow and identify issues more easily.

+  logging.DefaultLogger().Infof("Finding deactivate candidates, lastProjectID: %s", lastProjectID)
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between d45f2ab and c171085.

Files selected for processing (17)
  • cmd/yorkie/server.go (3 hunks)
  • server/backend/database/database.go (2 hunks)
  • server/backend/database/memory/database.go (2 hunks)
  • server/backend/database/memory/database_test.go (1 hunks)
  • server/backend/database/mongo/client.go (3 hunks)
  • server/backend/database/mongo/client_test.go (1 hunks)
  • server/backend/database/testcases/testcases.go (1 hunks)
  • server/backend/housekeeping/config.go (2 hunks)
  • server/backend/housekeeping/config_test.go (1 hunks)
  • server/backend/housekeeping/housekeeping.go (3 hunks)
  • server/clients/housekeeping.go (4 hunks)
  • server/config.go (2 hunks)
  • server/rpc/server_test.go (1 hunks)
  • server/server.go (1 hunks)
  • test/bench/tree_editing_bench_test.go (1 hunks)
  • test/helper/helper.go (2 hunks)
  • test/sharding/server_test.go (1 hunks)
Files skipped from review due to trivial changes (2)
  • server/backend/database/mongo/client_test.go
  • test/bench/tree_editing_bench_test.go
Files skipped from review as they are similar to previous changes (8)
  • cmd/yorkie/server.go
  • server/backend/database/memory/database.go
  • server/backend/database/memory/database_test.go
  • server/backend/database/testcases/testcases.go
  • server/config.go
  • server/rpc/server_test.go
  • test/helper/helper.go
  • test/sharding/server_test.go
Additional comments not posted (24)
server/backend/housekeeping/config_test.go (6)

30-35: LGTM!

The new configuration fields are correctly initialized.


40-41: LGTM!

The test case correctly checks for validation errors for invalid interval values.


44-45: LGTM!

The test case correctly checks for validation errors for invalid interval values.


48-49: LGTM!

The test case correctly checks for validation errors for invalid period values.


52-53: LGTM!

The test case correctly checks for validation errors for invalid limit values.


56-57: LGTM!

The test case correctly checks for validation errors for invalid limit values.

server/backend/housekeeping/housekeeping.go (5)

32-33: LGTM!

The Housekeeping struct correctly includes the Config and scheduler fields.


51-65: LGTM!

The RegisterTask method correctly sets up the scheduler for the housekeeping tasks.


Line range hint 67-75:
LGTM!

The RegisterTask method correctly sets up the scheduler for document deletion tasks.


Line range hint 79-81:
LGTM!

The Start method correctly starts the scheduler for the housekeeping service.


Line range hint 83-91:
LGTM!

The Stop method correctly stops the scheduler and handles errors.

server/backend/housekeeping/config.go (6)

28-41: LGTM!

The new fields in the Config struct enhance the configurability of the housekeeping process.


49-53: LGTM!

The Validate method correctly validates the new configuration fields.


57-63: LGTM!

The Validate method correctly validates the interval for document deletion.


65-71: LGTM!

The Validate method correctly validates the period for document hard deletion.


72-78: LGTM!

The Validate method correctly validates the limit for client deactivation candidates.


79-85: LGTM!

The Validate method correctly validates the limit for document hard deletion candidates.

server/clients/housekeeping.go (4)

39-39: Improved parameter naming.

The parameter candidatesLimitPerProject has been renamed to clientDeactivationCandidateLimitPerProject for better clarity.


96-96: Improved parameter naming.

The parameter candidatesLimitPerProject has been renamed to clientDeactivationCandidateLimitPerProject for better clarity.


182-204: LGTM!

The FindDocumentHardDeletionCandidates function is well-implemented with proper error handling and logging. The return values are consistent with the function's purpose.


125-180: LGTM! But verify the function usage in the codebase.

The DeleteDocument function is well-implemented with proper locking and error handling. The logging provides useful information.

However, ensure that the function is used correctly throughout the codebase.

server/server.go (1)

162-206: Enhanced housekeeping task registration.

The function now registers two separate tasks for deactivating clients and deleting documents, improving control over task intervals. The logic and error handling are well-implemented.

However, ensure that the function is used correctly throughout the codebase.

Verification successful

Function usage verified.

The function RegisterHousekeepingTasks is correctly used within the codebase, specifically in the server/server.go file. The implementation and usage are consistent and error handling is appropriately managed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `RegisterHousekeepingTasks`.

# Test: Search for the function usage. Expect: Correct usage of the updated function.
rg --type go -A 5 $'RegisterHousekeepingTasks'

Length of output: 700

server/backend/database/database.go (2)

168-173: LGTM!

The FindDocumentHardDeletionCandidatesPerProject method is well-implemented with proper error handling and logging. The return values are consistent with the method's purpose.


175-179: LGTM!

The DeleteDocument method is well-implemented with proper error handling and logging. The return values are consistent with the method's purpose.

server/backend/database/mongo/client.go Outdated Show resolved Hide resolved
server/backend/database/mongo/client.go Outdated Show resolved Hide resolved
server/backend/database/mongo/client.go Outdated Show resolved Hide resolved
fourjae added 2 commits August 1, 2024 22:59
HousekeepingDocumentHardDeletionGracefulPeriod default hour 14h -> 336h (14day * 24h)
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between c171085 and 6cc0e19.

Files selected for processing (5)
  • server/backend/database/mongo/client.go (3 hunks)
  • server/backend/housekeeping/housekeeping.go (3 hunks)
  • server/config.go (2 hunks)
  • server/config.sample.yml (1 hunks)
  • server/server.go (1 hunks)
Files skipped from review as they are similar to previous changes (4)
  • server/backend/database/mongo/client.go
  • server/backend/housekeeping/housekeeping.go
  • server/config.sample.yml
  • server/server.go
Additional comments not posted (3)
server/config.go (3)

43-45: New constants for housekeeping intervals and grace period.

The new constants provide more granular control over housekeeping tasks. Ensure that these values align with the intended housekeeping policies.


46-47: Renamed constants for candidate limits per project.

The constants have been renamed for clarity. Ensure that these changes are reflected in the rest of the codebase.


235-240: Updated newConfig function with new housekeeping constants.

The function now references the new housekeeping constants. Ensure that these changes are correctly integrated and tested.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 6cc0e19 and 7d636a3.

Files selected for processing (1)
  • server/backend/housekeeping/config_test.go (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • server/backend/housekeeping/config_test.go

@krapie krapie requested review from hackerwins and krapie August 7, 2024 06:56
@hackerwins hackerwins force-pushed the main branch 2 times, most recently from fbc6098 to 7d65a57 Compare August 9, 2024 02:00
@krapie
Copy link
Member

krapie commented Aug 23, 2024

@fourjae Sorry for the late review. I've been busy for couple of weeks.
I will look for this PR ASAP.

cc. @hackerwins

Copy link
Member

@krapie krapie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution :)

Firstly, I have left some comments about overall code conventions.

server/backend/database/database.go Outdated Show resolved Hide resolved
server/backend/database/testcases/testcases.go Outdated Show resolved Hide resolved
server/backend/database/database.go Outdated Show resolved Hide resolved
server/backend/housekeeping/config.go Outdated Show resolved Hide resolved
server/backend/housekeeping/config.go Outdated Show resolved Hide resolved
server/backend/housekeeping/housekeeping.go Outdated Show resolved Hide resolved
server/clients/housekeeping.go Outdated Show resolved Hide resolved
server/server.go Outdated Show resolved Hide resolved
server/backend/database/memory/database.go Outdated Show resolved Hide resolved
server/backend/database/mongo/client.go Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (4)
server/config.go (1)

237-237: Use String() method for consistency.

The DocumentHardDeletionGracefulPeriod field is set directly to the corresponding constant value, which is inconsistent with the other fields that use the String() method. To maintain consistency and avoid potential issues, consider updating the code as follows:

-DocumentHardDeletionGracefulPeriod:           DefaultHousekeepingDocumentHardDeletionGracefulPeriod,
+DocumentHardDeletionGracefulPeriod:           DefaultHousekeepingDocumentHardDeletionGracefulPeriod.String(),
server/backend/database/database.go (1)

175-179: LGTM, but please update the comment.

The DeleteDocuments method addition is approved. However, please update the comment at line 175 to match the method signature.

-// DeleteDocuments deletes document of the given key
+// DeleteDocuments deletes the documents specified by the given DocInfo pointers
server/backend/database/mongo/client.go (1)

230-263: Approve the implementation of DeleteDocuments.

The DeleteDocuments function is correctly implemented with proper error handling and validation.

Improve the error messages.

Consider making the error messages more descriptive:

  • When an invalid document ID is encountered, include the invalid ID in the error message.
  • When the deletion fails, include the specific error details in the returned error.
server/backend/database/testcases/testcases.go (1)

1414-1468: The code changes are approved with a suggestion to improve the test coverage.

The new test function RunDocumentHardDeletionTest is well-structured and covers the key aspects of the document hard deletion functionality. It follows the existing conventions in the file.

To further improve the test coverage, consider adding the following scenarios:

  1. Test the case when there are no candidate documents for hard deletion.
  2. Test the case when the grace period is set to a positive value.
  3. Test the case when there are multiple candidate documents for hard deletion.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 7d636a3 and 16d8749.

Files selected for processing (16)
  • cmd/yorkie/server.go (3 hunks)
  • server/backend/database/database.go (2 hunks)
  • server/backend/database/memory/database.go (2 hunks)
  • server/backend/database/memory/indexes.go (1 hunks)
  • server/backend/database/mongo/client.go (2 hunks)
  • server/backend/database/testcases/testcases.go (1 hunks)
  • server/backend/housekeeping/config.go (2 hunks)
  • server/backend/housekeeping/config_test.go (1 hunks)
  • server/backend/housekeeping/housekeeping.go (2 hunks)
  • server/clients/housekeeping.go (4 hunks)
  • server/config.go (2 hunks)
  • server/config.sample.yml (1 hunks)
  • server/rpc/server_test.go (1 hunks)
  • server/server.go (2 hunks)
  • test/helper/helper.go (2 hunks)
  • test/sharding/server_test.go (1 hunks)
Files skipped from review due to trivial changes (1)
  • server/backend/housekeeping/housekeeping.go
Additional comments not posted (45)
server/backend/housekeeping/config_test.go (6)

30-35: LGTM!

The code changes are approved.


40-40: LGTM!

The code changes are approved.


44-44: LGTM!

The code changes are approved.


48-50: LGTM!

The code changes are approved.


51-57: LGTM!

The code changes are approved.


59-62: LGTM!

The code changes are approved.

server/backend/housekeeping/config.go (4)

28-41: LGTM!

The changes to the Config struct improve the configurability and granularity of the housekeeping service. The new fields allow for separate intervals for deactivating candidates and deleting documents, and provide limits on the number of candidates per project. The naming of the fields is clear and descriptive.


57-63: LGTM!

The new check for the DeleteDocumentsInterval field ensures that it is a valid duration, improving the robustness of the configuration validation.


65-82: LGTM!

The new checks for the DocumentHardDeletionGracefulPeriod, ClientDeactivationCandidateLimitPerProject, and DocumentHardDeletionCandidateLimitPerProject fields ensure that they have valid values, improving the robustness of the configuration validation.


97-105: LGTM!

The changes to the ParseInterval method enhance its flexibility and usability by accepting an interval as a parameter. This allows the method to be used with different interval values without modifying the struct. The updated error message provides more context about the interval being parsed.

server/config.sample.yml (4)

33-34: LGTM!

The change from Interval to DeactivateCandidatesInterval improves clarity by using a more descriptive name for the parameter.


36-37: LGTM!

The addition of the DeleteDocumentsInterval parameter introduces an interval for document deletion, which is a new housekeeping functionality.


39-40: LGTM!

The addition of the HousekeepingDocumentHardDeletionGracefulPeriod parameter specifies a grace period for hard deletion of documents, which is a new housekeeping functionality.


42-46: LGTM!

The changes in this code segment are approved:

  • The change from CandidatesLimitPerProject to ClientDeactivationCandidateLimitPerProject improves clarity by using a more descriptive name for the parameter.
  • The addition of the DocumentHardDeletionCandidateLimitPerProject parameter introduces a limit for document deletion candidates, which is a new housekeeping functionality.
server/backend/database/memory/indexes.go (1)

139-147: LGTM!

The new compound index project_id_removed_at is a valuable addition to the schema variable. It allows for efficient querying based on the combination of ProjectID and RemovedAt fields, potentially improving performance for operations that involve filtering or sorting by both fields.

The index is correctly defined using the memdb.CompoundIndex type and the appropriate field types. The index name is clear and follows the naming convention of other indexes.

The code changes are approved.

server/clients/housekeeping.go (7)

31-31: LGTM!

The code changes are approved.


32-32: LGTM!

The code changes are approved.


40-40: LGTM!

The code changes are approved.


108-114: LGTM!

The code changes are approved.


116-124: LGTM!

The code changes are approved.


127-181: LGTM!

The new DeleteDocuments function looks good. It follows the existing coding conventions and implements the document deletion functionality with proper locking and error handling.


Line range hint 184-219: LGTM!

The new FindDocumentHardDeletionCandidates function looks good. It follows the existing coding conventions and implements the logic for finding document deletion candidates across multiple projects.

server/server.go (4)

97-100: LGTM!

The code changes are approved.


101-103: LGTM!

The code changes are approved.


164-187: LGTM!

The code changes are approved.


189-214: LGTM!

The code changes are approved.

test/sharding/server_test.go (1)

81-85: LGTM!

The changes to the housekeeping configuration parameters in the TestMain function look good. The new parameters provide more granular control over document management and candidate deactivation processes, which aligns with the AI-generated summary.

The changes are in a test file and do not directly impact the production code. The new parameters are initialized with values from the helper package, which likely contains predefined constants for testing.

Overall, the changes enhance the functionality of the housekeeping configuration and do not introduce any issues or inconsistencies.

server/config.go (4)

43-47: LGTM!

The introduction of the new constants enhances the clarity and readability of the housekeeping configuration. The constants provide more specific and descriptive names, making it easier to understand their purpose in the context of document deletion and candidate deactivation processes.


48-48: Please clarify the purpose and usage of DefaultHousekeepingProjectFetchSize.

The constant DefaultHousekeepingProjectFetchSize seems to be a standalone addition and its purpose is not immediately clear from the context. Could you please provide more information on how this constant is used in the housekeeping process and why it was introduced?


235-240: LGTM!

The updates to the Housekeeping configuration in the newConfig function ensure that the logic for housekeeping operations is now aligned with the new constants introduced earlier. This change enhances the consistency and clarity of the housekeeping configuration.


43-48: Overall, the code changes look good!

The introduction of the new constants and the updates to the housekeeping configuration enhance the clarity, consistency, and maintainability of the codebase. The changes provide more specific and descriptive names for the housekeeping intervals and limits, making it easier to understand their purpose in the context of document deletion and candidate deactivation processes.

Apart from the minor nitpick regarding the consistency of setting the DocumentHardDeletionGracefulPeriod field, the code changes are well-structured and contribute positively to the overall quality of the codebase.

Also applies to: 235-240

server/rpc/server_test.go (1)

86-91: LGTM!

The changes to the housekeeping configuration parameters are well-aligned with the PR objectives and significantly enhance the functionality of the housekeeping process. The new parameters provide more granular control over the management of candidate documents and their deletion processes, likely improving the efficiency of data management.

The code changes are approved.

server/backend/database/database.go (2)

23-23: LGTM!

The import statement is approved.


167-173: LGTM!

The FindDocumentHardDeletionCandidatesPerProject method addition is approved.

cmd/yorkie/server.go (6)

44-44: Previous review comment addressed.

The variable name documentHardDeletionGracefulPeriod follows the naming convention suggested in the previous review comment.


74-76: LGTM!

The code changes are approved.


200-211: LGTM!

The code changes are approved. The flag names and descriptions follow the naming convention suggested in the previous review comment.


213-216: LGTM!

The code changes are approved. The flag name and description follow the naming convention suggested in the previous review comment.


219-223: LGTM!

The code changes are approved. The flag name and description follow the naming convention suggested in the previous review comment.


224-229: LGTM!

The code changes are approved. The flag name and description follow the naming convention suggested in the previous review comment.

test/helper/helper.go (2)

65-73: LGTM!

The code changes are approved.


242-246: LGTM!

The code changes are approved.

server/backend/database/memory/database.go (2)

676-712: LGTM!

The code changes are approved.


758-780: LGTM!

The code changes are approved.

server/backend/database/mongo/client.go (1)

710-736: LGTM!

The FindDocumentHardDeletionCandidatesPerProject function is correctly implemented and follows the requirements. It effectively identifies the documents eligible for hard deletion based on the provided project and grace period.

@fourjae
Copy link
Contributor Author

fourjae commented Sep 4, 2024

@krapie Edited to reflect all comments.

Additionally, the grace time setting for deleted documents has been declared as 2 weeks (14d*24h). How many hours would be good?

@krapie
Copy link
Member

krapie commented Sep 5, 2024

@fourjae I think 2 weeks is good for now.

@krapie krapie self-requested a review September 5, 2024 13:08
@krapie
Copy link
Member

krapie commented Sep 5, 2024

@hackerwins Sudden wonder, is this task should be in housekeeping? not background?
Seems like housekeeping is sort of an extension of GC.
Referencing: https://github.com/yorkie-team/yorkie/blob/main/design/housekeeping.md

@krapie
Copy link
Member

krapie commented Sep 11, 2024

@hackerwins Could you mind reviewing this PR instead of me please?
I think you are more suitable to review this PR since you have restructured the housekeeping module recently.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range and nitpick comments (3)
server/backend/database/memory/database.go (1)

758-780: LGTM, but consider improving the error handling.

The code changes are approved. However, consider wrapping the error returned from txn.Delete with more context using fmt.Errorf to provide better error messages.

For example, you can modify the error handling as follows:

-return 0, fmt.Errorf("fetch hard deletion candidates: %w", err)
+return 0, fmt.Errorf("delete document %s: %w", candidate.ID, err)
server/backend/database/mongo/client.go (1)

230-263: LGTM, but consider handling the case when fewer documents are deleted.

The code changes are approved. However, consider handling the case when DeleteMany deletes fewer documents than expected, which could happen if some documents were deleted concurrently by another process. You can compare len(candidates) with deletedResult.DeletedCount and log a warning if they don't match.

server/backend/database/testcases/testcases.go (1)

1596-1650: Comprehensive test implementation! A few suggestions:

The test thoroughly covers the document hard deletion flow, including setup, configuration, candidate identification, deletion, and verification steps. Well done!

A few suggestions to consider:

  1. The fetchSize variable is set but not used in the FindDocumentHardDeletionCandidatesPerProject call. Consider removing if unused.
  2. To make the test more robust, validate the candidates slice before deletion to ensure it contains the expected document(s).
  3. Consider adding a negative test case to validate behavior when no candidates are found for deletion.
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 16d8749 and 86a3f5d.

Files selected for processing (6)
  • server/backend/database/memory/database.go (2 hunks)
  • server/backend/database/memory/database_test.go (1 hunks)
  • server/backend/database/mongo/client.go (2 hunks)
  • server/backend/database/mongo/client_test.go (1 hunks)
  • server/backend/database/testcases/testcases.go (1 hunks)
  • test/complex/main_test.go (1 hunks)
Additional comments not posted (5)
server/backend/database/memory/database_test.go (1)

115-117: LGTM, but verify the implementation of RunDocumentHardDeletionTest.

The code changes are approved. The new test case follows the existing pattern and enhances the test suite by including functionality to verify the hard deletion of documents.

However, ensure that the RunDocumentHardDeletionTest function is correctly implemented and thoroughly tested to confirm that it behaves as expected. Run the following script to verify the function implementation:

Verification successful

"""

"""

Verification Successful: RunDocumentHardDeletionTest is correctly implemented.

The RunDocumentHardDeletionTest function is defined and thoroughly tests the hard deletion of documents. It is correctly integrated into the test suite, ensuring that the functionality is verified. No further action is required.

  • Location of RunDocumentHardDeletionTest implementation: server/backend/database/testcases/testcases.go starting at line 1597.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of `RunDocumentHardDeletionTest`.

# Test: Search for the function definition.
# Expect: The function is defined and has a non-empty body.
ast-grep --lang go --pattern $'func RunDocumentHardDeletionTest($_, $_) {
  $$$
}'

Length of output: 4916

server/backend/database/mongo/client_test.go (1)

131-134: LGTM, but verify the implementation of RunDocumentHardDeletionTest.

The code changes are approved. The addition of the new test case for "DocumentHardDeletion test" enhances the test suite by including functionality to verify the hard deletion of documents, which is a critical aspect of database management. The test case follows the existing pattern and is correctly implemented.

However, please ensure that the RunDocumentHardDeletionTest function, which is assumed to contain the actual test logic for document hard deletion, is properly implemented. You can use the following script to verify its implementation:

Verification successful

"""

"""

The RunDocumentHardDeletionTest function is correctly implemented.

The function is defined in server/backend/database/testcases/testcases.go and thoroughly tests the hard deletion of documents by simulating the necessary operations and using assertions to verify the expected outcomes. The test case is well-structured and follows standard Go testing practices. No further action is needed.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of the `RunDocumentHardDeletionTest` function.

# Test: Search for the function definition.
# Expect: The function should be defined and contain the test logic for document hard deletion.
ast-grep --lang go --pattern $'func RunDocumentHardDeletionTest($_, $_) {
  $$$
}'

Length of output: 4916

test/complex/main_test.go (1)

91-96: LGTM!

The code changes are approved. The addition of new configuration parameters enhances the housekeeping functionality by providing more granular control over intervals and limits related to candidate deactivation and document deletion. The renaming of CandidatesLimitPerProject to ClientDeactivationCandidateLimitPerProject improves clarity. The changes are consistent with the existing code structure and align with the PR objectives.

server/backend/database/memory/database.go (1)

676-712: LGTM!

The code changes are approved.

server/backend/database/mongo/client.go (1)

710-736: LGTM!

The code changes are approved.

Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. Overall looks good. I left a few comments.

A. Collections under 'documents' need to be deleted as well

Yorkie Documents are stored across multiple collections:

  • documents: Stores document metadata (creation time, number of changes, etc.)
  • changes: Stores document changes (similar to oplog in databases)
  • snapshots: Stores document snapshots (similar to snapshots in databases)
  • syncedseqs: Records how far clients have synced

Please refer to this: https://miro.com/app/board/o9J_kidjCtk=/?moveToWidget=3458764569836960605&cot=14

B. Exception cases need to be handled

In Yorkie system, document replicas exist not only on the server but also on clients. We need to consider the following scenarios:

  • When a client requests synchronization and the document has been soft-deleted on the server (check removedAt in the documents collection), the local replica should be marked as deleted.
  • But if the documents collection has been physically hard-deleted by housekeeping processes, there's no way to determine if the document was deleted.

We need to think about how to handle this second case.
For more details on document removal, please check this design document:
https://github.com/yorkie-team/yorkie/blob/main/design/document-removal.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🌟 New feature or request
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

3 participants