Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add minLamport for proper GC of deactivated clients #1060

Merged
merged 2 commits into from
Nov 8, 2024

Conversation

JOOHOJANG
Copy link
Contributor

@JOOHOJANG JOOHOJANG commented Nov 7, 2024

What this PR does / why we need it:

This change improves GC handling when nodes from deactivated
clients remain in the document. Previously, GC failed for such nodes because
the min version vector lacked information about deactivated clients.

The solution:

  • Introduces minLamport concept for version vector comparison
  • When actor info is missing in min version vector, uses minimum value instead
  • Removes nodes if minimum value exceeds node's Lamport timestamp
  • Ensures safe deletion by confirming all active users received the change

This approach ensures proper cleanup of orphaned nodes while maintaining data
consistency across the system.

Which issue(s) this PR fixes:

Related to yorkie-team/yorkie-js-sdk#926

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Didn't break anything

Summary by CodeRabbit

  • New Features

    • Introduced a method to compute the minimum Lamport value in the version vector.
    • Updated version comparison logic to handle cases where an actor's version is absent.
  • Bug Fixes

    • Enhanced garbage collection behavior for documents updated by deactivated clients.
  • Tests

    • Added a test case to verify garbage collection functionality when a client is deactivated.

@JOOHOJANG JOOHOJANG requested a review from hackerwins November 7, 2024 14:46
@JOOHOJANG JOOHOJANG self-assigned this Nov 7, 2024
Copy link

coderabbitai bot commented Nov 7, 2024

Walkthrough

The changes involve modifications to the VersionVector implementation in pkg/document/time/version_vector.go, including the addition of a new method MinLamport for retrieving the minimum Lamport value and updates to the EqualToOrAfter method to handle cases where an actor's version is absent. Additionally, a new test case is introduced in test/integration/gc_test.go to validate garbage collection behavior after a client deactivation.

Changes

File Path Change Summary
pkg/document/time/version_vector.go - Added method: MinLamport() to compute the minimum Lamport value.
- Updated EqualToOrAfter(other *Ticket) to handle missing actor versions using MinLamport().
- Retained MaxLamport() with improved comments.
- Included math package for math.MaxInt64 usage in MinLamport().
test/integration/gc_test.go - Added test case "gc targeting nodes made by deactivated client" to TestGarbageCollection.
- Verified garbage collection behavior after a client deactivation.

Sequence Diagram(s)

sequenceDiagram
    participant Client1
    participant Client2
    participant Document
    participant GarbageCollector

    Client1->>Document: Update Document
    Document->>Client2: Sync Changes
    Client1->>Client1: Deactivate
    GarbageCollector->>Document: Check Garbage Length
    GarbageCollector->>Document: Collect Garbage
    GarbageCollector->>Client2: Verify Garbage Length
Loading

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
pkg/document/time/version_vector.go (2)

179-190: Consider handling empty version vectors

The implementation is correct, but it might be worth handling the edge case of empty version vectors explicitly.

Consider this minor enhancement:

 func (v VersionVector) MinLamport() int64 {
+    if len(v) == 0 {
+        return 0  // or another appropriate default value
+    }
+
     var minLamport int64 = math.MaxInt64
 
     for _, value := range v {
         if value < minLamport {
             minLamport = value
         }
     }
 
     return minLamport
 }

Line range hint 118-190: Architectural consideration for version vector GC

The implementation aligns well with distributed systems principles by properly handling version vectors for garbage collection. The approach of using minimum Lamport clock for comparison when actors are missing is a sound strategy for maintaining causal consistency during GC.

Consider documenting these key behaviors in the package documentation:

  1. How version comparison behaves when actors are missing
  2. The relationship between MinLamport and garbage collection
  3. The consistency guarantees provided by these changes
test/integration/gc_test.go (1)

1140-1198: Enhance test case documentation and assertions

The test case structure is good, but could benefit from:

  1. Adding comments explaining the expected behavior after client deactivation
  2. Documenting the significance of version vector changes
  3. Adding assertions to verify the document state after garbage collection
 t.Run("gc targeting nodes made by deactivated client", func(t *testing.T) {
+    // This test verifies that:
+    // 1. Garbage collection works properly when a client is deactivated
+    // 2. Version vectors are properly updated after client deactivation
+    // 3. Garbage nodes created by the deactivated client are collected
     ctx := context.Background()
     d1 := document.New(helper.TestDocKey(t))
     // ... rest of the test
     assert.NoError(t, c1.Deactivate(ctx))
 
     assert.Equal(t, d2.GarbageLen(), 2)
     assert.Equal(t, len(d2.VersionVector()), 2)
+    // Verify document state after deactivation
+    assert.Equal(t, `{"text":[{"val":"a"}]}`, d2.Marshal())
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d22c53e and 4b3c6a4.

📒 Files selected for processing (2)
  • pkg/document/time/version_vector.go (3 hunks)
  • test/integration/gc_test.go (1 hunks)
🔇 Additional comments (2)
pkg/document/time/version_vector.go (2)

21-21: LGTM: Required import for math.MaxInt64

The math package import is correctly placed and necessary for the new MinLamport implementation.


118-126: LGTM: Enhanced version comparison logic

The updated logic properly handles missing actor versions by comparing against the minimum Lamport clock, which is essential for garbage collection when clients are deactivated.

Let's verify the usage of this method in GC-related code:

✅ Verification successful

EqualToOrAfter method changes are correctly implemented and used

The verification shows that the EqualToOrAfter method with MinLamport is properly used in garbage collection across the codebase:

  • It's primarily used in Root.GarbageCollect to determine if elements can be purged
  • The method is extensively tested in both unit and integration tests
  • Tests verify the behavior with missing actor versions
  • The implementation is used consistently in garbage collection across different CRDT types (arrays, objects, text)

The changes maintain causal consistency while enabling proper garbage collection of elements from deactivated clients.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the EqualToOrAfter method usage in GC context
# Expected: Find calls to EqualToOrAfter in GC-related code

# Search for EqualToOrAfter usage in GC context
rg -A 5 "EqualToOrAfter.*Ticket" 

# Search for GC-related files that might use this method
rg -l "gc|garbage.?collect" 

Length of output: 981


Script:

#!/bin/bash
# Let's check the actual usage of EqualToOrAfter in GC-related files
rg -B 3 -A 3 "EqualToOrAfter" pkg/document/crdt/gc.go pkg/document/document.go

# Also check test files to understand the expected behavior
rg -B 3 -A 3 "EqualToOrAfter" pkg/document/gc_test.go test/integration/gc_test.go

# Let's also look at the MinLamport usage since it's crucial for the new logic
rg -B 3 -A 3 "MinLamport" pkg/document/time/version_vector.go pkg/document/crdt/gc.go

Length of output: 1102


Script:

#!/bin/bash
# Let's look for any tests specifically verifying this behavior
ast-grep --pattern 'func Test$_($_) {
  $$$
  EqualToOrAfter
  $$$
}'

# Also check for any direct callers of EqualToOrAfter
rg -B 5 -A 5 "\\.EqualToOrAfter\\(" --type go

# Look for any GC-related code that might be affected
rg -B 5 -A 5 "collectGarbage|GarbageCollect" --type go

Length of output: 25249

Comment on lines +1193 to +1197
// TODO(JOOHOJANG): remove below comments after https://github.com/yorkie-team/yorkie/issues/1058 resolved
// Due to https://github.com/yorkie-team/yorkie/issues/1058, removing deactivated client's version vector is not working properly now.
// assert.NoError(t, c2.Sync(ctx))
// assert.Equal(t, d2.GarbageLen(), 0)
// assert.Equal(t, len(d2.VersionVector()), 1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codebase verification

⚠️ Potential issue

Based on the search results, I can now provide a final response:

Issue #1058 needs to be fixed to properly cleanup version vectors during client deactivation

The commented test code in test/integration/gc_test.go reveals that when a client is deactivated, its version vector is not being properly removed from the database. This is a critical issue because:

  1. The version vector of a deactivated client remains in the database, affecting the min version vector calculation
  2. This prevents proper garbage collection since the min version vector includes the deactivated client's vector
  3. The test expects d2.VersionVector() length to be 1 after deactivation, but it remains at 2

The implementation in server/backend/database/mongo/client.go shows that while client status is updated to "deactivated", there's no corresponding cleanup of the version vector.

🔗 Analysis chain

Critical issue: Version vector cleanup after client deactivation

The TODO comments reference issue #1058 which affects the proper cleanup of version vectors after client deactivation. This is a critical issue as it impacts garbage collection efficiency.

Let's verify if this issue has been addressed in other parts of the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for related version vector cleanup code
# Test: Search for version vector cleanup implementations
rg -A 5 "removeVersionVector|cleanupVersionVector|deactivateClient"

# Test: Search for related tests
rg -A 5 "TestDeactivateClient|TestVersionVectorCleanup"

Length of output: 2013


Script:

#!/bin/bash
# Let's search for the actual issue #1058 and version vector related code
# Search for the issue number in comments
rg -l "1058"

# Search for version vector related code and tests
rg -A 5 "VersionVector|versionVector"

# Search for garbage collection related tests
rg -A 5 "TestGarbageCollection|GarbageLen"

# Search for client deactivation implementation
ast-grep --pattern 'func $_ DeactivateClient($$$) {
  $$$
}'

Length of output: 201937


Script:

#!/bin/bash
# Let's search for issue #1058 and related code
rg -l "1058"

# Search for version vector cleanup in deactivate client implementation
rg -A 5 "DeactivateClient.*func"

# Search for version vector cleanup related code
rg -A 5 "removeVersionVector|cleanupVersionVector|filterVersionVector"

# Search for the actual implementation of client deactivation
ast-grep --pattern 'func $_ DeactivateClient($$$) {
  $$$
  UpdateVersionVector($$$)
  $$$
}'

Length of output: 1770


Script:

#!/bin/bash
# Let's search for the actual implementation of DeactivateClient and version vector cleanup
rg -A 10 "func.*DeactivateClient"

# Search for the issue #1058 content
rg -A 5 "1058" test/integration/gc_test.go

# Search for version vector cleanup in client deactivation
rg -A 10 "deactivate.*client.*version.*vector"

# Search for client deactivation related code
ast-grep --pattern 'func $_ Deactivate($$$) {
  $$$
}'

Length of output: 12616

Copy link

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.

Project coverage is 46.77%. Comparing base (d22c53e) to head (4b3c6a4).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/document/time/version_vector.go 0.00% 16 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1060      +/-   ##
==========================================
- Coverage   46.83%   46.77%   -0.06%     
==========================================
  Files          84       84              
  Lines       12108    12123      +15     
==========================================
  Hits         5671     5671              
- Misses       5873     5888      +15     
  Partials      564      564              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@hackerwins hackerwins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.

@hackerwins hackerwins changed the title Add min lamport to run gc properly Add minLamport for proper GC of deactivated clients Nov 8, 2024
@hackerwins hackerwins merged commit 6547bf8 into main Nov 8, 2024
5 checks passed
@hackerwins hackerwins deleted the add-min-lamport-to-run-gc-properly branch November 8, 2024 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants