Fix MpscLinkedQueue GC issues #7799

Open
wants to merge 5 commits into
base: 3.x

@olivergillespie olivergillespie commented Nov 21, 2024

Similar to https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscLinkedQueue.java#L120, this change nulls out the next pointer in the discarded consumer node when polling from the queue. Otherwise, we leave behind a (potentially long) chain of connected garbage nodes. If we're unlucky (for example, one of the early nodes is promoted to the old generation, triggering nepotism), this can cause GC issues: the tenured node keeps the rest of the chain reachable, so we now have a long linked list which must be marked by young collections.

I noticed this in one of my applications: a heap dump showed an unreachable list of a few hundred thousand nodes, all with null values.
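To make the mechanism concrete, here is a minimal sketch of a linked MPSC queue whose `poll()` unlinks the node it discards. It is an illustration under assumptions, not RxJava's or JCTools' actual code: `SketchMpscQueue`, `Node`, and `consumerNodeForTest()` are hypothetical names invented for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, simplified MPSC linked queue; for illustration only. */
final class SketchMpscQueue<T> {

    /** Queue node; the inherited AtomicReference value acts as the "next" pointer. */
    static final class Node<E> extends AtomicReference<Node<E>> {
        private static final long serialVersionUID = 1L;
        E value;
        Node(E value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> producerNode = new AtomicReference<>();
    private Node<T> consumerNode; // only touched by the single consumer thread

    SketchMpscQueue() {
        Node<T> stub = new Node<>(null); // empty stub node shared by both ends
        consumerNode = stub;
        producerNode.set(stub);
    }

    boolean offer(T value) {
        Node<T> next = new Node<>(value);
        Node<T> prev = producerNode.getAndSet(next); // claim the tail
        prev.lazySet(next);                          // then link the old tail to it
        return true;
    }

    T poll() {
        Node<T> current = consumerNode;
        Node<T> next = current.get();
        if (next == null) {
            if (current == producerNode.get()) {
                return null; // queue is empty
            }
            // A producer has swapped the tail but not linked it yet: spin.
            while ((next = current.get()) == null) { }
        }
        T value = next.value;
        next.value = null;     // 'next' becomes the new stub, so drop its payload
        current.lazySet(null); // the fix: unlink the discarded node from the chain
        consumerNode = next;
        return value;
    }

    /** Test hook (assumption of this sketch) exposing the current consumer node. */
    Node<T> consumerNodeForTest() { return consumerNode; }
}
```

Without the `current.lazySet(null)` line, every discarded node would still point at its successor, so a single tenured node could keep the whole chain of later nodes reachable from the old generation.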

Note: There are two commits here. The first refactors poll() to (IMO) simplify the different cases, and the second actually fixes the GC issue. If preferred, I can fix only the GC issue and drop the refactoring.

Reproducer:

```
import io.reactivex.rxjava3.internal.queue.MpscLinkedQueue;

public class MpscLinkedQueueGC {
    public static void main(String[] args) {
        MpscLinkedQueue<Integer> queue = new MpscLinkedQueue<>();
        for (int i = 0; i < 10; i++) System.gc(); // tenure consumer node
        while (true) {
            queue.offer(123);
            queue.poll();
        }
    }
}
```
```
Before fix:

$ java -Xlog:gc -Xmx1G -cp build/classes/java/main MpscLinkedQueueGC.java
...
[1.261s] GC(20) Pause Young (Normal) (G1 Preventive Collection) 115M->115M(204M) 209.335ms
[1.385s] GC(23) Pause Young (Normal) (G1 Evacuation Pause) 148M->149M(204M) 31.491ms
[1.417s] GC(24) Pause Young (Normal) (G1 Evacuation Pause) 157M->158M(204M) 19.333ms
[1.453s] GC(25) Pause Young (Normal) (G1 Evacuation Pause) 166M->167M(599M) 22.678ms
[1.966s] GC(26) Pause Young (Normal) (G1 Evacuation Pause) 249M->249M(497M) 305.238ms
...

After fix:
$ java -Xlog:gc -Xmx1G -cp build/classes/java/main MpscLinkedQueueGC.java
...
[1.169s] GC(14) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.755ms
[1.558s] GC(15) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.689ms
[1.948s] GC(16) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.800ms
[2.337s] GC(17) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.714ms
...
```
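
To compare the two runs numerically, one could pull the pause durations out of the `-Xlog:gc` lines. The following is only a sketch, assuming the log line format shown in the excerpts above; `GcPauseSummary` and `maxPauseMillis` are hypothetical names, not part of RxJava or the JDK.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts young-pause durations from -Xlog:gc lines. */
final class GcPauseSummary {

    // Captures the trailing duration of a "Pause Young" line, e.g. "209.335ms".
    private static final Pattern PAUSE =
            Pattern.compile("Pause Young.*?\\s(\\d+(?:\\.\\d+)?)ms\\s*$");

    // Log excerpts copied from the runs quoted in this pull request.
    static final List<String> BEFORE = List.of(
            "[1.261s] GC(20) Pause Young (Normal) (G1 Preventive Collection) 115M->115M(204M) 209.335ms",
            "[1.385s] GC(23) Pause Young (Normal) (G1 Evacuation Pause) 148M->149M(204M) 31.491ms",
            "[1.417s] GC(24) Pause Young (Normal) (G1 Evacuation Pause) 157M->158M(204M) 19.333ms",
            "[1.453s] GC(25) Pause Young (Normal) (G1 Evacuation Pause) 166M->167M(599M) 22.678ms",
            "[1.966s] GC(26) Pause Young (Normal) (G1 Evacuation Pause) 249M->249M(497M) 305.238ms");

    static final List<String> AFTER = List.of(
            "[1.169s] GC(14) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.755ms",
            "[1.558s] GC(15) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.689ms",
            "[1.948s] GC(16) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.800ms",
            "[2.337s] GC(17) Pause Young (Normal) (G1 Evacuation Pause) 304M->2M(506M) 0.714ms");

    /** Returns the longest young pause in milliseconds, or 0.0 if no line matches. */
    static double maxPauseMillis(List<String> lines) {
        double max = 0.0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                max = Math.max(max, Double.parseDouble(m.group(1)));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("max young pause before fix (ms): " + maxPauseMillis(BEFORE));
        System.out.println("max young pause after fix (ms):  " + maxPauseMillis(AFTER));
    }
}
```

On the excerpts above, the longest young pause is 305.238 ms before the fix and 0.800 ms after it. The excerpts are truncated, so these figures describe only the quoted lines, not the full runs.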

Handle empty queue first, then share most of the
implementation for non-empty scenarios (spin and
non-spin).

@akarnokd akarnokd (Member) left a comment

Please only fix the GC issue.

@olivergillespie (Author)

Updated with the minimal fix.

codecov bot commented Nov 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.59%. Comparing base (9b55d01) to head (898c19d).
Report is 112 commits behind head on 3.x.

Additional details and impacted files

```
@@             Coverage Diff              @@
##                3.x    #7799      +/-   ##
============================================
- Coverage     99.62%   99.59%   -0.03%
- Complexity     6801     6803       +2
============================================
  Files           752      752
  Lines         47707    47713       +6
  Branches       6401     6402       +1
============================================
- Hits          47527    47519       -8
- Misses           84       89       +5
- Partials         96      105       +9
```
