Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shenandoah pause time includes concurrent phases #2377

Closed
shakuzen opened this issue Dec 9, 2020 · 2 comments
Closed

Shenandoah pause time includes concurrent phases #2377

shakuzen opened this issue Dec 9, 2020 · 2 comments
Labels
bug A general bug
Milestone

Comments

@shakuzen
Copy link
Member

shakuzen commented Dec 9, 2020

Similar to #2372, after a closer look at our gc pause metrics reported, they do not match up with the gc logging. Among the GarbageCollectorMXBean and GarbageCollectionNotificationInfo, it seems the Shenandoah Cycles gc name corresponds to the concurrent phase time and should not be counted in the pauses. I could not find this documented on the Shenandoah wiki. However, judging from the spectator implementation and since the gc logging is more clear about what time is spent where, I think we can update our implementation to count Shenandoah Cycles as concurrent phase time.

@shakuzen shakuzen added the bug A general bug label Dec 9, 2020
@shakuzen shakuzen added this to the 1.6.2 milestone Dec 9, 2020
@bric3
Copy link

bric3 commented Feb 23, 2021

Hi there,

Hi I am not sure yet, but I noticed that in the following graph the jvm.gc.pause reports the action as ConcurrentGC. Which is confusing for the same reasons as those stated in #2372, I don't have access to the specific collection id, at this time though.

rate(jvm_gc_pause_seconds_sum{job="$job", pod="$pod"}[1m])/rate(jvm_gc_pause_seconds_count{job="$job", pod="$pod"}[1m])

And at the same time these are the real concurrent actions.

rate(jvm_gc_concurrent_phase_time_seconds_sum{job="$job", pod="$pod"}[1m])/rate(jvm_gc_concurrent_phase_time_seconds_count{job="$job", pod="$pod"}[1m])

image

Corretto JDK 11.0.10, Shenandoah compact heuristic.

When the manager name is not Shenandoah Cycles, does that mean it can only be Shenandoah Pauses (reading memoryManager.cpp makes me think it's the case)? And if the GC cause starts with Concurrent should it be treated as a concurrent action as well ? Or if not are the duration number accurate to represent a pause ?

                if (isConcurrentPhase(gcCause, notificationInfo.getGcName())) {
                    Timer.builder("jvm.gc.concurrent.phase.time")
                            .tags(tags)
                            .tags("action", gcAction, "cause", gcCause)
                            .description("Time spent in concurrent phase")
                            .register(registry)
                            .record(duration, TimeUnit.MILLISECONDS);
                } else {
                    Timer.builder("jvm.gc.pause")
                            .tags(tags)
                            .tags("action", gcAction, "cause", gcCause)
                            .description("Time spent in GC pause")
                            .register(registry)
                            .record(duration, TimeUnit.MILLISECONDS);
                }

Shenandoah documentation don't dive in this GC notification topic.


Another case, focusing on the jvm.gc.pause timer metric

shenandoah-static-heuristic-max-end-spikes

Corretto JDK 11.0.10, Shenandoah static heuristic.

Looking at the GC logs around 17:05 (from 750s to 910s), there's no pauses that goes up to 170 ms, yet they are indeed a few pauses (~10 to ~26ms) related to init mark and and final mark (init update refs and final update refs are really small, sub-millisecond)

2021-02-24_15-51-30-gc.log

That leads me to doubt the correctness of the real pause value for Shenandoah. Not sure Micrometer can do something about it though.

@shakuzen
Copy link
Member Author

Looking at the GC logs around 17:05 (from 750s to 910s), there's no pauses that goes up to 170 ms, yet they are indeed a few pauses (~10 to ~26ms) related to init mark and and final mark (init update refs and final update refs are really small, sub-millisecond)

I found the following in the logs, so it seems the metrics align with the gc logs.

[743.851s][info ][gc               ] GC(99) Pause Final Mark (unload classes) 168.860ms

And if the GC cause starts with Concurrent should it be treated as a concurrent action as well ? Or if not are the duration number accurate to represent a pause ?

I agree it is unexpected from the naming, and we could reach out to the Shenandoah team at their mailing list to seek clarification, but my understanding is that despite the cause being Concurrent GC those are the pauses, and as far as I have seen, that matches what is in the GC logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A general bug
Projects
None yet
Development

No branches or pull requests

2 participants