Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression in 2.25.0-SNAPSHOT #3106

Open
ppkarwasz opened this issue Oct 18, 2024 · 7 comments · May be fixed by #3123
Open

Performance regression in 2.25.0-SNAPSHOT #3106

ppkarwasz opened this issue Oct 18, 2024 · 7 comments · May be fixed by #3123
Labels
performance Issues or PRs that affect performance, throughput, latency, etc.
Milestone

Comments

@ppkarwasz
Copy link
Contributor

As shown by @hailshao in #3093, there might be a 60% loss of performance in the current 2.25.0-SNAPSHOT, compared to 2.24.1.
Before we publish 2.25.0 we need to run the benchmarks and fix the problem.

I consider this a blocker for the 2.25.0 release.

@ppkarwasz ppkarwasz added the performance Issues or PRs that affect performance, throughput, latency, etc. label Oct 18, 2024
@ppkarwasz ppkarwasz added this to the 2.25.0 milestone Oct 18, 2024
@ppkarwasz
Copy link
Contributor Author

The performance hit is probably due to the new %xEx. Performance could be improved by switching to %ex by default (since most users don't need packaging information anyway).

@jengebr, could you look at this? Does the performance regression also affect your usage?

@vy
Copy link
Member

vy commented Oct 25, 2024

Suspect: #2691

@jengebr
Copy link
Contributor

jengebr commented Oct 25, 2024

@ppkarwasz our apps only rarely log exceptions, so this cost is present but quite small in comparison with other log4j work (especially MDC/ThreadContext). I'll take a look at this in isolation though, and get back in a few days.

@ppkarwasz
Copy link
Contributor Author

Thanks, the statistics I mentioned above are obviously a little bit artificial: if we want to benchmark the formatting of an exception, we should take as a baseline the creation of exceptions without logging (which is probably very slow).

@jengebr
Copy link
Contributor

jengebr commented Oct 25, 2024

I recreated a large performance regression coming from ThrowablePatternConverter.format (2.25.0) vs ExtendedThrowablePatternConverter.format (2.24.1). Does that sound right so far?

FWIW I was able to create one Exception as a static variable, then reuse that for every test iteration. It's not a perfect reproduction but it certainly demonstrates the regression quite clearly.

@ppkarwasz
Copy link
Contributor Author

So even ThrowablePatternConverter (2.25.0) is slower than ExtendedThrowablePatternConverter (2.24.1)? That is unexpected, we'll need to check that.

@jengebr
Copy link
Contributor

jengebr commented Oct 25, 2024

Problem

Data shows 2.24.x has nearly zero object allocation during the benchmark, but 2.25.0 has far more than that. The allocation-heavy code in 2.25.x also comes with significant cpu load.

Caveat: I'm not 100% certain I reproduced the specific issue reported.

Data

  1. Async-profiler data shows that the overall hot path isorg.apache.logging.log4j.core.pattern.ThrowableExtendedStackTraceRenderer$ExtendedContext.lambda$createClassResourceInfoByName$0. Put another way, ThrowableExtendedStackTraceRenderer.createClassResourceInfoByName().
  2. The benchmark (attached below) shows the old code allocates almost zero objects but the new code allocates rather more than that.
  3. The benchmark shows the old code is roughly 2x faster than the new, although that number may vary by system.

Analysis

Examination of createClassResourceInfoByName shows a very large amount of object allocation right at the beginning:

final Map<String, ClassResourceInfo> classResourceInfoByName =
		StackLocatorUtil.getCurrentStackTrace().stream()
				.collect(Collectors.toMap(
						Class::getName,
						clazz -> new ClassResourceInfo(clazz, true),
						(classResourceInfo1, classResourceInfo2) -> classResourceInfo1));
final Set<Throwable> visitedThrowables = new HashSet<>();
final Queue<Throwable> pendingThrowables = new ArrayDeque<>(Collections.singleton(rootThrowable));

... but what doesn't jump out is that there is a new ClassResourceInfo() for each item in the stack trace. Diving into that constructor shows an indirect call to String.format - which is a very expensive and object-heavy convenience method.

Recommendations

  1. Do whatever it takes to eliminate String.format and any per-element object allocations.
  2. Reduce the allocations from the HashSet and ArrayDeque.

Benchmark

My benchmark is attached, and should be run with -Xmx1g -Xms1g -XX:+UseParallelGC. It definitely reproduces a problem but I'm not certain it's exactly the one that was reported.... please let me know if I'm off.

Config file: log4j2.xml.txt
Benchmark: Log4jExceptionSpeedTest.java.txt
Support class for the benchmark: TestRunner.java.txt

@alan0428a alan0428a linked a pull request Oct 26, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issues or PRs that affect performance, throughput, latency, etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants