Proof of concept for custom temporary array allocation #370
Conversation
I created this PR to serve as a baseline for a discussion. I don't think it makes sense to have several different cache implementations, but I didn't want to go down a rabbit-hole implementation without discussing it first. I think it'd be better to use an interface over the existing singleton/static-method approach. Primarily because
Codecov Report
@@ Coverage Diff @@
## master #370 +/- ##
============================================
- Coverage 52.07% 52.07% -0.01%
- Complexity 7151 7155 +4
============================================
Files 383 383
Lines 40269 40285 +16
Branches 6506 6506
============================================
+ Hits 20971 20979 +8
- Misses 17782 17793 +11
+ Partials 1516 1513 -3
Continue to review full report at Codecov.
I did some tests on how long the code spends in the render function when I open 6 plots at 640 Hz:

```java
@Override
public List<DataSet> render(final GraphicsContext gc, final Chart chart, final int dataSetOffset,
        final ObservableList<DataSet> datasets) {
    long t0 = System.nanoTime();
    try {
        return super.render(gc, chart, dataSetOffset, datasets);
    } finally {
        long us = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - t0);
        recorder.recordValue(Math.max(1, us));
    }
}
```

There are 3 cases:
The top plot shows the maximum render time for each second. The bottom plot is a percentile graph, e.g., 99.9% of measurements are below X.
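For readers unfamiliar with percentile plots, here is a minimal sketch of how a "99.9% of measurements are below X" value can be derived from raw per-frame render times. This is an illustrative toy, not the code used for the plots above (which record via HdrHistogram's `recorder`):

```java
import java.util.Arrays;

// Toy percentile computation over recorded render times (microseconds).
public class PercentileSketch {
    // Returns the value below which (at least) the given fraction of samples fall.
    static long percentile(final long[] samplesUs, final double fraction) {
        final long[] sorted = samplesUs.clone();
        Arrays.sort(sorted);
        // Index of the last sample still inside the requested fraction.
        final int idx = (int) Math.ceil(fraction * sorted.length) - 1;
        return sorted[Math.max(0, idx)];
    }
}
```

In practice a histogram library (e.g. HdrHistogram, as the `recorder` field suggests) does this without keeping every raw sample in memory.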
Here is another one running on JDK 15 with ZGC.
So in the vast majority of our cases, changing the array cache to be a customizable interface provides pretty much the same speedup as writing an optimized custom renderer.
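The "cache as a customizable interface" idea could be sketched roughly as follows. The names (`DoubleArrayPool`, `SharedDoubleArrayPool`) are hypothetical, not chart-fx API; this is just one way to separate the contract from the default thread-safe implementation:

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical allocator interface: a renderer asks an injected pool for a
// scratch array and hands it back when done.
interface DoubleArrayPool {
    double[] acquire(int minSize); // returns an array with length >= minSize
    void release(double[] array);  // returns the array for later reuse
}

// Default thread-safe variant, sharable across the application, similar in
// spirit to the existing global ArrayCache.
class SharedDoubleArrayPool implements DoubleArrayPool {
    private final Map<Integer, Deque<double[]>> pools = new ConcurrentHashMap<>();

    @Override
    public double[] acquire(final int minSize) {
        final Deque<double[]> deque = pools.get(minSize);
        final double[] cached = (deque == null) ? null : deque.pollFirst();
        return cached != null ? cached : new double[minSize];
    }

    @Override
    public void release(final double[] array) {
        pools.computeIfAbsent(array.length, k -> new ConcurrentLinkedDeque<>()).addFirst(array);
    }
}
```

An application that knows its access pattern (e.g. single-threaded JavaFX rendering) could then inject a lock-free implementation behind the same interface.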
N.B. (for others reading this) the purpose of the [Double...]ArrayCache is to minimise the frequent reallocation of transient arrays that are needed for screen coordinate transforms, data reduction, and other algorithms. These arrays are usually only needed temporarily (e.g. when in-place math operations are not possible or desired). The reason it is global is similar to thread or worker pools: it allows sharing resources across different parts of an application (i.e. data reception, processing, display, etc.) and keeps the overall resource footprint small(er). This is already a great improvement over the stock JVM behaviour, which requires allocation-deallocation gymnastics by the GC, and it is generic, thread-safe, and safe to use for a regular developer/user. Having said that -- I like your idea of allowing either the existing parameters to be customised (e.g. strict allocation or not) or another specific allocator/buffer to be injected for a given renderer or scope of the library. While this appears to be a significant improvement, it is important to point out that this custom implementation has some caveats and design assumptions that may not be safe for an uninitiated developer who hasn't bought into the disruptor paradigm of thinking:
To put in my two-penny worth:
This is of course correct. I mainly wanted to show a use case that is very application dependent and couldn't be easily implemented with a true/false flag. The thread-safety overhead is pretty significant if it's not needed.
To clarify, the measurements show how long the JavaFX thread spent in the render function. The data was updated at the 60 Hz frame rate (AnimationTimer), so all reads and writes only happen on the JavaFX thread. This is actually a best-case scenario for synchronization, as the lock is only accessed by a single thread without contention.
I tried to reproduce it on the … Given that there are already other extension points we can use to work around this, I'll go ahead and close the PR.
Problem
During profiling I noticed a noticeable performance degradation caused by a lot of temporary array allocations in CachedDataPoints. Each trace has a maximum number of points, but the number of visible points can grow and shrink depending on the selected time window. This messes with the getArrayExact method and results in allocating new arrays at almost every frame. A Flight Recording of 10 charts over 30 seconds looks like this:
Custom Buffer Pool
In my case I already know the perfect buffer size, so all of that is unnecessary overhead. I created a little PR that allows users to customize the behavior a bit more; e.g., below is an example of a pool that doesn't require synchronization and doesn't allocate arrays below a certain size:
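The PR carries the actual implementation; as a rough sketch under the same assumptions (hypothetical names, deliberately single-threaded, minimum bucket size known up front), such a pool could look like this:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded pool: no locking, and requests below a fixed
// minimum are rounded up so small size fluctuations all map to one bucket.
// Only safe if acquire/release always happen on one thread (e.g. JavaFX).
class FxThreadDoubleArrayPool {
    private final int minSize;
    private final Deque<double[]> free = new ArrayDeque<>();

    FxThreadDoubleArrayPool(final int minSize) {
        this.minSize = minSize;
    }

    double[] acquire(final int requested) {
        final int size = Math.max(requested, minSize); // round up to bucket size
        final double[] cached = free.peekFirst();
        if (cached != null && cached.length >= size) {
            return free.pollFirst(); // reuse without allocating
        }
        return new double[size];
    }

    void release(final double[] array) {
        free.addFirst(array);
    }
}
```

Because every request below the minimum lands in the same bucket, a steady workload only ever allocates a handful of buffers, consistent with the "6 buffers over the lifetime of the app" result below.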
Result
In my case this allocates a total of only 6 buffers over the entire lifetime of the app. The rendering performance at higher loads is noticeably smoother.