Revisit ConcurrentLruCache implementation #26320
Labels: in: core (Issues in core modules: aop, beans, core, context, expression), type: enhancement (A general enhancement)

Comments
spring-projects-issues added the status: waiting-for-triage (An issue we've not yet triaged or decided on) label on Dec 27, 2020
bclozel added the type: enhancement (A general enhancement) label and removed the status: waiting-for-triage (An issue we've not yet triaged or decided on) label on Jun 1, 2021
bclozel changed the title from "ConcurrentLruCache has poor performance" to "Rework ConcurrentLruCache implementation" on Sep 20, 2021
bclozel changed the title from "Rework ConcurrentLruCache implementation" to "Revisit ConcurrentLruCache implementation" on Sep 20, 2021
I noticed the performance issue while running a stress test on an SSE application. A workaround that worked for me was to include a custom message converter and override the addDefaultHeaders method to always include the same expected content type header.
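One possible shape of such a workaround is sketched below. The choice of `MappingJackson2HttpMessageConverter` as the base class and of `application/json` as the pinned content type are assumptions for illustration, not the commenter's actual code:

```java
import java.io.IOException;

import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.converter.json.MappingJackson2HttpMessageConverter;

// Hypothetical converter: pins the Content-Type header up front so that
// per-message media type resolution (and its cache lookup) is skipped.
class FixedContentTypeConverter extends MappingJackson2HttpMessageConverter {

	@Override
	protected void addDefaultHeaders(HttpHeaders headers, Object object,
			MediaType contentType) throws IOException {
		// Always advertise the same, expected content type instead of
		// resolving and parsing it for every written message.
		headers.setContentType(MediaType.APPLICATION_JSON);
	}
}
```

Such a converter would then be registered in place of (or ahead of) the default one in the application's message converter configuration.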
rstoyanchev added the in: core (Issues in core modules: aop, beans, core, context, expression) label on Nov 24, 2021
bclozel added a commit to bclozel/spring-framework that referenced this issue on Aug 22, 2022:
Prior to this commit, the `ConcurrentLruCache` implementation would not perform well under certain conditions. As long as the cache capacity was not reached, the cache would avoid maintaining an eviction queue (reordering entries depending on the least/most recently read). When the cache capacity was reached, the LRU queue was updated for each read/write operation. This decreased performance significantly under contention once the capacity was reached.

This commit completely rewrites the internals of `ConcurrentLruCache`. `ConcurrentLruCache` is now a specialized version of `ConcurrentLinkedHashMap` [1]. This change focuses on buffering read and write operations, only processing them at certain times to avoid contention.

When a cached entry is read, a read operation is queued, and buffered operations are drained once the buffer reaches a fixed limit. When a new cache entry is added or removed, a write operation is queued and triggers a drain attempt. When the capacity is outgrown, the cache polls items from the eviction queue, which keeps the least recently used elements first. Entries are removed until the size is back under the capacity limit.

The behavior described here and the buffer sizes are optimized with the number of available processors in mind. Work is localized as much as possible on a per-thread basis to avoid contention on the eviction queue.

The new implementation has been tested with the JMH benchmark provided here, comparing the former `ConcurrentLruCache`, the new implementation, and the `ConcurrentLinkedHashMap` [1]. When testing with a cache at capacity, under contention, with a 10% cache miss rate, we're seeing a 40x improvement compared to the previous implementation and performance on par with the reference. See [2] for how to replicate the benchmark.

[1] https://github.com/ben-manes/concurrentlinkedhashmap
[2] https://github.com/spring-projects/spring-framework/wiki/Micro-Benchmarks

Closes gh-26320
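The buffering strategy described in this commit message can be illustrated with a heavily simplified, single-lock sketch. This is not the actual Spring implementation (which uses striped read buffers and a dedicated eviction deque); the class name, drain threshold, and data structures below are illustrative assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Minimal sketch of "buffer reads, queue writes, drain under tryLock".
class BufferedLruCache<K, V> {

	private static final int READ_BUFFER_DRAIN_THRESHOLD = 64; // illustrative value

	private final int capacity;
	private final Function<K, V> generator;
	private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
	private final Queue<K> readBuffer = new ConcurrentLinkedQueue<>();
	private final Queue<Runnable> writeQueue = new ConcurrentLinkedQueue<>();
	private final ReentrantLock evictionLock = new ReentrantLock();
	// Access-ordered map, touched only under the eviction lock, tracks LRU order.
	private final LinkedHashMap<K, Boolean> evictionOrder = new LinkedHashMap<>(16, 0.75f, true);

	BufferedLruCache(int capacity, Function<K, V> generator) {
		this.capacity = capacity;
		this.generator = generator;
	}

	V get(K key) {
		V value = this.data.get(key);
		if (value != null) {
			// Record the read; only drain once enough reads have accumulated.
			this.readBuffer.add(key);
			if (this.readBuffer.size() >= READ_BUFFER_DRAIN_THRESHOLD) {
				tryDrain();
			}
			return value;
		}
		V created = this.generator.apply(key);
		V existing = this.data.putIfAbsent(key, created);
		if (existing != null) {
			return existing;
		}
		// A write always queues a task and triggers a drain attempt.
		this.writeQueue.add(() -> {
			this.evictionOrder.put(key, Boolean.TRUE);
			evictIfNecessary();
		});
		tryDrain();
		return created;
	}

	private void tryDrain() {
		if (this.evictionLock.tryLock()) {
			try {
				K read;
				while ((read = this.readBuffer.poll()) != null) {
					this.evictionOrder.get(read); // touch to refresh LRU order
				}
				Runnable write;
				while ((write = this.writeQueue.poll()) != null) {
					write.run();
				}
			}
			finally {
				this.evictionLock.unlock();
			}
		}
	}

	private void evictIfNecessary() {
		while (this.evictionOrder.size() > this.capacity) {
			K eldest = this.evictionOrder.keySet().iterator().next();
			this.evictionOrder.remove(eldest);
			this.data.remove(eldest);
		}
	}
}
```

The key property is that readers only append to a queue on the hot path; the LRU reordering itself happens in batches under a lock that is only ever tried, never blocked on.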
sbrannen added a commit that referenced this issue on Sep 27, 2023:
Since the rewrite of `ConcurrentLruCache` in Spring Framework 6.0, an attempt to create a `ConcurrentLruCache` with zero capacity results in an IllegalArgumentException, even though the documentation states that zero capacity indicates "no caching, always generating a new value".

This commit restores the ability to configure a `ConcurrentLruCache` with zero capacity and introduces corresponding tests (which were first verified against the 5.3.x branch to ensure backward compatibility).

See gh-26320
Closes gh-31317
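For reference, the zero-capacity behavior restored here would look roughly like the following usage sketch; the generator and the comments describe the documented "no caching, always generating a new value" contract rather than output copied from the actual tests:

```java
import java.util.concurrent.atomic.AtomicInteger;

import org.springframework.util.ConcurrentLruCache;

class ZeroCapacityExample {

	public static void main(String[] args) {
		AtomicInteger generations = new AtomicInteger();
		// Capacity 0: expected to mean "no caching, always generating a new value".
		ConcurrentLruCache<String, String> cache =
				new ConcurrentLruCache<>(0, key -> key + "-" + generations.incrementAndGet());

		System.out.println(cache.get("a")); // generated
		System.out.println(cache.get("a")); // regenerated: nothing is retained
		System.out.println(cache.size());   // expected to stay 0
	}
}
```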
`ConcurrentLruCache` was extracted from `MimeTypeUtils` for use in `NamedParameterJdbcTemplate` to improve its concurrent performance (#24197). Unfortunately, the original logic using a synchronized, bounded `LinkedHashMap` is faster than this new one. If the cache reaches capacity, then this change introduced in 5.3.0 only exacerbates the problem.

`ConcurrentLruCache` is implemented as a combination of a read/write lock, a `ConcurrentHashMap`, and a `ConcurrentLinkedDeque`. If the cache is under capacity, it performs only the map read, which makes it incredibly fast. However, once it has reached capacity, it maintains the LRU eviction order by removing and re-adding the key to the queue, which provides access order. Thus, the tail is the MRU key and searching is an O(n) scan. This is performed under the read lock, with exclusive access reserved for a cache write, which would evict an entry. I am not sure if the read/write lock is necessary; it might be replaced with an exclusive lock on the write path only, though it was perhaps added for pragmatic reasons as a fail-safe.

This design has many problems that lead to poor performance:
- Caches typically follow a Zipfian distribution (power law), hence hot and cold entries. In a JMH benchmark with that distribution and a full cache, I observed ~270,000 reads/s with 8 threads on a 4-core SMT laptop. In comparison, a synchronized `LinkedHashMap` in LRU order performs at 10-11M reads/s, a 40x speedup (see the sketch after this list). If #24197 is an observed bottleneck, then that problem still exists and was likely made worse.
- The approach that I use in my caches (ConcurrentLinkedHashMap, Guava, Caffeine) is to amortize the lock penalty. Instead of trying to make LRU thread-safe, the work is enqueued into a buffer and replayed under the lock in batches. Readers pay only the cost of appending to the buffer, and multiple buffers spread those writes out to avoid contention. The replay is a simple tryLock that, if acquired, drains the buffer to apply the cheap LRU reordering operations. That approach can easily raise the throughput to 100M+ reads/s.
- For a simple cache to embed, you might consider including ConcurrentLinkedHashMap. It was used broadly during its lifetime and is small enough to have often been copied elsewhere. Groovy enhanced their copy to support Java 8 compute methods, so that may be an attractive option; it was subsequently embedded by Micronaut for similar use cases. I can't speak to the Groovy variant, but the original has proven robust and easily hackable, so there should be a low maintenance burden if either is used. Embedding avoids the dependency, and a lot of projects, like mssql-jdbc, seem to have preferred doing that.
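For comparison, the synchronized, bounded `LinkedHashMap` baseline mentioned in the first point above can be as small as the following sketch. This is a minimal illustration, not Spring's actual pre-5.3 code; the class name and initial capacity are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Synchronized, bounded, access-ordered LinkedHashMap baseline:
// every operation takes the same monitor, but each one is O(1).
class SynchronizedLruCache<K, V> {

	private final Map<K, V> cache;
	private final Function<K, V> generator;

	SynchronizedLruCache(int capacity, Function<K, V> generator) {
		this.generator = generator;
		// accessOrder=true keeps the least recently used entry at the head;
		// removeEldestEntry evicts it once the capacity is exceeded.
		this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
			@Override
			protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
				return size() > capacity;
			}
		};
	}

	public synchronized V get(K key) {
		return this.cache.computeIfAbsent(key, this.generator);
	}
}
```

The trade-off is a single contended monitor, which is why the buffered-replay designs above scale so much better under many concurrent readers.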