
Revisit ConcurrentLruCache implementation #26320

Closed

ben-manes opened this issue Dec 27, 2020 · 1 comment
Assignees
Labels
in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement
Milestone

Comments

@ben-manes
Contributor

ConcurrentLruCache was extracted from MimeTypeUtils for use in NamedParameterJdbcTemplate to improve its concurrent performance (#24197). Unfortunately, the original logic using a synchronized, bounded LinkedHashMap is faster than this new one. Once the cache reaches capacity, the change introduced in 5.3.0 only exacerbates the problem.

ConcurrentLruCache is implemented as a combination of a read/write lock, a ConcurrentHashMap, and a ConcurrentLinkedDeque. If the cache is under capacity, a read performs only the map lookup, which makes it incredibly fast. However, once capacity has been reached, every read maintains the LRU eviction order by removing the key from the queue and re-appending it, so the queue reflects access order. The tail is therefore the MRU key and the removal is an O(n) scan. This is performed under the read lock, with the write lock reserved for exclusive access on a cache write, which may evict an entry. I am not sure the read/write lock is necessary; it could probably be replaced with an exclusive lock on the write path only, and was perhaps added only as a pragmatic failsafe.
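
For illustration, here is a minimal sketch of the structure described above (class and member names are mine; this approximates, rather than reproduces, the actual Spring source):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Illustrative only: an approximation of the design described above, not the Spring source.
class LruCacheSketch<K, V> {

    private final int sizeLimit;
    private final Function<K, V> generator;
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentLinkedDeque<K> queue = new ConcurrentLinkedDeque<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    LruCacheSketch(int sizeLimit, Function<K, V> generator) {
        this.sizeLimit = sizeLimit;
        this.generator = generator;
    }

    V get(K key) {
        V cached = this.cache.get(key);
        if (cached != null) {
            if (this.cache.size() < this.sizeLimit) {
                return cached;  // under capacity: a plain map read, very fast
            }
            this.lock.readLock().lock();
            try {
                // At capacity: O(n) scan to unlink the key, then re-append it as the MRU tail.
                if (this.queue.removeLastOccurrence(key)) {
                    this.queue.offer(key);
                }
                return cached;
            }
            finally {
                this.lock.readLock().unlock();
            }
        }
        this.lock.writeLock().lock();
        try {
            // Another thread may have generated the value while we waited for the lock.
            V existing = this.cache.get(key);
            if (existing != null) {
                return existing;
            }
            // Evict the least recently used key (queue head) if full, then insert the new entry.
            if (this.cache.size() == this.sizeLimit) {
                K leastUsed = this.queue.poll();
                if (leastUsed != null) {
                    this.cache.remove(leastUsed);
                }
            }
            V value = this.generator.apply(key);
            this.cache.put(key, value);
            this.queue.offer(key);
            return value;
        }
        finally {
            this.lock.writeLock().unlock();
        }
    }
}
```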

This design has several problems that lead to poor performance:

  1. The O(n) scan of the queue assumes a small cache if it is accessed frequently.
  2. A read/write lock is more expensive than an exclusive lock, often built on top of one, because it has to manage more complex state. This type of lock is reasonable when the critical section is very expensive, e.g. I/O, but is a poor choice for short, fast critical sections.
  3. As LRU reorders the key to the tail of the queue on every access, that becomes a single point of contention. This means that fundamentally the performance will be limited by how fast entries can be appended to the tail, and is made worse with contention and bus traffic when using a lock-free algorithm.

Cache workloads typically follow a Zipfian (power-law) distribution, hence hot and cold entries. In a JMH benchmark with that distribution and a full cache, I observed ~270,000 reads/s with 8 threads on a 4-core SMT laptop. In comparison, a synchronized LinkedHashMap in LRU order performs at 10-11M reads/s, roughly a 40x speedup. If #24197 was an observed bottleneck, then that problem still exists and was likely made worse.
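
For readers who want to reproduce this kind of comparison, a hypothetical JMH harness could look roughly like the following. This is not the benchmark referenced above; the class name and the crude Zipf sampler are stand-ins (a proper generator such as YCSB's ScrambledZipfianGenerator would normally be used):

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.springframework.util.ConcurrentLruCache;

@State(Scope.Benchmark)
public class LruCacheReadBenchmark {

    static final int CAPACITY = 1 << 14;

    ConcurrentLruCache<Integer, Integer> cache;
    int[] keys;

    @Setup
    public void setup() {
        this.cache = new ConcurrentLruCache<>(CAPACITY, key -> key);
        // Skewed (hot/cold) keys drawn from twice the capacity, so the cache runs full.
        this.keys = zipfianKeys(1 << 18, CAPACITY * 2);
        for (int key : this.keys) {
            this.cache.get(key);
        }
    }

    @Benchmark
    @Threads(8)
    public Integer read(ThreadIndex index) {
        return this.cache.get(this.keys[index.next() & (this.keys.length - 1)]);
    }

    @State(Scope.Thread)
    public static class ThreadIndex {
        int value = ThreadLocalRandom.current().nextInt();
        int next() { return this.value++; }
    }

    // Crude inverse-CDF Zipf(1) sampler; 'count' must be a power of two for the masking above.
    static int[] zipfianKeys(int count, int bound) {
        double[] cdf = new double[bound];
        double sum = 0;
        for (int i = 0; i < bound; i++) {
            sum += 1.0 / (i + 1);
            cdf[i] = sum;
        }
        int[] result = new int[count];
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int i = 0; i < count; i++) {
            int idx = Arrays.binarySearch(cdf, random.nextDouble() * sum);
            result[i] = (idx >= 0) ? idx : -idx - 1;
        }
        return result;
    }
}
```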

The approach that I use in my caches (ConcurrentLinkedHashMap, Guava, Caffeine) is to amortize the lock penalty. Instead of trying to make the LRU policy itself thread-safe, the work is enqueued into a buffer and replayed under the lock in batches. Readers then only pay the cost of appending to a buffer, and multiple buffers spread these writes out to avoid contention. The replay is a simple tryLock that, if acquired, drains the buffer and applies the cheap LRU reordering operations. That approach can easily raise the throughput to 100M+ reads/s.
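
A stripped-down sketch of that idea, using a single buffer and a plain access-ordered LinkedHashMap purely to show the shape (real implementations such as ConcurrentLinkedHashMap and Caffeine stripe the buffers and make them lossy):

```java
import java.util.LinkedHashMap;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of read-buffer amortization: readers record accesses cheaply, and the
// LRU reordering is replayed in batches under a tryLock by whichever thread wins.
class AmortizedLruPolicy<K> {

    private static final int DRAIN_THRESHOLD = 64;

    private final Queue<K> readBuffer = new ConcurrentLinkedQueue<>();

    private final AtomicInteger bufferedReads = new AtomicInteger();

    private final ReentrantLock evictionLock = new ReentrantLock();

    // Access-ordered map standing in for the LRU policy's data structure; guarded by evictionLock.
    private final LinkedHashMap<K, Boolean> accessOrder = new LinkedHashMap<>(16, 0.75f, true);

    /** Called after every successful map read; the reader only appends to the buffer. */
    void afterRead(K key) {
        this.readBuffer.offer(key);
        if (this.bufferedReads.incrementAndGet() >= DRAIN_THRESHOLD) {
            tryDrain();
        }
    }

    private void tryDrain() {
        if (this.evictionLock.tryLock()) {  // losing the race is fine; another thread is draining
            try {
                K key;
                while ((key = this.readBuffer.poll()) != null) {
                    this.bufferedReads.decrementAndGet();
                    this.accessOrder.put(key, Boolean.TRUE);  // cheap move to the MRU position
                }
            }
            finally {
                this.evictionLock.unlock();
            }
        }
    }
}
```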

For a simple cache to embed, you might consider including ConcurrentLinkedHashMap. It was used broadly during its lifetime and is small enough that it was often copied into other projects. Groovy enhanced their copy to support the Java 8 compute methods, and that variant was subsequently embedded by Micronaut for similar use cases, so it may be an attractive option. I can't speak to the Groovy variant, but the original has proven to be robust and easily hackable, so there should be a low maintenance burden if either is used. Embedding avoids the dependency, and a lot of projects, like mssql-jdbc, seem to have preferred doing that.
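
If it were embedded or added as a dependency, usage would look roughly like this (the MimeType pairing and class name are just an illustrative example, not an existing Spring class):

```java
import java.util.concurrent.ConcurrentMap;

import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import org.springframework.util.MimeType;
import org.springframework.util.MimeTypeUtils;

public class MimeTypeCacheExample {

    // Bounded concurrent map with LRU-style eviction once 64 entries is exceeded.
    private final ConcurrentMap<String, MimeType> cache =
            new ConcurrentLinkedHashMap.Builder<String, MimeType>()
                    .maximumWeightedCapacity(64)
                    .build();

    public MimeType parse(String mimeType) {
        MimeType cached = this.cache.get(mimeType);
        if (cached == null) {
            cached = MimeTypeUtils.parseMimeType(mimeType);
            // A racing thread may also compute and insert; last write wins, which is acceptable here.
            this.cache.put(mimeType, cached);
        }
        return cached;
    }
}
```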

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Dec 27, 2020
@jhoeller jhoeller self-assigned this Dec 28, 2020
@bclozel bclozel self-assigned this Jan 4, 2021
@bclozel bclozel added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Jun 1, 2021
@bclozel bclozel added this to the 6.0 M1 milestone Jun 1, 2021
@bclozel bclozel changed the title from "ConcurrentLruCache has poor performance" to "Rework ConcurrentLruCache implementation" Sep 20, 2021
@bclozel bclozel changed the title from "Rework ConcurrentLruCache implementation" to "Revisit ConcurrentLruCache implementation" Sep 20, 2021
@bclozel bclozel modified the milestones: 6.0 M1, 6.0 M2 Oct 18, 2021
@jhoeller jhoeller modified the milestones: 6.0 M2, 6.0.x Nov 16, 2021
@ratacolita

ratacolita commented Nov 18, 2021

I noticed the performance issue while running a stress test on an SSE application. A workaround that worked for me was to register a custom message converter and override the addDefaultHeaders method to always set the same expected content type header.
That avoids the parsing and hitting this contentious cache altogether.
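
One possible shape of that workaround, assumed rather than taken from the commenter's code, is to subclass an existing converter and pin the Content-Type so nothing has to be resolved or parsed per message:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.converter.StringHttpMessageConverter;
import org.springframework.lang.Nullable;

// Hypothetical sketch: always write a fixed Content-Type so the converter never has to
// resolve or parse a media type (and hit the LRU cache) for each SSE message.
public class FixedContentTypeStringConverter extends StringHttpMessageConverter {

    private static final MediaType EVENT_STREAM =
            new MediaType("text", "event-stream", StandardCharsets.UTF_8);

    @Override
    protected void addDefaultHeaders(HttpHeaders headers, String body, @Nullable MediaType contentType)
            throws IOException {
        headers.setContentType(EVENT_STREAM);
        super.addDefaultHeaders(headers, body, EVENT_STREAM);
    }
}
```

Such a converter would then be registered ahead of the defaults, for example via WebMvcConfigurer#extendMessageConverters.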

@rstoyanchev rstoyanchev added the in: core Issues in core modules (aop, beans, core, context, expression) label Nov 24, 2021
bclozel added a commit to bclozel/spring-framework that referenced this issue Aug 22, 2022
Prior to this commit, the `ConcurrentLruCache` implementation would not
perform well under certain conditions. As long as the cache capacity was
not reached, the cache would avoid maintaining an eviction queue
(reordering entries according to how recently they were read). When the
cache capacity was reached, the LRU queue was updated for each
read/write operation. This decreased performance significantly under
contention when the capacity was reached.

This commit completely rewrites the internals of `ConcurrentLruCache`.
`ConcurrentLruCache` is now a specialized version of the
`ConcurrentLinkedHashMap` [1]. This change focuses on buffering read and
write operations, only processing them at certain times to avoid
contention.

When a cached entry is read, a read operation is queued, and buffered
operations are drained once the buffer reaches a fixed limit. When a new
cache entry is added or removed, a write operation is queued and
triggers a drain attempt. When the capacity is exceeded, the cache polls
items from the eviction queue, which keeps the least recently used
elements first. Entries are removed until the size is back under the
capacity limit.

The behavior described here and the buffer sizes are optimized with the
number of available processors in mind. Work is localized as much as
possible on a per-thread basis to avoid contention on the eviction queue.

The new implementation has been tested with the JMH benchmark provided
here, comparing the former `ConcurrentLruCache`, the new implementation
as well as the `ConcurrentLinkedHashMap` [1].

When testing with a cache reaching capacity, under contention, with a
10% cache miss, we're seeing a 40x improvement compared to the previous
implementation and performance on par with the reference.
See [2] for how to replicate the benchmark.

[1] https://github.com/ben-manes/concurrentlinkedhashmap
[2] https://github.com/spring-projects/spring-framework/wiki/Micro-Benchmarks

Closes gh-26320
@bclozel bclozel modified the milestones: 6.0.x, 6.0.0-M6 Aug 31, 2022
bclozel added a commit that referenced this issue Sep 2, 2022
sbrannen added a commit that referenced this issue Sep 27, 2023
Since the rewrite of ConcurrentLruCache in Spring Framework 6.0, an
attempt to create a ConcurrentLruCache with zero capacity results in an
IllegalArgumentException even though the documentation states that zero
capacity indicates "no caching, always generating a new value".

This commit restores the ability to configure a ConcurrentLruCache with
zero capacity and introduces corresponding tests (which were first
verified against the 5.3.x branch to ensure backward compatibility).

See gh-26320
Closes gh-31317
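
In other words, the restored contract can be exercised like this (a small sketch assuming the usual get/size accessors on ConcurrentLruCache):

```java
import org.springframework.util.ConcurrentLruCache;
import org.springframework.util.MimeType;
import org.springframework.util.MimeTypeUtils;

class ZeroCapacityExample {

    public static void main(String[] args) {
        // Zero capacity: "no caching, always generating a new value".
        ConcurrentLruCache<String, MimeType> noCache =
                new ConcurrentLruCache<>(0, MimeTypeUtils::parseMimeType);

        MimeType parsed = noCache.get("text/plain");  // invokes the generator
        System.out.println(parsed + ", cached entries: " + noCache.size());  // size stays 0
    }
}
```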

6 participants