
Revisit ConcurrentLruCache implementation #26320

Closed

ben-manes opened this issue Dec 27, 2020 · 1 comment
Assignees
Labels
in: core Issues in core modules (aop, beans, core, context, expression) type: enhancement A general enhancement
Milestone

Comments

@ben-manes
Contributor

ConcurrentLruCache was extracted from MimeTypeUtils for use in NamedParameterJdbcTemplate to improve its concurrent performance (#24197). Unfortunately, the original logic using a synchronized, bounded LinkedHashMap is faster than this new one. Once the cache reaches capacity, the change introduced in 5.3.0 only exacerbates the problem.

ConcurrentLruCache is implemented as a combination of a read/write lock, a ConcurrentHashMap, and a ConcurrentLinkedDeque. If the cache is under capacity, a read performs only the map lookup, which makes it incredibly fast. However, once capacity has been reached, every read maintains the LRU eviction order by removing the key from the queue and re-appending it, so the queue reflects access order. The tail is therefore the MRU key and the removal is an O(n) scan. This is performed under the read lock, with the write lock reserved for exclusive access on a cache write, which may evict an entry. I am not sure the read/write lock is necessary; it could probably be replaced with an exclusive lock on the write path only, and was perhaps added only as a pragmatic failsafe.
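
For illustration, here is a minimal sketch of the structure described above (class and member names are mine; this approximates, rather than reproduces, the actual Spring source):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Illustrative only: an approximation of the design described above, not the Spring source.
class LruCacheSketch<K, V> {

    private final int sizeLimit;
    private final Function<K, V> generator;
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final ConcurrentLinkedDeque<K> queue = new ConcurrentLinkedDeque<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    LruCacheSketch(int sizeLimit, Function<K, V> generator) {
        this.sizeLimit = sizeLimit;
        this.generator = generator;
    }

    V get(K key) {
        V cached = this.cache.get(key);
        if (cached != null) {
            if (this.cache.size() < this.sizeLimit) {
                return cached;  // under capacity: a plain map read, very fast
            }
            this.lock.readLock().lock();
            try {
                // At capacity: O(n) scan to unlink the key, then re-append it as the MRU tail.
                if (this.queue.removeLastOccurrence(key)) {
                    this.queue.offer(key);
                }
                return cached;
            }
            finally {
                this.lock.readLock().unlock();
            }
        }
        this.lock.writeLock().lock();
        try {
            // Another thread may have generated the value while we waited for the lock.
            V existing = this.cache.get(key);
            if (existing != null) {
                return existing;
            }
            // Evict the least recently used key (queue head) if full, then insert the new entry.
            if (this.cache.size() == this.sizeLimit) {
                K leastUsed = this.queue.poll();
                if (leastUsed != null) {
                    this.cache.remove(leastUsed);
                }
            }
            V value = this.generator.apply(key);
            this.cache.put(key, value);
            this.queue.offer(key);
            return value;
        }
        finally {
            this.lock.writeLock().unlock();
        }
    }
}
```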

This design has several problems that lead to poor performance:

  1. The O(n) scan of the queue assumes a small cache if it is accessed frequently.
  2. A read/write lock is more expensive than an exclusive lock, often built on top of one, because it has to manage more complex state. This type of lock is reasonable when the critical section is very expensive, e.g. I/O, but is a poor choice for short, fast critical sections.
  3. As LRU reorders the key to the tail of the queue on every access, that becomes a single point of contention. This means that fundamentally the performance will be limited by how fast entries can be appended to the tail, and is made worse with contention and bus traffic when using a lock-free algorithm.

Cache workloads typically follow a Zipfian (power-law) distribution, hence hot and cold entries. In a JMH benchmark with that distribution and a full cache, I observed ~270,000 reads/s with 8 threads on a 4-core SMT laptop. In comparison, a synchronized LinkedHashMap in LRU order performs at 10-11M reads/s, roughly a 40x speedup. If #24197 was an observed bottleneck, then that problem still exists and was likely made worse.
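
For readers who want to reproduce this kind of comparison, a hypothetical JMH harness could look roughly like the following. This is not the benchmark referenced above; the class name and the crude Zipf sampler are stand-ins (a proper generator such as YCSB's ScrambledZipfianGenerator would normally be used):

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.springframework.util.ConcurrentLruCache;

@State(Scope.Benchmark)
public class LruCacheReadBenchmark {

    static final int CAPACITY = 1 << 14;

    ConcurrentLruCache<Integer, Integer> cache;
    int[] keys;

    @Setup
    public void setup() {
        this.cache = new ConcurrentLruCache<>(CAPACITY, key -> key);
        // Skewed (hot/cold) keys drawn from twice the capacity, so the cache runs full.
        this.keys = zipfianKeys(1 << 18, CAPACITY * 2);
        for (int key : this.keys) {
            this.cache.get(key);
        }
    }

    @Benchmark
    @Threads(8)
    public Integer read(ThreadIndex index) {
        return this.cache.get(this.keys[index.next() & (this.keys.length - 1)]);
    }

    @State(Scope.Thread)
    public static class ThreadIndex {
        int value = ThreadLocalRandom.current().nextInt();
        int next() { return this.value++; }
    }

    // Crude inverse-CDF Zipf(1) sampler; 'count' must be a power of two for the masking above.
    static int[] zipfianKeys(int count, int bound) {
        double[] cdf = new double[bound];
        double sum = 0;
        for (int i = 0; i < bound; i++) {
            sum += 1.0 / (i + 1);
            cdf[i] = sum;
        }
        int[] result = new int[count];
        ThreadLocalRandom random = ThreadLocalRandom.current();
        for (int i = 0; i < count; i++) {
            int idx = Arrays.binarySearch(cdf, random.nextDouble() * sum);
            result[i] = (idx >= 0) ? idx : -idx - 1;
        }
        return result;
    }
}
```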

The approach that I use in my caches (ConcurrentLinkedHashMap, Guava, Caffeine) is to amortize the lock penalty. Instead of trying to make the LRU policy itself thread-safe, the work is enqueued into a buffer and replayed under the lock in batches. Readers then only pay the cost of appending to a buffer, and multiple buffers spread these writes out to avoid contention. The replay is a simple tryLock that, if acquired, drains the buffer and applies the cheap LRU reordering operations. That approach can easily raise the throughput to 100M+ reads/s.
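
A stripped-down sketch of that idea, using a single buffer and a plain access-ordered LinkedHashMap purely to show the shape (real implementations such as ConcurrentLinkedHashMap and Caffeine stripe the buffers and make them lossy):

```java
import java.util.LinkedHashMap;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of read-buffer amortization: readers record accesses cheaply, and the
// LRU reordering is replayed in batches under a tryLock by whichever thread wins.
class AmortizedLruPolicy<K> {

    private static final int DRAIN_THRESHOLD = 64;

    private final Queue<K> readBuffer = new ConcurrentLinkedQueue<>();

    private final AtomicInteger bufferedReads = new AtomicInteger();

    private final ReentrantLock evictionLock = new ReentrantLock();

    // Access-ordered map standing in for the LRU policy's data structure; guarded by evictionLock.
    private final LinkedHashMap<K, Boolean> accessOrder = new LinkedHashMap<>(16, 0.75f, true);

    /** Called after every successful map read; the reader only appends to the buffer. */
    void afterRead(K key) {
        this.readBuffer.offer(key);
        if (this.bufferedReads.incrementAndGet() >= DRAIN_THRESHOLD) {
            tryDrain();
        }
    }

    private void tryDrain() {
        if (this.evictionLock.tryLock()) {  // losing the race is fine; another thread is draining
            try {
                K key;
                while ((key = this.readBuffer.poll()) != null) {
                    this.bufferedReads.decrementAndGet();
                    this.accessOrder.put(key, Boolean.TRUE);  // cheap move to the MRU position
                }
            }
            finally {
                this.evictionLock.unlock();
            }
        }
    }
}
```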

For a simple cache to embed, you might consider including ConcurrentLinkedHashMap. It was used broadly during its lifetime and is small enough that it was often copied into other projects. Groovy enhanced their copy to support the Java 8 compute methods, and that variant was subsequently embedded by Micronaut for similar use cases, so it may be an attractive option. I can't speak to the Groovy variant, but the original has proven to be robust and easily hackable, so there should be a low maintenance burden if either is used. Embedding avoids the dependency, and a lot of projects, like mssql-jdbc, seem to have preferred doing that.
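
If it were embedded or added as a dependency, usage would look roughly like this (the MimeType pairing and class name are just an illustrative example, not an existing Spring class):

```java
import java.util.concurrent.ConcurrentMap;

import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
import org.springframework.util.MimeType;
import org.springframework.util.MimeTypeUtils;

public class MimeTypeCacheExample {

    // Bounded concurrent map with LRU-style eviction once 64 entries is exceeded.
    private final ConcurrentMap<String, MimeType> cache =
            new ConcurrentLinkedHashMap.Builder<String, MimeType>()
                    .maximumWeightedCapacity(64)
                    .build();

    public MimeType parse(String mimeType) {
        MimeType cached = this.cache.get(mimeType);
        if (cached == null) {
            cached = MimeTypeUtils.parseMimeType(mimeType);
            // A racing thread may also compute and insert; last write wins, which is acceptable here.
            this.cache.put(mimeType, cached);
        }
        return cached;
    }
}
```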

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Dec 27, 2020
@jhoeller jhoeller self-assigned this Dec 28, 2020
@bclozel bclozel self-assigned this Jan 4, 2021
@bclozel bclozel added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Jun 1, 2021
@bclozel bclozel added this to the 6.0 M1 milestone Jun 1, 2021
@bclozel bclozel changed the title from "ConcurrentLruCache has poor performance" to "Rework ConcurrentLruCache implementation" Sep 20, 2021
@bclozel bclozel changed the title from "Rework ConcurrentLruCache implementation" to "Revisit ConcurrentLruCache implementation" Sep 20, 2021
@bclozel bclozel modified the milestones: 6.0 M1, 6.0 M2 Oct 18, 2021
@jhoeller jhoeller modified the milestones: 6.0 M2, 6.0.x Nov 16, 2021
@ratacolita

ratacolita commented Nov 18, 2021

I noticed the performance issue while running a stress test on an SSE application. A workaround that worked for me was to register a custom message converter and override the addDefaultHeaders method to always set the same expected content type header.
That avoids the parsing and hitting this contentious cache altogether.
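
One possible shape of that workaround, assumed rather than taken from the commenter's code, is to subclass an existing converter and pin the Content-Type so nothing has to be resolved or parsed per message:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.converter.StringHttpMessageConverter;
import org.springframework.lang.Nullable;

// Hypothetical sketch: always write a fixed Content-Type so the converter never has to
// resolve or parse a media type (and hit the LRU cache) for each SSE message.
public class FixedContentTypeStringConverter extends StringHttpMessageConverter {

    private static final MediaType EVENT_STREAM =
            new MediaType("text", "event-stream", StandardCharsets.UTF_8);

    @Override
    protected void addDefaultHeaders(HttpHeaders headers, String body, @Nullable MediaType contentType)
            throws IOException {
        headers.setContentType(EVENT_STREAM);
        super.addDefaultHeaders(headers, body, EVENT_STREAM);
    }
}
```

Such a converter would then be registered ahead of the defaults, for example via WebMvcConfigurer#extendMessageConverters.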

@rstoyanchev rstoyanchev added the in: core Issues in core modules (aop, beans, core, context, expression) label Nov 24, 2021
bclozel added a commit to bclozel/spring-framework that referenced this issue Aug 22, 2022
Prior to this commit, the `ConcurrentLruCache` implementation would not
perform well under certain conditions. As long as the cache capacity was
not reached, the cache would avoid maintaining an eviction queue
(reordering entries according to how recently they were read). When the
cache capacity was reached, the LRU queue was updated for each
read/write operation. This decreased performance significantly under
contention when the capacity was reached.

This commit completely rewrites the internals of `ConcurrentLruCache`.
`ConcurrentLruCache` is now a specialized version of the
`ConcurrentLinkedHashMap` [1]. This change focuses on buffering read and
write operations, only processing them at certain times to avoid
contention.

When a cached entry is read, a read operation is queued, and buffered
operations are drained once the buffer reaches a fixed limit. When a new
cache entry is added or removed, a write operation is queued and
triggers a drain attempt. When the capacity is exceeded, the cache polls
items from the eviction queue, which keeps the least recently used
elements first. Entries are removed until the size is back under the
capacity limit.

The behavior described here and the buffer sizes are optimized with the
number of available processors in mind. Work is localized as much as
possible on a per-thread basis to avoid contention on the eviction queue.

The new implementation has been tested with the JMH benchmark provided
here, comparing the former `ConcurrentLruCache`, the new implementation
as well as the `ConcurrentLinkedHashMap` [1].

When testing with a cache reaching capacity, under contention, with a
10% cache miss, we're seeing a 40x improvement compared to the previous
implementation and performance on par with the reference.
See [2] for how to replicate the benchmark.

[1] https://github.com/ben-manes/concurrentlinkedhashmap
[2] https://github.com/spring-projects/spring-framework/wiki/Micro-Benchmarks

Closes gh-26320
@bclozel bclozel modified the milestones: 6.0.x, 6.0.0-M6 Aug 31, 2022
bclozel added a commit that referenced this issue Sep 2, 2022
sbrannen added a commit that referenced this issue Sep 27, 2023
Since the rewrite of ConcurrentLruCache in Spring Framework 6.0, an
attempt to create a ConcurrentLruCache with zero capacity results in an
IllegalArgumentException even though the documentation states that zero
capacity indicates "no caching, always generating a new value".

This commit restores the ability to configure a ConcurrentLruCache with
zero capacity and introduces corresponding tests (which were first
verified against the 5.3.x branch to ensure backward compatibility).

See gh-26320
Closes gh-31317
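
In other words, the restored contract can be exercised like this (a small sketch assuming the usual get/size accessors on ConcurrentLruCache):

```java
import org.springframework.util.ConcurrentLruCache;
import org.springframework.util.MimeType;
import org.springframework.util.MimeTypeUtils;

class ZeroCapacityExample {

    public static void main(String[] args) {
        // Zero capacity: "no caching, always generating a new value".
        ConcurrentLruCache<String, MimeType> noCache =
                new ConcurrentLruCache<>(0, MimeTypeUtils::parseMimeType);

        MimeType parsed = noCache.get("text/plain");  // invokes the generator
        System.out.println(parsed + ", cached entries: " + noCache.size());  // size stays 0
    }
}
```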

6 participants