Document that RedisConnection is not Thread-safe #2653

Comments
Perhaps you can explain what this code is meant to accomplish:

```java
redisTemplate.executePipelined(connection -> {
    objectives.parallelStream().forEach(obj -> {
        connection.setCommands().sAdd(rawKey, rawValue);
    });
    return null; // the pipelined callback must return null
});
```

For my explanation, I assume (and I am reasonably certain) that:
First of all, the Spring Data Redis RedisConnection is not Thread-safe.

Thread-safety is certainly not the case for Jedis. Under-the-hood, a Spring Data Redis JedisConnection wraps a single Jedis instance, which is itself not Thread-safe. For Lettuce, while the underlying, native connection is Thread-safe, as documented in Spring Data Redis's reference documentation as well as the Lettuce driver docs (see RedisClient and StatefulRedisConnection), the Spring Data Redis connection wrapper makes no such guarantee.

When a Java stream is parallelized, such as by calling parallelStream(), its elements are processed concurrently on multiple Threads drawn from the common ForkJoinPool. Therefore, a single connection shared across that stream is accessed by multiple Threads at once.
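To make the parallel-stream point concrete, here is a small, self-contained sketch (plain Java, no Redis involved; the element count of 10,000 is arbitrary) showing that `parallelStream()`-style parallel execution fans work out across the common ForkJoinPool plus, typically, the calling thread. Any state the lambda shares, such as a single connection, is touched by all of these Threads concurrently:

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.stream.IntStream;

public class ParallelStreamThreadsDemo {
    public static void main(String[] args) {
        // Record the name of every Thread that processes a stream element.
        ConcurrentSkipListSet<String> threadNames = new ConcurrentSkipListSet<>();

        IntStream.range(0, 10_000).parallel().forEach(i ->
                threadNames.add(Thread.currentThread().getName()));

        // Typically prints "main" plus several ForkJoinPool.commonPool-worker-N
        // entries, matching the thread list in the test log further below.
        threadNames.forEach(System.out::println);
    }
}
```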
As a temporary workaround, you could have simply, instead, sequentially processed the elements:

```java
redisTemplate.executePipelined(connection -> {
    objectives.forEach(obj -> {
        connection.setCommands().sAdd(rawKey, rawValue);
    });
    return null; // the pipelined callback must return null
});
```

You are already benefiting from using "Redis pipelining", as documented, anyway.

Alternatively, although I am not entirely certain in this case, you may be able to restructure your code by doing:

```java
objectives.parallelStream().forEach(obj -> {
    redisTemplate.executePipelined(operations -> {
        operations.opsForSet().add(rawKey, rawValue);
        return null;
    });
});
```

In this case, you are calling executePipelined once per element, so each invocation of the (Thread-safe) RedisTemplate operates on its own connection. Also, see the Javadoc for executePipelined.
In either arrangement, I would strongly encourage the use of pooled connections. Let me know what you think. In the meantime, I will speak to the team about our Thread-safety guarantees with regard to Spring Data Redis connections (i.e. instances of RedisConnection).
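For reference, a minimal sketch of enabling connection pooling with Lettuce in Spring Data Redis (the class and builder names are the actual Spring Data Redis and commons-pool2 APIs; the host, port, and pool size are placeholder values for illustration):

```java
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisStandaloneConfiguration;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettucePoolingClientConfiguration;

@Configuration
class RedisPoolingConfig {

    @Bean
    LettuceConnectionFactory connectionFactory() {
        GenericObjectPoolConfig<?> poolConfig = new GenericObjectPoolConfig<>();
        poolConfig.setMaxTotal(16); // pool size chosen for illustration only

        // Pooling variant of the Lettuce client configuration.
        LettucePoolingClientConfiguration clientConfig =
                LettucePoolingClientConfiguration.builder()
                        .poolConfig(poolConfig)
                        .build();

        return new LettuceConnectionFactory(
                new RedisStandaloneConfiguration("localhost", 6379), clientConfig);
    }
}
```

This configuration fragment requires a running Redis server and the commons-pool2 dependency on the classpath.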
Also be aware that the Spring Data Redis version you are using is no longer maintained. You should minimally upgrade to a Spring Data Redis version that is still supported, and I would encourage you to upgrade to the latest available release.
NOTE: I edited my original, first comment above.
@zhaozhiguang - Following up to my previous comments, I have additional feedback for you. First, I wrote integration tests to reproduce and analyze the problem. A few things should stand out here:
The output should look similar to (conveniently formatted for this comment):

```
2023-08-01T14:42:20.145-07:00 DEBUG 57021 --- [ main] .s.d.r.c.t.AbstractRedisIntegrationTests :
PIPELINE-STREAM THREADS [[
  ForkJoinPool.commonPool-worker-1,
  ForkJoinPool.commonPool-worker-10,
  ForkJoinPool.commonPool-worker-11,
  ForkJoinPool.commonPool-worker-2,
  ForkJoinPool.commonPool-worker-3,
  ForkJoinPool.commonPool-worker-4,
  ForkJoinPool.commonPool-worker-5,
  ForkJoinPool.commonPool-worker-6,
  ForkJoinPool.commonPool-worker-7,
  ForkJoinPool.commonPool-worker-8,
  ForkJoinPool.commonPool-worker-9,
  main
]]
```

You may be surprised to learn that the "sequential" execution outperformed the "parallel" execution in my testing. Of course, this is not unexpected. Parallelization (or premature optimization) does not always lead to performance improvements. It depends on the environment, and measurements are always required to optimize correctly, if at all.

I'd say, in your case, with only a 300K-element data set, optimizing through the use of parallelization won't make much of a difference. It will also most definitely vary by environment. And most of the time is likely to be eaten up by network activity anyway, which can vary from moment to moment depending on the amount of network traffic.

In my test run, although local, the difference between sequential and parallel execution on 500,000 elements was not significant, with sequential execution running slightly faster even though it was run first (possibly without the benefit of JIT compilation, and optimizations in general):

Using Parallel Stream: 10,918 ms

Of course, (JVM) optimizations can vary between run invocations as well, thereby affecting measurements. But, on average, there was not a significant, noticeable advantage from one approach to the other.
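For anyone repeating this comparison, here is a bare-bones way to take such measurements (plain Java, with a trivial stand-in workload rather than Redis calls; serious measurements should use a proper harness such as JMH to account for JIT warm-up):

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

public class SequentialVsParallelTiming {

    // Time a task in milliseconds using a monotonic clock.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 500_000; // same element count as the test run described above
        LongAdder counter = new LongAdder(); // Thread-safe accumulator, unlike ArrayList

        long sequential = timeMillis(() ->
                IntStream.range(0, n).forEach(i -> counter.increment()));
        long parallel = timeMillis(() ->
                IntStream.range(0, n).parallel().forEach(i -> counter.increment()));

        System.out.println("Sequential: " + sequential + " ms, Parallel: " + parallel + " ms");
        // With a workload this cheap, parallel is often NOT faster: the cost of
        // splitting and joining the work can dominate, mirroring the observation above.
    }
}
```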
Finally, I'd also add that you should NOT use the alternative approach I suggested, which I also captured in a test case. This performed very poorly, and when you trace down (into Spring Data Redis), you realize that you are not really using Redis pipelined connections effectively, since each invocation of executePipelined pipelines only a single command over its own connection.
Yes, I have also tested both concurrent and serial modes; serial performance is better. I read on Lettuce's official website that its connections are Thread-safe, so I assumed that Spring Data Redis was also Thread-safe.
Not all classes and components in Spring Data Redis are Thread-safe, as I have discussed above. If Spring Data Redis classes are not explicitly documented as Thread-safe, then it is not safe to assume they are, even if certain underlying classes (e.g. Lettuce driver/library classes) used by Spring Data Redis classes are Thread-safe. If a class or component is not explicitly documented as Thread-safe, then you must assume it is NOT Thread-safe, in fact.

For instance, clearly the connection classes make no such guarantee. While RedisTemplate is documented as Thread-safe, that holds only so long as the template is fully configured before it is shared across Threads. If the template configuration is changed after it is acquired from (or injected by) the Spring container, then it is no longer Thread-safe, since it carries non-volatile references to other objects, like its serializers.
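The "fully configure before sharing" rule can be illustrated with a small, self-contained sketch (the TemplateConfig class below is hypothetical, standing in for any template-like object with non-volatile configuration fields):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a template with mutable, non-volatile configuration.
class TemplateConfig {
    private String keySerializer; // not volatile, like many framework config fields
    void setKeySerializer(String serializer) { this.keySerializer = serializer; }
    String getKeySerializer() { return keySerializer; }
}

public class SafePublicationDemo {
    public static void main(String[] args) throws Exception {
        TemplateConfig config = new TemplateConfig();
        // Finish ALL configuration BEFORE handing the object to other Threads.
        config.setKeySerializer("StringRedisSerializer");

        // Submitting tasks establishes a happens-before edge, so the workers are
        // guaranteed to observe the fully configured state. Mutating `config`
        // after this point would reintroduce a data race.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            results.add(pool.submit(config::getKeySerializer));
        }
        for (Future<String> result : results) {
            System.out.println(result.get()); // every worker sees "StringRedisSerializer"
        }
        pool.shutdown();
    }
}
```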
In summary, the "Thread-safety" guarantee of RedisTemplate is conditional on its configuration remaining unchanged once it is in use. The same rules apply to other classes and components inside the Spring Data Redis framework.
Two things, in general: there are many factors in determining Thread-safety guarantees and what constitutes a "Thread-safe" class. Having said this, I really believe this issue boils down to a documentation change for Spring Data Redis, more than anything else.
As of right now, we are currently undecided whether changing the Thread-safety behavior of the connection classes themselves (as opposed to documenting the current behavior) is the right course of action. CC'ing/Pinging @christophstrobl & @mp911de for further comments.
… Thread-safety guarantees. Closes spring-projects#2653
RedisConnection is not Thread-safe
Current Behavior
I use parallelStream for concurrent operations over a data set of about 300k elements. After execution, each result should be added to the pipeline's result list. However, this list is created as a plain ArrayList during initialization, so under concurrent access some results are not added to it. A NullPointerException is then thrown when the list is traversed in closePipeline().
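The lost results are consistent with a plain ArrayList being mutated from multiple threads. A self-contained sketch of that failure mode follows (the element count is arbitrary; the unsynchronized run may lose elements, leave null slots, or even throw, while the synchronized run is always complete):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

public class ArrayListRaceDemo {
    public static void main(String[] args) {
        int n = 100_000;

        // Unsafe: ArrayList.add() is not atomic; concurrent adds can lose elements
        // or leave null gaps, like the pipeline's internal result list here.
        List<Integer> unsafe = new ArrayList<>();
        try {
            IntStream.range(0, n).parallel().forEach(unsafe::add);
        } catch (RuntimeException e) {
            // An ArrayIndexOutOfBoundsException is also a possible outcome of the race.
        }

        // Safe alternative: serialize access to the list.
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, n).parallel().forEach(safe::add);

        System.out.println("unsafe size: " + unsafe.size() + " (expected " + n + ")");
        System.out.println("safe size:   " + safe.size());
    }
}
```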
Stack trace
Input Code
Environment