Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance Proposal] Virtual Thread Support for heavy I/O bound ops #10786

Open
nknize opened this issue Oct 20, 2023 · 8 comments
Open

[Performance Proposal] Virtual Thread Support for heavy I/O bound ops #10786

nknize opened this issue Oct 20, 2023 · 8 comments
Labels
enhancement Enhancement or improvement to existing feature or request Performance This is for any performance related enhancements or bugs

Comments

@nknize
Copy link
Collaborator

nknize commented Oct 20, 2023

JDK19 released JEP-425 (Virtual Threads) as an --enable-preview feature which improves thread scalability "...by mapping a large number of virtual threads to a small number of OS threads.". The feature is ideal for heavy I/O bound operations eliminating the need for the thread-sharing asynchronous programming style where a heavy I/O bound thread is returned to the pool and only signals its completion as a callback to the calling thread. This async design is all over the OpenSearch codebase.

This enhancement proposal is to explore performance improvements (or code simplification) by using virtual threads on those heavy I/O bound operations (e.g., segment copies). One option is to initially Feature Flag virtual thread support as an option to ScalingExecutorBuilder that can be enabled by the user as a cluster setting. Once proved out through benchmarks we can turn vThreads on by default for runtimes using JDK 19+ (or just bump the min targetCompatibility and sourceCompatibility to JavaVersion.VERSION_19 to guarantee feature availability).

Note this is NOT suggesting using virtual threads for optimizing Lucene IW / IR workloads (as mentioned here) but to explore using them in the distributed design around the lucene internals (e.g., storage transfers)

@andrross
Copy link
Member

...and only signals its completion as a callback to the calling thread. This async design is all over the OpenSearch codebase.

I think this async pattern is one of the main causes of complexity of the OpenSearch code base and is likely a big hurdle for new folks to get over when contributing to the core. Virtual threads are super intriguing to me as a way to simplify the code and make it easier to configure to optimally use resources.

The newly introduced functionality to download segments from the remote store during replication and recovery seems like a good place to experiment with virtual threads. It is pretty much all I/O bound.

@ansjcy
Copy link
Member

ansjcy commented Nov 14, 2023

A side note:

Currently, the Performance Analyzer plugin collects certain JVM performance data in OpenSearch by taking thread dumps and also directly reading from the thread files under procfs (/proc/<opensearch pid>/task/*), we need to involve the PA/RCA repo maintainers in this effort to understand the impact on their plugin as well, otherwise any components depending on PA/RCA metrics would be impacted.

@sohami
Copy link
Collaborator

sohami commented Nov 17, 2023

The newly introduced functionality to download segments from the remote store during replication and recovery seems like a good place to experiment with virtual threads. It is pretty much all I/O bound.

Similar to this Snapshot layer can be another area where it can be useful.

@getsaurabh02 getsaurabh02 moved this from Todo to Now (This Quarter) in Performance Roadmap Nov 20, 2023
@andrross
Copy link
Member

andrross commented Dec 22, 2023

I did some very brief experimenting here, and once I got the compilation issues sorted out I ran into an immediate problem with the security manager. I was just doing some simple file I/O where it would work on a platform thread (using a thread from an executor in org.opensearch.threadpool.ThreadPool) but fails on a virtual thread. I haven't dug into this at all, but I don't think the design of virtual threads plays well with the security manager (which is on the path for removal) as it may require setting up the security context on each new virtual thread.

Update: per JEP444 "Virtual threads have no permissions when running with a SecurityManager set" which I believe is what I encountered here

@reta
Copy link
Collaborator

reta commented Dec 22, 2023

@andrross this could be relevant #1687 (comment)

@andrross
Copy link
Member

@andrross this could be relevant #1687 (comment)

@reta Indeed. Virtual threads run with no permissions. It can be made to work by wrapping the code in AccessController.doPrivileged but that may well partially defeat the purpose of virtual threads, which are intended to be very lightweight.

@Bukhtawar
Copy link
Collaborator

Bukhtawar commented Jan 1, 2024

An important thing to note is, Netty doesn't have a support yet. it would be interesting to see Netty adapt Virtual threads netty/netty#12848 which is used heavily at the transport layer for Network IO.

Some public benchmarking results
https://github.com/ebarlas/project-loom-comparison/blob/main/docs/comparison-plots.png is indeed impressive especially at scale

A note of caution: We need to be careful with using V threads when dealing with native dependencies that have a tendency of playing with memory addresses ad V threads would be swapping stacks on platform threads and would break underlying constructs

@andrross
Copy link
Member

andrross commented Jan 2, 2024

@Bukhtawar My understanding is that under the hood virtual threads end up doing something very similar to Netty (I've found this doc to be helpful) but it is the programming model that is very different. What exactly does it mean for Netty to offer support? My understanding is that you probably would not want to use Netty's asynchronous API along with virtual threads.

To me, one of the most obvious places in OpenSearch to experiment with virtual threads is anything that uses the repository interface (snaphots, remote store) as that interface is mostly synchronous and we might be able to get big performance improvements without having to rewrite the code to be async. The transport layer is fully asynchronous and is likely a much bigger change/refactoring to use virtual threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Performance This is for any performance related enhancements or bugs
Projects
Status: Now (This Quarter)
Development

No branches or pull requests

7 participants