[C++][Java] Arrow Flight C++/Java performance comparison #13980
The benchmarks are nowhere near comparable. The C++ version spawns 16 threads, as you can see in the output. The Java version spawns one thread per endpoint and defaults to two endpoints. The Java version uses smaller batches (4095 * 4 * 8 ~= 128 KiB), whereas the C++ one uses ~256 KiB. The Java version also appears to suffer from JVM warmup. The Java output also isn't clear, but it appears to be summing the two threads together. It does seem the per-thread performance is not quite as good, but there are so many differences between the two benchmarks that I wouldn't take this as anything remotely definitive.
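To make the batch-size gap concrete, here is a small calculation. It assumes each batch holds four 8-byte (int64) columns, which is what the `4 * 8` factor above implies, and 8192 records per batch on the C++ side (the value listed in the issue, consistent with ~256 KiB). The helper name `batch_bytes` is only for illustration, and the figures ignore IPC metadata and padding.

```python
# Raw payload size of one record batch, assuming 4 int64 columns (8 bytes per value).
COLUMNS = 4
BYTES_PER_VALUE = 8


def batch_bytes(records_per_batch: int) -> int:
    """Bytes of column data in one batch, ignoring IPC metadata and padding."""
    return records_per_batch * COLUMNS * BYTES_PER_VALUE


java_batch = batch_bytes(4095)  # Java benchmark batch size discussed above
cpp_batch = batch_bytes(8192)   # assumed C++ setting (~256 KiB)

print(f"Java: {java_batch} bytes = {java_batch / 1024:.1f} KiB")
print(f"C++:  {cpp_batch} bytes = {cpp_batch / 1024:.1f} KiB")
print(f"ratio: {cpp_batch / java_batch:.2f}x")
```

So each C++ write carries roughly twice as much data as each Java write, on top of the difference in thread counts.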
Ah, I missed this initially, but did you modify the benchmarks already?
@lidavidm Yes, I adjusted the Java version to use the same parameters as C++. They should behave more or less similarly :)
Can you share all the modifications?
Sure, here you can find the changes in
I would try removing the backpressure wait and seeing how that helps: arrow/java/flight/flight-core/src/test/java/org/apache/arrow/flight/perf/PerformanceTestServer.java, line 144 (at commit b832853).
(This will require a lot of memory; you may have to manually add sleep() calls to space out the batches a bit.) The issue with gRPC Java is that it uses a fixed, small buffer for backpressure, so effectively every write triggers backpressure and artificially throttles the producer, regardless of actual network conditions. This has been known for years, but upstream is not interested in fixing it: grpc/grpc-java#5433. Unfortunately, without the wait, Java never applies any backpressure and you instead tend to just OOM. I believe C++ does the 'right' thing and automatically applies backpressure (by blocking the send call) above some threshold, which I do not recall but which I think is actually based on network conditions.
Or, as a test, you can build gRPC yourself with that threshold modified (e.g. set to a few megabytes).
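The throttling effect can be illustrated with a toy model. This is not gRPC code: the transport reports "ready" only while fewer than `threshold` bytes are buffered (an isReady()-style check), the producer writes one whole batch per tick while ready, and the network drains a fixed number of bytes per tick. The function name, the 100 KiB/tick drain rate, and the 32 KiB "small" threshold are all assumptions chosen for illustration. When the threshold is smaller than a batch, the buffer regularly runs dry and link capacity is wasted; with a threshold of a few megabytes, the producer stays ahead and the link is never idle.

```python
KiB = 1024


def wasted_capacity(batch_size: int, threshold: int, drain_per_tick: int, batches: int) -> int:
    """Toy model: bytes of network capacity left unused while batches remain to send.

    Each tick: if fewer than `threshold` bytes are buffered, the producer writes one
    batch; otherwise it waits. Then the network drains up to `drain_per_tick` bytes.
    Any drain capacity not used because the buffer ran empty counts as wasted.
    """
    buffered = 0
    sent = 0
    wasted = 0
    while sent < batches:
        if buffered < threshold:
            buffered += batch_size  # transport is "ready": write one batch
            sent += 1
        drained = min(buffered, drain_per_tick)  # network sends what it can
        wasted += drain_per_tick - drained       # idle capacity this tick
        buffered -= drained
    return wasted


small = wasted_capacity(128 * KiB, 32 * KiB, 100 * KiB, batches=100)
large = wasted_capacity(128 * KiB, 4 * 1024 * KiB, 100 * KiB, batches=100)
print(f"32 KiB threshold: {small // KiB} KiB of link capacity wasted")
print(f"4 MiB threshold:  {large // KiB} KiB of link capacity wasted")
```

The model deliberately omits the cost that removing backpressure entirely would have: with no threshold at all, `buffered` grows without bound, which is the OOM failure mode mentioned above.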
Yes, we also found the same issue and tried running without backpressure, but saw no change in performance (neither in the benchmark nor in real code).
In that case I will have to build and profile it, unless you have already tried this too?
We already tried profiling Flight in a production service. After applying some fixes to
Ah, the final thing I would suggest (unless you've already done this?), at least for this benchmark, is to create a new Flight client for each stream (since otherwise all the streams share a single TCP connection). This is what C++ does. But that doesn't seem like it would apply to the production service. What are the fixes? I thought we had optimizations for that (but it has been a while since we last looked at it, and something may have been overlooked or changed).
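The difference between sharing one client and creating one client per stream can be sketched as follows. `FakeClient` is a hypothetical stand-in in which each instance represents one TCP connection; it is not an Arrow API. The sketch only counts how many distinct connections end up serving 16 concurrent streams.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

_conn_ids = itertools.count()  # hands out a unique id per simulated connection


class FakeClient:
    """Hypothetical stand-in for a Flight client: one instance = one TCP connection."""

    def __init__(self) -> None:
        self.conn_id = next(_conn_ids)

    def read_stream(self, ticket: int) -> int:
        # A real client would fetch the stream here; we only report which connection served it.
        return self.conn_id


def run_streams(tickets: list, client_per_stream: bool) -> int:
    """Fetch all tickets concurrently and return the number of distinct connections used."""
    shared = None if client_per_stream else FakeClient()

    def fetch(ticket: int) -> int:
        client = FakeClient() if client_per_stream else shared
        return client.read_stream(ticket)

    with ThreadPoolExecutor(max_workers=len(tickets)) as pool:
        conn_ids = list(pool.map(fetch, tickets))
    return len(set(conn_ids))


print("shared client:     ", run_streams(list(range(16)), client_per_stream=False))
print("client per stream: ", run_streams(list(range(16)), client_per_stream=True))
```

In the Java benchmark, the per-stream variant would correspond to building a separate `FlightClient` inside each stream's task rather than reusing one instance across all of them.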
Oh, yes, I forgot to try it, will do
You can take a look here.
I got this result after this change:
Thanks for checking. I filed https://issues.apache.org/jira/browse/ARROW-17537 and will look at these things. I'm not sure what is left; at this point it seems like an issue with the gRPC implementations, but there may be other factors (e.g. Netty can optionally use JNI code and epoll instead of relying on JVM facilities, but this may not necessarily be an improvement).
Yes, we also tried epoll and didn't observe any change.
Ok, thanks. Sorry, this is still on my backlog to take a deeper look at, but I probably won't have time before Arrow 10.0.0 (mid-to-late October) :/
Ok, I understand 👍
Worth mentioning that grpc/grpc-java#5433 is fixed now, and the changes have been integrated into Flight in #41051.
Hi,
I'm trying to understand the difference in Arrow Flight performance between C++ and Java. I ran the benchmarks for both languages and got a ~20x throughput difference.
Benchmark parameters:
16/16
8192
10_000_000
localhost
Hardware:
CPU: 2x Intel(R) Xeon(R) Platinum 8352Y CPU @ 2.20GHz
RAM: 512GB
C++ (arrow/flight/flight_benchmark.cc)
Java (flight-core/src/test/java/org/apache/arrow/flight/perf/TestPerf.java)
Is this throughput difference expected? Can the Java Flight server somehow be tuned to get closer to the C++ version?