[FLINK-36979][rpc] Reverting pekko version bump in Flink 1.20 #25866

Status: Open · wants to merge 1 commit into base release-1.20
Conversation

@XComp (Contributor) commented Dec 29, 2024

This reverts commit 4776c96.

What is the purpose of the change

Reverts the pekko version bump that includes an upgrade to netty 4.x. Corresponding discussion happened in FLINK-36510.

Brief change log

  • Plain revert

Verifying this change

  • no additional verification done aside from CI

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): yes
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@XComp changed the title from "[FLINK-36979][rpc] Reverting pekko version bump" to "[FLINK-36979][rpc] Reverting pekko version bump in Flink 1.20" on Dec 29, 2024
@flinkbot (Collaborator) commented Dec 29, 2024

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@He-Pin (Member) commented Dec 31, 2024

@ferenc-csaky @XComp Hi, I think I can confirm that there is no leak on the Pekko side.

@He-Pin (Member) commented Dec 31, 2024

I would like to suggest:

Rerun the tests with -Dio.netty.tryReflectionSetAccessible=true and -Dio.netty.leakDetection.level=PARANOID to see why and where it leaks, or get a heap dump.

We have some applications (not Flink applications) that still need -Dio.netty.tryReflectionSetAccessible=true to avoid OOMs in production.

Alternatively, you could make Netty's ByteBuf allocator unpooled by default with -Dio.netty.allocator.type=unpooled.
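For reference, a minimal Java sketch of the programmatic equivalents of these switches, against vanilla (unshaded) Netty 4.1. The system properties must be set before any Netty class loads, since they are read in static initializers, and Flink's shaded Netty may use relocated property names:

    import io.netty.util.ResourceLeakDetector;

    public final class NettyDebugSettings {
        public static void main(String[] args) {
            // Must run before Netty's PlatformDependent/allocator classes initialize,
            // because these properties are read in their static initializers.
            System.setProperty("io.netty.tryReflectionSetAccessible", "true");
            System.setProperty("io.netty.allocator.type", "unpooled");

            // In-code counterpart of -Dio.netty.leakDetection.level=PARANOID:
            // every buffer is tracked, and leak reports carry full access records.
            ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

            System.out.println("leak detection level: " + ResourceLeakDetector.getLevel());
        }
    }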

@XComp (Contributor, Author) commented Jan 3, 2025

@He-Pin are you sure? We see the following stacktrace in this e2e test failure:

Jan 02 06:20:02 org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Cannot reserve 4194304 bytes of direct buffer memory (allocated: 140396831, limit: 141557760) (connection to 'localhost/127.0.0.1:42031 [localhost:45071-b0167d]')
Jan 02 06:20:02 	at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:175) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Jan 02 06:20:02 	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]

Anyway, I might have another look into it on the Flink side as well.
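For what it's worth, that message format ("Cannot reserve ... bytes of direct buffer memory (allocated: ..., limit: ...)") looks like the JDK's own direct-memory accounting in java.nio.Bits, i.e. the cleaner-based ByteBuffer.allocateDirect path that Netty falls back to when its no-cleaner unsafe buffers are unavailable, which would be consistent with the tryReflectionSetAccessible suggestion above. The 4194304 bytes also match one pooled-allocator chunk (pageSize 8192 << maxOrder 9 = 4 MiB with recent Netty 4.1 defaults). A minimal, hypothetical reproduction of the same failure mode under a tight direct-memory cap, against vanilla Netty 4.1:

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.PooledByteBufAllocator;

    // Run with: java -XX:MaxDirectMemorySize=7m DirectMemoryExhaustion
    // The second 4 MiB chunk reservation (2 x 4 MiB > 7 MiB) fails with an
    // out-of-memory error from either Netty's or the JDK's accounting,
    // depending on whether the no-cleaner unsafe path is available.
    public final class DirectMemoryExhaustion {
        public static void main(String[] args) {
            PooledByteBufAllocator alloc = PooledByteBufAllocator.DEFAULT;
            ByteBuf first = alloc.directBuffer(4 * 1024 * 1024);  // reserves chunk #1
            ByteBuf second = alloc.directBuffer(4 * 1024 * 1024); // chunk #2 breaches the cap
            first.release();
            second.release();
        }
    }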

@He-Pin (Member) commented Jan 3, 2025

@XComp So it would be better to turn on -Dio.netty.leakDetection.level=PARANOID to see who holds the buffers, and to turn on -Dio.netty.allocator.type=unpooled too, so that all ByteBufs are allocated outside the pool.

@normanmaurer do you have any suggestions? Thanks.

@He-Pin (Member) commented Jan 3, 2025

Another optimization: apache/pekko#1667.

@He-Pin (Member) commented Jan 6, 2025

@XComp Is there any update you can share? Thanks.

@XComp (Contributor, Author) commented Jan 6, 2025

> @XComp Is there any update you can share? Thanks.

Currently, I have a lot of other stuff on my plate. I wouldn't mind if somebody else could help pick this up.

@He-Pin (Member) commented Jan 6, 2025

It would be nice if anyone could add:

-Dio.netty.tryReflectionSetAccessible=true
-Dio.netty.leakDetection.level=PARANOID
-Dio.netty.allocator.type=unpooled

when running the tests.
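For a one-off local run, one possible way to thread these through Maven Surefire is its argLine user property (a hypothetical invocation, untested against Flink's build, and it overrides any argLine the build itself configures):

    mvn test -pl flink-runtime \
        -DargLine="-Dio.netty.tryReflectionSetAccessible=true -Dio.netty.leakDetection.level=PARANOID -Dio.netty.allocator.type=unpooled"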

@ferenc-csaky (Contributor)

Hi! I'm back from my holiday, so I can take this and try some runs; I will update the Jira ticket with any progress.

@davidradl (Contributor)

@ferenc-csaky @He-Pin @XComp It looks like this PR is all about reverting the pekko version bump until we understand the cause of the OOM. All the comments seem to relate to resolving the OOM. Can I suggest we merge this revert and investigate the OOM separately, as suggested in the Jira?

@He-Pin (Member) commented Jan 9, 2025

This is not a leak but a user-side error; this is how Netty works. Would you set the limit to 7 MB in production, @davidradl?
The best Pekko can do is:

  1. add an option to support using an unpooled allocator so that the 7 MB memory tests pass. Is that the right way to go?

If you think that's really needed, please send a PR to Pekko that makes the allocator type of the channels configurable.
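In Netty terms, such an option would presumably boil down to something like the following at channel-bootstrap time (a sketch with vanilla Netty types; the actual Pekko config surface would look different):

    import io.netty.bootstrap.Bootstrap;
    import io.netty.buffer.UnpooledByteBufAllocator;
    import io.netty.channel.ChannelOption;
    import io.netty.channel.EventLoopGroup;
    import io.netty.channel.socket.nio.NioSocketChannel;

    public final class UnpooledChannelFactory {
        // A client bootstrap whose channels allocate unpooled buffers,
        // approximating the Netty 3 memory behavior discussed here.
        public static Bootstrap clientBootstrap(EventLoopGroup group) {
            return new Bootstrap()
                    .group(group)
                    .channel(NioSocketChannel.class)
                    .option(ChannelOption.ALLOCATOR, UnpooledByteBufAllocator.DEFAULT);
        }
    }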

@ferenc-csaky (Contributor)

@davidradl My understanding is that Netty 4 does not leak memory; compared to Netty 3 it simply does not work the same way by default and reserves a bit more memory.

But with -Dio.netty.tryReflectionSetAccessible=true the default memory footprint will be smaller, and -Dio.netty.allocator.type=unpooled can also help with that, although performance-wise unpooled allocation is not really advisable.

@He-Pin 7 MB is not realistic in any kind of production use case; it is only set that way for the failing test because that test validates how much memory Netty uses, which is why it sets num-arenas to 1 and the memory limit to 7 MB.

My suggestion would be to fix this test instead of reverting, either by giving it more memory or by providing the necessary Netty configs so it can function within that memory limit.

For the sake of completeness: on master, commit 338d024 already increased the memory to 90 MB, so backporting that commit to 1.20 and 1.19 could fix the test case mentioned here earlier. Although I'm not sure that test will still be meaningful with those changes, because num-arenas=1 is also removed in that commit.

My original idea was to set -Dio.netty.tryReflectionSetAccessible=true for this test case, because that way it fits into 7 MB; test execution succeeded multiple times in my local runs on both JDK 11 and JDK 17.
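To make the pooled-vs-unpooled distinction concrete, a small sketch (vanilla Netty 4.1; Flink's shaded copy relocates these packages) showing that the pooled allocator pins a whole chunk for a tiny request, while the unpooled allocator reserves only roughly what was asked for:

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.PooledByteBufAllocator;
    import io.netty.buffer.UnpooledByteBufAllocator;

    public final class AllocatorFootprint {
        public static void main(String[] args) {
            PooledByteBufAllocator pooled = new PooledByteBufAllocator(true);
            ByteBuf a = pooled.directBuffer(1024); // 1 KiB request...
            // ...but a whole chunk (4 MiB with recent defaults) is now pinned in an arena:
            System.out.println("pooled reserved:   " + pooled.metric().usedDirectMemory());

            UnpooledByteBufAllocator unpooled = new UnpooledByteBufAllocator(true);
            ByteBuf b = unpooled.directBuffer(1024);
            System.out.println("unpooled reserved: " + unpooled.metric().usedDirectMemory());

            a.release();
            b.release();
        }
    }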

@He-Pin (Member) commented Jan 9, 2025

@ferenc-csaky Our high-throughput Java applications running on Java 11/21 have used -Dio.netty.tryReflectionSetAccessible=true since we upgraded from Java 8; it is really needed to avoid OOMs and reduce GC pressure. The old setup did not need it because it was unpooled, which generates many more GC cycles and hurts performance.

I vote for adding this by default, or at least adding -Dio.netty.tryReflectionSetAccessible=true to the release notes. People who want the old behavior can enable it with -Dio.netty.allocator.type=unpooled if they prefer more GC.

@He-Pin (Member) commented Jan 9, 2025

I implemented a Netty 4-based remoting transport once while working for a game company, but the Akka team did not accept that PR for some reason, so we could only use it internally. Years later the Pekko fork happened and we gained control of the code, so we can do the right thing now. I'm using Netty at $Work too.

We should not simply revert blindly; let's do this the right way. The CVEs are really annoying.

@davidradl (Contributor) left a comment

It sounds like the title of the PR is no longer in line with what we want based on the PR comments. There is appetite to fix this properly on the higher pekko version, so I am removing my approval, as it referred to the reversion code that is currently in the PR.

@He-Pin (Member) commented Jan 10, 2025

@davidradl Is there any investigation result you can share? Thanks.

@ferenc-csaky (Contributor)

Opened #25955, which I believe should supersede this PR.

@davidradl (Contributor)

@ferenc-csaky sounds good - can we close this PR?

@ferenc-csaky (Contributor)

IMO yes; I will close both this one and the 1.19 equivalent on Monday if there are no objections until then.

@XComp (Contributor, Author) commented Jan 13, 2025

Sorry for not getting back earlier. Thanks for looking into the issue, @ferenc-csaky. But on a more general note, and given the concerns @zentol shared in FLINK-36510:

  • Shouldn't we merge the revert at least for 1.19, as the stable release (i.e., stick to the older netty version)?
  • I see your point about upgrading to netty 4.x for 1.20, as this seems to be the LTS version for Flink 2.x. One other option we have here is doing the upgrade for 1.20.2 and not including it in 1.20.1 (which @afedulov is preparing right now). That would give us a few more CI release cycles. WDYT?

@ferenc-csaky (Contributor)

@XComp I do not see any problem with your suggested approach for 1.19; it makes sense.

I can also accept releasing it with 1.20.2 on the 1.20 line if everybody else agrees, but personally I am a bit more reluctant about that. I guess that release will happen a couple of months after 1.20.1 in the best case, and it will surely give us more time to see whether the CI runs are more stable in this regard. But even if not, based on my current investigation it will probably come down to some other necessary configuration, not a serious memory bug. And on the other hand, the pesky Netty 3 CVEs won't go anywhere. @afedulov WDYT, any requirements from your side?

@afedulov (Contributor) commented Jan 13, 2025

I believe that since we are not dealing with a memory leak but rather with different memory allocation behavior (thanks @ferenc-csaky for confirming this!), we should aim to include the upgrade in both the 1.19 and 1.20 releases. While there is some risk of exposing users to OOM kills by pushing some workloads over hard memory limits, that risk has to be weighed against the risks posed by leaving numerous critical CVEs exposed on the network stack.

Netty 3.10.6 is the last 3.x release and officially reached EOL more than 8 years ago. I have also read reports of it suffering from multiple GC and memory-management issues, so it is not as if we are transitioning away from something rock-solid that works perfectly to an experimental release.

The approach proposed by @He-Pin in the above comment sounds reasonable to me. If we adopt it, I think we should actually add the -Dio.netty.tryReflectionSetAccessible=true parameter as a default in the startup scripts, not merely mention it in the documentation. An open question remains: should we retain the 3.x-style memory management by using unpooled allocations? My current understanding is that using both parameters should approximate the existing behavior, but unpooled allocation does not seem strictly required in our case. @He-Pin, what's your perspective on this?
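If we go that route, a hedged illustration of what it might look like in flink-conf.yaml (assuming the env.java.opts.all option of recent 1.x releases; whether the flag belongs there or hard-coded in the startup scripts is exactly what the change would need to settle):

    env.java.opts.all: -Dio.netty.tryReflectionSetAccessible=true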

As for fixing it in 1.20.2 vs. 1.20.1: I am not convinced that having CI run more times will give us the required confidence. The primary concern lies in breaching memory limits in existing deployments rather than stability. If we agree this change is eventually necessary for a patch release, it would be logical to apply it now for both 1.19.2 and 1.20.1. I will make sure this potential concern is explicitly mentioned in the release notes.

@ferenc-csaky Do we have a rough understanding of how much additional memory consumption this new version induces? Does it look like a fixed amount, or something that scales with the number of connections?

@He-Pin (Member) commented Jan 13, 2025

@afedulov Thanks for the ping.

  1. For the Netty 4 migration: we can't ship a library with many CVEs, so it took me nearly 5 weekends to migrate from Netty 3 to Netty 4, and I had Flink in mind too. The multi-jvm-plugin was also migrated from Netty 3 to Netty 4, and we upstreamed that to Akka, where it was still on Netty 3, and it got merged (Migrate akka-multi-node-testkit to Netty4, akka/akka#32005).
  2. We did encounter this in production (a Spring Boot application with Netty-based RPC): after we migrated from Java 8 to Java 11 we hit OOMs, all of which were solved with a simple -Dio.netty.tryReflectionSetAccessible=true.
  3. Netty 4 works differently than Netty 3, mainly for performance and less GC; that's just how it works. Yes, it may be my fault for using the PooledByteBufAllocator in the first place. I could keep the old behavior with an unpooled heap allocator, but I don't think that's best practice, so I'm not putting it behind a config option either.
  4. You can take a look at the current implementation; I think it follows best practice. It did take me a day to figure out whether it really leaks on the Pekko side, but after that I can confirm it does NOT.
  5. Since Pekko 1.1.0 shipped, we have received no leak reports.

So I think keeping the Netty 3 version would be rather ill-advised, especially with the detailed investigation @ferenc-csaky has done. Keeping Netty 3 would expose the whole downstream supply chain to CVEs, and that's not right.

But I do suggest we do some long-running stress testing of this (e.g. 1 or 2 days). I looked at some issues inside Flink, e.g. parallelism serialization, which I think can be done in the current classic transport too.

In short: stress-test to make sure it works smoothly, then ship the Netty 4 version. That's my +1.

@He-Pin (Member) commented Jan 13, 2025

BTW, if Flink also supports JDK 17+, please add the following when running on JDK 17 or higher:

--add-opens java.base/java.nio=ALL-UNNAMED
--add-opens java.base/jdk.internal.misc=ALL-UNNAMED
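These flags matter because Netty, with tryReflectionSetAccessible enabled, reflectively reaches into java.nio internals to build its no-cleaner direct buffers. A tiny probe in the same spirit (illustrative only, not Netty's actual check) that fails on JDK 17+ unless java.base/java.nio is opened:

    import java.lang.reflect.Field;
    import java.nio.Buffer;

    public final class AddOpensProbe {
        public static void main(String[] args) throws Exception {
            // Netty reads java.nio.Buffer.address (among other internals).
            // Without --add-opens java.base/java.nio=ALL-UNNAMED this throws
            // InaccessibleObjectException on JDK 17+, where strong encapsulation
            // is the default.
            Field address = Buffer.class.getDeclaredField("address");
            address.setAccessible(true);
            System.out.println("java.nio is open; reflective access works");
        }
    }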

@He-Pin (Member) commented Jan 13, 2025

@davidradl I think making the old behavior (unpooled) configurable can be done, but that will need @pjfanning to confirm backporting.

The change is in how the PooledByteBufAllocator works: it caches some arenas. But TBH, 7 MB is very small for any workload.
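For reference, those defaults can be printed directly (vanilla Netty 4.1 API; the values shown in comments are typical recent defaults and can differ per version and platform), which shows why a 7 MB cap cannot even hold two pooled chunks:

    import io.netty.buffer.PooledByteBufAllocator;

    public final class PoolDefaults {
        public static void main(String[] args) {
            int pageSize = PooledByteBufAllocator.defaultPageSize(); // typically 8192
            int maxOrder = PooledByteBufAllocator.defaultMaxOrder(); // typically 9 in recent 4.1.x
            System.out.println("direct arenas: " + PooledByteBufAllocator.defaultNumDirectArena());
            System.out.println("chunk size:    " + (pageSize << maxOrder) + " bytes"); // 4 MiB
        }
    }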

@He-Pin (Member) commented Jan 13, 2025

@XComp @ferenc-csaky @davidradl @afedulov I just prepared a PR for this, apache/pekko#1707; not sure if @pjfanning agrees with backporting it to the 1.1.4 release.

@tomncooper (Contributor)

+1 for keeping the newer Pekko and Netty 4 in 1.20.1 (and not merging this PR). This is the LTS release, and my 2c is that having a more secure base and fixing any issues that arise is the better path.

@He-Pin (Member) commented Jan 14, 2025

A backport is pending in apache/pekko#1709, as requested by @afedulov.
