Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maybe memory leak in Netty 4 transport. #1634

Closed
He-Pin opened this issue Dec 28, 2024 · 11 comments
Closed

Maybe memory leak in Netty 4 transport. #1634

He-Pin opened this issue Dec 28, 2024 · 11 comments
Milestone

Comments

@He-Pin
Copy link
Member

He-Pin commented Dec 28, 2024

refs:

OOM reported: https://issues.apache.org/jira/browse/FLINK-36290

https://issues.apache.org/jira/browse/FLINK-36510
I think there will be some leak, eg, not calling ByteBuf.release somewhere.

Update:
The current code is right.

I think the only way to find where it leaked is using -Dio.netty.leakDetection.level=PARANOID and return the test,
Or they can rerun the tests with -Dio.netty.tryReflectionSetAccessible=true.

@He-Pin He-Pin added bug Something isn't working backport labels Dec 28, 2024
@pjfanning
Copy link
Contributor

The link makes no mention of Pekko. Can you remove the labels (bug and backport) until there is Pekko specific POC for this?

@pjfanning
Copy link
Contributor

We now have #1635 but can we take a little bit of time and care. There is no point rushing this. The change is not really usable until we do a release. Can we slow down the effort and stop opening multiple PRs at the same time?

@pjfanning
Copy link
Contributor

@He-Pin you linked https://issues.apache.org/jira/browse/FLINK-36788 - I don't understand why. The real issue appears to be https://issues.apache.org/jira/browse/FLINK-36510

@He-Pin
Copy link
Member Author

He-Pin commented Dec 28, 2024

@pjfanning Sorry,I wrong linked , have updated to the right link

@He-Pin
Copy link
Member Author

He-Pin commented Dec 28, 2024

Actually, the code is Right:(,

Because it extends the SimpleChannelInboundHandler which will do auto-release.

finally {
            if (autoRelease && release) {
                ReferenceCountUtil.release(msg);
            }
        }

@He-Pin
Copy link
Member Author

He-Pin commented Dec 28, 2024

Do local run with cluster/MultiJvm/test and -Dio.netty.leakDetection.level=PARANOID to find if it leaked.

@He-Pin He-Pin removed bug Something isn't working backport labels Dec 28, 2024
@pjfanning
Copy link
Contributor

Isn't it good that there is no leak?

Can we close this?

Any ideas what might be going wrong in Flink tests? https://issues.apache.org/jira/browse/FLINK-36290

Do we need the exception handling changes that you were trying to make in #1635 ?

@He-Pin
Copy link
Member Author

He-Pin commented Dec 28, 2024

Yes, I run it locally, and no leak, otherwise it will prints something like:

LEAK: {}.release() was not called before it's garbage-collected.

@He-Pin
Copy link
Member Author

He-Pin commented Dec 28, 2024

@pjfanning I think the exception handing is nice, I will add it in another PR.
Iet's close it after #1639 merged and checked it again.

@He-Pin
Copy link
Member Author

He-Pin commented Dec 28, 2024

@pjfanning I think the only way to find where it leaked is using -Dio.netty.leakDetection.level=PARANOID and return the test, which will print the detailed leaked position.

Or they can rerun the tests with -Dio.netty.tryReflectionSetAccessible=true.

The issue seems only on Java 11+, so maybe try with -Dio.netty.tryReflectionSetAccessible=true.

-Dio.netty.allocator.type=unpooled can keep the netty 3 unpooled behavior.

@He-Pin
Copy link
Member Author

He-Pin commented Jan 8, 2025

image image

I think the leak is not on pekko side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants