Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: memory leak in cpp_function (#3228) #3229

Merged
merged 8 commits into from
Aug 31, 2021

Conversation

yuantailing
Copy link
Contributor

@yuantailing yuantailing commented Aug 29, 2021

Description

The PR is related to #3228, and may be related to #1510 and pytorch/pytorch#5161.

Let's focus on pybind11::cpp_function::initialize. If the first argument is lvalue reference, the type Func is deduced to lvalue reference (according to Template argument deduction). In this case, std::is_trivially_destructible<Func> is always true, therefore rec->data will never be deconstructed.

To solve this issue, we should determine whether rec->data should be deconstructed according whether the type of rec->data (i.e., capsule) is trivially destructible.

Suggested changelog entry:

A long-standing memory leak in ``py::cpp_function::initialize`` was fixed.

@Skylion007 Skylion007 requested a review from rwgk August 29, 2021 16:05
@Skylion007
Copy link
Collaborator

Very interesting and surprised Valgrind and our test suite isn't catching this.

Do you think it's possible that this is related to this PyBind failing test case as well: #1301? Does that test case pass when added to this PR?

Can the MRE be reduced to a test case to repro the error in a way our test suite can detect?

@Skylion007 Skylion007 requested a review from henryiii August 29, 2021 16:45
@Skylion007 Skylion007 added the bug label Aug 29, 2021
@rwgk
Copy link
Collaborator

rwgk commented Aug 29, 2021

Amazing indeed that this bug survived until now.

This seems to be the commit that introduced the line you're changing:

19208fe

I don't know if it was a bug from the start, or if the bug is the result of subsequent changes.

@yuantailing could you help by converting your #3228 reproducer to a unit test? Maybe an addition to test_callbacks.cpp,py. constructor_stats.h may be useful. But I'm very uncertain about the details, I haven't touched this code myself, and I don't know how the original unit test (see commit above) relates to constructor_stats.h.

@yuantailing
Copy link
Contributor Author

Do you think it's possible that this is related to this PyBind failing test case as well: #1301? Does that test case pass when added to this PR?

I think they are not related. The issue happens when a lvalue reference is passed to pybind11::cpp_function. Test case #1301 tests that a python function can be converted to std::function. They are not the same.

Can the MRE be reduced to a test case to repro the error in a way our test suite can detect?

Added.

@Skylion007
Copy link
Collaborator

Skylion007 commented Aug 29, 2021

Do you think it's possible that this is related to this PyBind failing test case as well: #1301? Does that test case pass when added to this PR?

Yeah, I just realized it's a subtle issue with finalize_interperter and therefore is likely not an actual bug.

@yuantailing
Copy link
Contributor Author

@yuantailing could you help by converting your #3228 reproducer to a unit test? Maybe an addition to test_callbacks.cpp,py. constructor_stats.h may be useful. But I'm very uncertain about the details, I haven't touched this code myself, and I don't know how the original unit test (see commit above) relates to constructor_stats.h.

I have added a test case to tests/test_embed/test_interpreter.cpp. Maybe the code can be moved to a more proper place.

@Skylion007
Copy link
Collaborator

@yuantailing could you help by converting your #3228 reproducer to a unit test? Maybe an addition to test_callbacks.cpp,py. constructor_stats.h may be useful. But I'm very uncertain about the details, I haven't touched this code myself, and I don't know how the original unit test (see commit above) relates to constructor_stats.h.

I have added a test case to tests/test_embed/test_interpreter.cpp. Maybe the code can be moved to a more proper place.

Constructor_Stats.h just defines a helper class with keeps track of all its' constructions / destructions. You may want to use that class instead of your counter class if it makes sense:

class ConstructorStats {

@yuantailing
Copy link
Contributor Author

Constructor_Stats.h just defines a helper class with keeps track of all its' constructions / destructions. You may want to use that class instead of your counter class if it makes sense:

class ConstructorStats {

Thanks for your suggestions, fixed now.

…at the new tests fails without the change in pybind11.h
@rwgk
Copy link
Collaborator

rwgk commented Aug 29, 2021

The (void)func; are no-ops I think. Was that intentional? I just pushed a commit that changes them to func();. Could you please let me know if you think that's not correct?

I also added two CHECK(stat.alive() == 2); to make it more obvious and certain that the count actually goes down.

I will now look into moving the test elsewhere, because it's not specific to embedding, but specific to callbacks.

Ralf W. Grosse-Kunstleve added 2 commits August 29, 2021 16:51
@rwgk
Copy link
Collaborator

rwgk commented Aug 30, 2021

I will now look into moving the test elsewhere, because it's not specific to embedding, but specific to callbacks.

Done.

@Skylion007, if my changes look good to you, please go ahead and merge (pending completion of CI jobs).

Copy link
Collaborator

@rwgk rwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Assuming CI will be green.)

@yuantailing
Copy link
Contributor Author

The (void)func; are no-ops I think. Was that intentional? I just pushed a commit that changes them to func();. Could you please let me know if you think that's not correct?

The purpose is to avoid the potential unused variable warning. Change it to func() is also correct.

I also added two CHECK(stat.alive() == 2); to make it more obvious and certain that the count actually goes down.

I will now look into moving the test elsewhere, because it's not specific to embedding, but specific to callbacks.

Looks good.

@rwgk
Copy link
Collaborator

rwgk commented Aug 30, 2021

The purpose is to avoid the potential unused variable warning. Change it to func() is also correct.

Ah, got it. Thanks for the explanation!
We have a handy function to make that obvious: py::detail::silence_unused_warnings(py_func);

@Skylion007 Skylion007 linked an issue Aug 30, 2021 that may be closed by this pull request
3 tasks
@Skylion007 Skylion007 merged commit d6474ed into pybind:master Aug 31, 2021
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Aug 31, 2021
@henryiii henryiii removed the needs changelog Possibly needs a changelog entry label Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Constructing pybind11::cpp_function with a closure causes memory leak
4 participants