Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Wrong caching of the overrides #3369

Closed
2 of 3 tasks
Trigve opened this issue Oct 15, 2021 · 4 comments · Fixed by #3465
Closed
2 of 3 tasks

[BUG]: Wrong caching of the overrides #3369

Trigve opened this issue Oct 15, 2021 · 4 comments · Fixed by #3465
Assignees
Labels
triage New bug, unverified

Comments

@Trigve
Copy link
Contributor

Trigve commented Oct 15, 2021

Required prerequisites

Problem description

pybind11 2.6.2 (or smart holder branch), Python 3.7.10, VS 2019 C++20 (compiled as x86), Windows 10 x64

There is problem of caching the python functions overrides. The problem could be simulated using the code below.

In the example, there are 2 function in python.
func() does returns the python-derived class in which the method is overriden from the base class.
func2() does returns python-derived class BUT NO method is overriden, therefor C++ function is called in the end.

The problem is, that the cache (get_internals().inactive_override_cache) does store the pair<handle of the type, func name address>. But if the type is deallocated, it isn't removed from the cache and in future there could be match of the given type handle and the function name address. The problem seems to be with pybind11_meta_dealloc(). Because the type isn't the one registered with pybind11, it doesn't erase it from the override cache. (edit: Looking to it more carefully, the actual problem could be in all_type_info_get_cache(). Because, there type is removed from the registered_types_py, but the override cache isn't checked).

On the line marked "// (2)" the p_obj2->func(); call should be store/retrieved from the override cache as it always resolve to the C++ function.
The code on line marked "// (1)" should ALWAYS returns 42 but on some random iteration it would return 0 (and assertion is hit "// (3)"), because the cache would hold the same handle and function name address and that means that function is in C++ and not in python instance (which is wrong).

This does occurs using #1546 (comment) and also using smart holder branch https://github.com/pybind/pybind11/tree/smart_holder (@rwgk).

The poor-man's fix is to replace inactive_override_cache std::unordered_set<std::pair<const PyObject *, const char *>, override_hash> with std::unordered_map<const PyObject *, std::unordered_set<const char *>> and in pybind11_meta_dealloc() erase also the types not registered by pybind11. It does work in my production code (edit: As noted above, the better fix could be in all_type_info_get_cache() and erase the override from the cache when the type is unregistered).

Workaround for python code: Move the class Test(core.DialogLogic): to the "global" scope. edit: This is only partial workaround. If whole pybind11::exec() is moved into the loop, assertion happens too.

Reproducible example code

#include <iostream>
#include <filesystem>
#include <memory>
#include <pybind11/pybind11.h>
#include <pybind11/embed.h>
#include <pybind11/eval.h>
//#include <pybind11/shared_ptr.h>

using namespace std::literals::chrono_literals;

class DialogLogic
{

public:
	virtual int func()
	{
		return 0;
	}

	DialogLogic() = default;
	DialogLogic &operator=(DialogLogic const &Right) = delete;
	DialogLogic(DialogLogic const &Copy) = delete;
};

class DialogLogicWrap final : public DialogLogic
{
	virtual int func() override final
	{
		pybind11::gil_scoped_acquire gil;

		assert(PyErr_Occurred() == NULL);

		pybind11::function func_override = pybind11::get_override(static_cast<DialogLogic *>(this),
			"func"
		);

		assert(PyErr_Occurred() == NULL);

		if(func_override)
			return pybind11::cast<int>(func_override());
		
		return DialogLogic::func();
	}
};

PYBIND11_EMBEDDED_MODULE(core, m)
{
//	pybind11::class_<DialogLogic, DialogLogicWrap, std::shared_ptr<DialogLogic>>(m, "DialogLogic")
	pybind11::class_<DialogLogic, DialogLogicWrap, PYBIND11_SH_DEF(DialogLogic)>(m, "DialogLogic")
		.def(pybind11::init_alias<>())
		.def("func", &DialogLogic::func)
		;
}

int main()
{
	Py_SetPath((std::filesystem::current_path().generic_wstring() + L"/python37.zip;" + std::filesystem::current_path().generic_wstring()).c_str());

	pybind11::scoped_interpreter guard{};

	PyEval_InitThreads();

	{

		pybind11::dict local;

		pybind11::module_ mod_core = pybind11::module::import("core");
		try
		{
			pybind11::exec(R"(
def func():
	import core

	class Test(core.DialogLogic):
		
		def func(self):
			return 42

	return Test()

def func2():
	import core

	class Test(core.DialogLogic):
			pass

	return Test()
)",
				pybind11::globals(),
				local
			);

			for(int i = 0;;++i)
			{
				auto p_obj = pybind11::cast<std::shared_ptr<DialogLogic>>(local["func"]());
				
				int ret = p_obj->func(); // (1)
				assert(ret == 42); // (3)

				auto p_obj2 = pybind11::cast<std::shared_ptr<DialogLogic>>(local["func2"]());

				p_obj2->func(); // (2)
			}
		}
		catch(std::exception const &e)
		{
			std::cerr << e.what();
		}
	}
}
@Trigve Trigve added the triage New bug, unverified label Oct 15, 2021
@rwgk
Copy link
Collaborator

rwgk commented Nov 8, 2021

Hi @Trigve, I looked a bit although I don't have much relevant background. I don't use embedding and I'm not familiar with the overload caching functionality. But this sounds good to me:

As noted above, the better fix could be in all_type_info_get_cache() and erase the override from the cache when the type is unregistered

If you want to work on that, please send the PR to me for review. It doesn't have to be polished or even "finished", if you reach certain decision points just push what you have and we'll take a look.

Thinking a bit more high level, could you work around the caching bug by importing core only once, then pass that via globals?

Thinking even more high level, I usually strongly advise against embedding. You can only safely start and stop the interpreter once in a process anyway (see Bugs and caveats here). If you can organize your system to let Python be the main, and your main be an extension, you'll be more on a beaten path and probably encounter less bugs.

@Trigve
Copy link
Contributor Author

Trigve commented Nov 9, 2021

Thanks for the reply @rwgk

If you want to work on that, please send the PR to me for review. It doesn't have to be polished or even "finished", if you reach certain decision points just push what you have and we'll take a look.

I have a git patches though but if you need the PR I'll try it.

Thinking a bit more high level, could you work around the caching bug by importing core only once, then pass that via globals?

I'm importing the core only once. And the problem isn't about module importing. The main problem (TLDR;) is "use after free". This problem isn't tied specific to the embedding, the only precondition to triggering it is to store python instance (whose type is "dynamic" - i.e. class is constructed on the fly) in C++ code and use it later. If you want I could make you a detailed report.

Thinking even more high level, I usually strongly advise against embedding. You can only safely start and stop the interpreter once in a process anyway (see Bugs and caveats here). If you can organize your system to let Python be the main, and your main be an extension, you'll be more on a beaten path and probably encounter less bugs.

In my case that is not an option (Our C++ framework does us python as scripting language). I only start/finalize the python interpreter once (are using process isolation, that is process is spawn when needed) so that is also not a problem.

@rwgk
Copy link
Collaborator

rwgk commented Nov 9, 2021

I have a git patches though but if you need the PR I'll try it.

Yes. We can assist, but not drive in many cases, like this one. (We're maintaining pybind11 in volunteer time.)

The main problem (TLDR;) is "use after free". This problem isn't tied specific to the embedding, the only precondition to triggering it is to store python instance (whose type is "dynamic" - i.e. class is constructed on the fly) in C++ code and use it later.

Could you turn that into a non-embedding unit test?

I have easy access to clang sanitizers (asan, msan), but only for extensions (not embedding).

@Trigve
Copy link
Contributor Author

Trigve commented Nov 13, 2021

Yes. We can assist, but not drive in many cases, like this one. (We're maintaining pybind11 in volunteer time.)

I've made a PR #3462

Could you turn that into a non-embedding unit test?

I have easy access to clang sanitizers (asan, msan), but only for extensions (not embedding).

The tests are included in the PR (embedded and non-embedded also). Be aware it is not DIRECT use after free (I've used apostrophes to distinguish it), because the freed memory isn't read/written, just use as the key in the cache.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage New bug, unverified
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants