Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error message when inheriting from a type without a constructor #2432

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

virtuald
Copy link
Contributor

Improvement of #2152 as suggested by @rwgk and @YannickJadoul

@virtuald virtuald force-pushed the better-init-err branch 2 times, most recently from 773edfb to 1a8e2d0 Compare August 24, 2020 00:36
@virtuald
Copy link
Contributor Author

This is a bit hacky -- the thing that this is trying to avoid is taking the address of the inline function pybind11_object_init. There are several implications I can think of here:

  • If pybind11_object_init changed in the future, and multiple pybind11 modules are present, get_internals might return an old version of the function. Could be bad.
  • Since default_object_init is inline, if the default instance base was no longer a PyHeapTypeObject it could crash.
  • Looks like it doesn't work on PyPy :(

@virtuald virtuald force-pushed the better-init-err branch 3 times, most recently from 8374d97 to d5bffa9 Compare August 24, 2020 01:12
tests/test_class.py Show resolved Hide resolved
include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
include/pybind11/detail/class.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@YannickJadoul YannickJadoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, curious as to how and why this works? I thought tp_init was always copied from the superclass and just overwritten by "dynamically defined" __init__ methods?

I'll have a look in closer detail, later, to understand everything, but I'd be up for including this! :-)

include/pybind11/detail/class.h Outdated Show resolved Hide resolved
include/pybind11/detail/internals.h Outdated Show resolved Hide resolved
@YannickJadoul
Copy link
Collaborator

Trying this:

PYBIND11_MODULE(example, m)
{
        py::class_<X/*, std::shared_ptr<X>*/> classX(m, "X");
        printf("%p %p\n", ((PyHeapTypeObject*) py::detail::get_internals().instance_base)->ht_type.tp_init, ((PyHeapTypeObject*) classX.$

        classX.def(py::init<>());
        printf("%p %p\n", ((PyHeapTypeObject*) py::detail::get_internals().instance_base)->ht_type.tp_init, ((PyHeapTypeObject*) classX.$
}

shows that CPython indeed changes the tp_init field when adding an __init__ method:

$ python3.8
Python 3.8.0 (default, Oct 28 2019, 16:14:01) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import example
0x7fd980853919 0x7fd980853919
0x7fd980853919 0x5b0400

but PyPy doesn't:

$ pypy3
Python 3.6.9 (1608da62bfc7, Dec 23 2019, 10:50:04)
[PyPy 7.3.0 with GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import example
0x7f247d676608 0x7f247d676608
0x7f247d676608 0x7f247d676608
>>>> example.X.__init__
<<instancemethod __init__> at 0x00007f247d902e00>

So is this in the Python specification, or is this just an implementation detail of CPython?

Also, not sure how I feel about that myself, but just throwing this out there: similarly to ABCMeta.__subclasshook__, we could add something to the metaclass that checks this when defining the class?

@rwgk
Copy link
Collaborator

rwgk commented Aug 24, 2020

This is a bit hacky -- the thing that this is trying to avoid is taking the address of the inline function pybind11_object_init. There are several implications I can think of here:

  • If pybind11_object_init changed in the future, and multiple pybind11 modules are present, get_internals might return an old version of the function. Could be bad.
  • Since default_object_init is inline, if the default instance base was no longer a PyHeapTypeObject it could crash.
  • Looks like it doesn't work on PyPy :(

Hmmm ... what if we just tweaked the error message slightly, to express explicitly that __init__ must exist, for example:

"%.200s.__init__ must exist and must be called when overriding __init__ in Python."

I think that'd be >90% of the gain for very little effort and no risk.

The only doubt I had in connection with open_spiel: what if the extension class has no __init__, could that work?
If we insert must exist into the error message it's clear that it cannot work, and I wouldn't have spent any time thinking about it.

@YannickJadoul
Copy link
Collaborator

Hmmm ... what if we just tweaked the error message slightly, to express explicitly that __init__ must exist, for example:

At the risk of being overly blunt/critical/pedantic: isn't the fact that a method must exist a rather implicit when a messages says it must be called?

The main advantage I saw for this exception (for users of a pybind11-library, not for developers; those should just read the docs to know how to make some type subclassable), was that it would make clear they cannot subclass that type. But if that's not consistently possible, I'd try to keep the message as simple as possible.

@virtuald
Copy link
Contributor Author

This extra enhancement is just an additional debugging tool when users do something they shouldn't. As such, I'm fine with it only working on CPython, even if it happens to be an implementation detail. If the detail changes, the TypeError will still occur, and that's the most important part to me as it keeps debugging at the Python level instead of the C++ level for well-formed bindings.

@rwgk
Copy link
Collaborator

rwgk commented Aug 25, 2020

Hmmm ... what if we just tweaked the error message slightly, to express explicitly that __init__ must exist, for example:

At the risk of being overly blunt/critical/pedantic: isn't the fact that a method must exist a rather implicit when a messages says it must be called?

You're missing the point.

@YannickJadoul
Copy link
Collaborator

@virtuald Rather than checking on construction, I was playing around with adding this check when subclassing. To my surprise, this works:

/// metaclass `__new__` function that is used to create all pybind11 classes.
extern "C" inline PyObject *pybind11_meta_new(PyTypeObject *metacls, PyObject *args, PyObject *kwargs) {
    // use the default metaclass new to create the type
    PyObject *type = PyType_Type.tp_new(metacls, args, kwargs);
    if (type == nullptr) {
        return nullptr;
    }

#if !defined(PYPY_VERSION)
    auto default_init = ((PyHeapTypeObject*)get_internals().instance_base)->ht_type.tp_init;
    for (handle base : reinterpret_borrow<tuple>(((PyTypeObject *) type)->tp_bases)) {
        auto base_type = (PyTypeObject *) base.ptr();
        if (base_type->tp_init == default_init) {
            auto message = "%.200s has no __init__ and cannot be used as a base class from Python";
            PyErr_Format(PyExc_TypeError, message, base_type->tp_name);
            Py_DECREF(type);
            return nullptr;
        }
    }
#endif

    return type;
}

Any thoughts on this, @virtuald?
Doing this, I realized that subclassing a py::class_ type without __init__ is still OK. The problem is that you can never initialize an instance of the class (because you cannot initialize the C++ sub-instance), but you could still define a subclass.
This would mean your current solution is better, but also that your current error message is not fully correct. (Though, making a class without planning to create an object of it, is very much a niche case?).
Otherwise, if we fully want to inhibit users subclassing this, it might be nice to catch this early; i.e., when subclassing.

Side note: (the reason I'm surprised this works) When creating py::class_es from C++, we're not calling the tp_new of the metaclass. Which means we're skipping PyType_Type->tp_new (which does a lot, but not sure if it's crucial), but also skipping the tp_new of any custom py::metaclass(...) passed. This is a matter for another issue, though.

@YannickJadoul
Copy link
Collaborator

YannickJadoul commented Aug 25, 2020

At the risk of being overly blunt/critical/pedantic: isn't the fact that a method must exist a rather implicit when a messages says it must be called?

You're missing the point.

@rwgk Then what exactly is the point?

You've also ignored my point that this exception is meant for users of pybind11-created libraries, rather than for developers of pybind11 libraries. IMO, "%.200s.__init__ must exist" only confuses a Python user not familiar with pybind11. To those users seeing this exception (again, NOT users of pybind11, but users of users of pybind11), a method __init__ on the base class does exist (look at dir(...) or help(...)). So I'd prefer to not confuse my library's users with this extra complication.

The only doubt I had in connection with open_spiel: what if the extension class has no __init__, could that work?

__init__ gets inherited and it will tell that "No constructor defined!". We won't even get to showing the newly proposed error message.

@rwgk
Copy link
Collaborator

rwgk commented Aug 26, 2020

@rwgk Then what exactly is the point?

To remove the potential for doubt: someone looking at the current error message may reasonably wonder if it could work if there is no __init__, if if that's maybe something overlooked by the developers. By inserting three words ("must exist and") we can clearly signal: nope, we thought about it, it's not a bug, you really have to have a callable __init__ in this situation. With those three words, you can send that message right in the second where it's relevant and helpful, no need for anyone to dig in the documentation or sources, or ask on gitter etc.

My original question under #2152 was: do you know how much trouble it is to check if the base class implements __init__?

I see Dustin spent quite a bit of effort, but now seeing his work, and his "could be bad" remarks, my feeling is it's too much risk trying to be smarter here than tweaking the error message, especially if it doesn't work with PyPy.

@virtuald
Copy link
Contributor Author

I think adjusting the error message would be sufficient if there's no consensus about the danger posed by these changes.

@YannickJadoul I really like the idea of raising an exception on subclassing, but still suffers from the same problem of depending on the instance base of internals staying consistent.

@YannickJadoul
Copy link
Collaborator

To remove the potential for doubt: someone looking at the current error message may reasonably wonder if it could work if there is no __init__, if if that's maybe something overlooked by the developers. By inserting three words ("must exist and") we can clearly signal: nope, we thought about it, it's not a bug, you really have to have a callable __init__ in this situation. With those three words, you can send that message right in the second where it's relevant and helpful, no need for anyone to dig in the documentation or sources, or ask on gitter etc.

But that's the problem, there is an __init__ (look at the help or dir output). The __init__ just throws an error. So a user of a library without knowledge of pybind11 will get confused, because MyPybind11Class.__init__ does exist.

Next to that, that doesn't answer my comment on whom this exception is meant for. This is not an exception that (exclusively) ends up with pybind11 users/developers of a library, but with the users of a library made with pybind11.

I appreciate that in your/Google's case, you were both the writer of the pybind11 bindings, as well as the user trying to subclass it. But this is very different in my use case (a library consumed by mostly exclusively Python users). So, separating concerns and taking the perspective of a library user, this exception should not allude to the implementation detail that there is no def(py::init(...)) in the binding code. (I don't know if that convinces you?; if it doesn't, I don't know how else to explain, anymore.)

If the exception needs to be adapted, I think something like "%.200s must be constructible and %.200s.__init__() must be called when overriding __init__" is more appropriate.

(personally, I still don't really see the point. A library user forgets to call __init__, sees the error message, changes the code to calls __init__, and the __init__ error message says that the class can't be constructed: game over + report an issue to the library if there is a reason to subclass the pybind11-exposed type. pybind11's users should have read the docs. It would be nice if we could have a pybind11_fail somewhere during development, but we can't, because this is an error that happens later. I'm really not going to further dig myself into these trenches this bikeshed, though.)

@YannickJadoul
Copy link
Collaborator

@YannickJadoul I really like the idea of raising an exception on subclassing, but still suffers from the same problem of depending on the instance base of internals staying consistent.

@virtuald Yeah, I quite like the fact that you managed to check it (and there're more places where PyPy acts slightly different). The question is how consistently def(py::init(...)) will overwrite tp_init (CI will tell us, though), and if it's really worth the additional complexity (reasonably little complexity, though). I'm not saying this needs to be merged, but I also don't think it can't be merged.

If the base class doesn't have a constructor, tell the user that it cannot be
inherited from.

Caveats:

* Doesn't work on PyPy
* If the default instance base was no longer a PyHeapTypeObject this could crash.
* If pybind11_object_init changed in the future, and multiple pybind11 modules are
  present, get_internals might return an old version of the function
@rwgk
Copy link
Collaborator

rwgk commented Aug 31, 2020

I think adjusting the error message would be sufficient if there's no consensus about the danger posed by these changes.

@YannickJadoul I really like the idea of raising an exception on subclassing, but still suffers from the same problem of depending on the instance base of internals staying consistent.

Hi @virtuald (and @YannickJadoul), I believe we will need Wenzel's approval if we have this problem of depending on the instance base of internals staying consistent. — I still believe it's more trouble than it's worth.

@YannickJadoul
Copy link
Collaborator

Hi @virtuald (and @YannickJadoul), I believe we will need Wenzel's approval if we have this problem of depending on the instance base of internals staying consistent. — I still believe it's more trouble than it's worth.

For the solution in this PR, yes, I agree!
I'm also still fine with not doing anything and just adding a few extra lines of warnings in the docs. (Or with an adaptation of the exception message that makes sense to see as end-user of a pybind11 library.)

@YannickJadoul
Copy link
Collaborator

@virtuald, actually, is all this ((PyHeapTypeObject*)internals.instance_base)->ht_type.tp_init hocus pocus necessary?

https://en.cppreference.com/w/cpp/language/inline says:

An inline function or inline variable (since C++17) has the following properties:
[...]
2. An inline function or variable (since C++17) with external linkage (e.g. not declared static) has the following additional properties:
[...]
     3. It has the same address in every translation unit.

@virtuald
Copy link
Contributor Author

virtuald commented Sep 1, 2020

Presumably prior to C++17 the hocus pocus would be required?

... and anyways, it didn't work without the hocus pocus. I'd be happy to be proved wrong though.

@YannickJadoul
Copy link
Collaborator

Presumably prior to C++17 the hocus pocus would be required?

... and anyways, it didn't work without the hocus pocus. I'd be happy to be proved wrong though.

Hmmm, that part wasn't set as C++17 only, but I didn't try for myself. Dang, thought it would clean things up a bit :-(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants