Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit conversions to bool + np.bool_ conversion #925

Merged
merged 17 commits into from
Jul 23, 2017

Conversation

aldanor
Copy link
Member

@aldanor aldanor commented Jun 26, 2017

This fixes #922 but is quite hacky / not overly efficient -- mainly because we can't use proper numpy API in cast.h and because bool type caster is a specialisation...

(If an added if clause is a concern, I guess it could be moved over to numpy.h and made opt-in (e.g. add an optional function pointer in type_caster<bool>), but then the user would have to enable it manually in their code. On the bright side, it would be more efficient, without string comparisons etc.)

@wjakob
Copy link
Member

wjakob commented Jun 27, 2017

A more general solution would be to invoke a Python API function that will call the instance's __bool__ (3.x) or __nonzero__ (2.x) function. For some reason PyObject_Bool does not exist, so we may have to implement that ourselves.

@wjakob
Copy link
Member

wjakob commented Jun 27, 2017

(the implicit conversion should only take place when convert == true)

@aldanor
Copy link
Member Author

aldanor commented Jun 27, 2017

A more general solution would be to invoke a Python API function that will call the instance's bool (3.x) or nonzero (2.x) function. For some reason PyObject_Bool does not exist, so we may have to implement that ourselves.

Indeed; I forgot about py2/3 differences. Another option is to just do a bool(obj), I guess?

(the implicit conversion should only take place when convert == true)

Should this really be counted as an implicit conversion?

@wjakob
Copy link
Member

wjakob commented Jun 27, 2017

Indeed; I forgot about py2/3 differences. Another option is to just do a bool(obj), I guess?

Right -- the question is: is there a C API binding to do exactly this.

(the implicit conversion should only take place when convert == true)

Should this really be counted as an implicit conversion?

Yes. For instance, bool(123) is perfectly valid, and we would not want this conversion to be preferred to that of another overload that accepts an integer. With the new two-phase overload traversal, the bool cast would only be considered in the second round.

@dean0x7d
Copy link
Member

PyObject_IsTrue() should do the trick.

@wjakob
Copy link
Member

wjakob commented Jun 27, 2017

Excellent -- that's the one.

@aldanor aldanor force-pushed the feature/numpy-bool branch 2 times, most recently from b29be01 to 63e57e8 Compare June 27, 2017 11:45
@aldanor
Copy link
Member Author

aldanor commented Jun 27, 2017

Ok, a few changes:

  • Use PyObject_IsTrue() as suggested by @dean0x7d
  • Check for convert arg
  • Check for .dtype.kind which can be cast to char, one allocation less

I was wondering whether it's worth to cache PyObjectType(src.ptr()) somewhere in a static var after the first np.bool_ is passed in, so that in subsequent calls it's just an isinstance<> check, as opposed to getattr/string checks?

@wjakob
Copy link
Member

wjakob commented Jun 27, 2017

I was thinking more along the following lines:

In common.h, add:

// In Python 3.x block, Line 138+
#define PYBIND11_NONZERO "__bool__"

// In Python 2.x block: Line 158+
#define PYBIND11_NONZERO "__nonzero__"

Then, in the caster, use

....
else if (convert && hasattr(src, PYBIND11_NONZERO)) {
    value = (bool) PyObject_IsTrue(src.ptr());
    return true;
}
....

@dean0x7d
Copy link
Member

Note that PyObject_IsTrue can fail, e.g. bool(np.array([...])) raises an exception.

if (convert) {
    auto result = PyObject_IsTrue(src.ptr());
    if (result == -1)  // this *should* also cover missing `__bool__`/`__nonzero__`, 
        return false;  // but adding a proper test to make sure would be good
    value = result == 1;
    return true;
}

@aldanor
Copy link
Member Author

aldanor commented Jun 27, 2017

Oh - I've misread your previous message then. I was only thinking about numpy.bool_.

Do we want to have global implicit bool conversions? (which looks a bit scary to me as most everything can be converted to a bool)

If we do, then __bool__ attr check is not enough; for instance:

>>> hasattr([], '__bool__')
False

(source for PyObject_IsTrue())

Also, for numpy.bool_, if we make it a special case, I'd rather not check for convert = true, as it is a bool (C bool).

@wjakob
Copy link
Member

wjakob commented Jun 27, 2017

this should also cover missing __bool__/__nonzero__,

I am not sure that is the case:

 % python
Python 3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     pass
>>> A()
<__main__.A object at 0x1017b2a90>
>>> A().__bool__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute '__bool__'
>>> bool(A())
True

@aldanor
Copy link
Member Author

aldanor commented Jun 27, 2017

Comments crossed with @dean0x7d :) Yea, if we want a generic implicit bool conversion, I'd do it like in #925 (comment)

@aldanor
Copy link
Member Author

aldanor commented Jun 27, 2017

>>> bool(A())
True

Also

>>> bool(object())
True

🐼

@dean0x7d
Copy link
Member

dean0x7d commented Jun 27, 2017

@wjakob
I am not sure that is the case:

Yeah, I overlooked that objects can sometimes be converted to bool even without the magic method.

@aldanor
Do we want to have global implicit bool conversions? (which looks a bit scary to me as most everything can be converted to a bool)

I think that going completely broad should be fine (i.e. anything that isn't an error for PyObject_IsTrue):

  1. Overload resolution will not be affected thanks to the new two-phase system.
  2. Python happily coerces most things to bool:
def foo(arg):
    if arg:
        print("yes")
>>> foo(object())
yes

I think it's fine for pybind11 to have the same behavior, regardless of type or magic methods. Besides, it's simpler not to maintain a special pybind11-specific whitelist of what can be converted to bool when Python already has one in PyObject_IsTrue.

@aldanor
Copy link
Member Author

aldanor commented Jun 27, 2017

Btw, looking at PyObject_IsTrue() source again (here's a blog post on it), we can
do this: basically, replace "return 1" fallback with "return -1". This way, converting A() in @wjakob's example would fail; converting object() would also fail. It would only work if either __bool__ / __nonzero__ are defined, or if it's a sequence / mapping, in which case it checks for length, or if it's a None.

(... or we can just use PyObject_IsTrue() as is.)

@aldanor
Copy link
Member Author

aldanor commented Jun 27, 2017

My suggestion is in 5a9b757, and an example of how it works in 1a820a4.

This will handle __nonzero / __bool__, sequence/mapping types, None (all of the above under implicit conversion); it will also handle numpy.bool_ with noconvert; everything else would be rejected (e.g. object()).

Does this look reasonable?

@aldanor aldanor changed the title Support np.bool_ in type_caster<bool> Implicit conversions to bool + np.bool_ conversion Jun 27, 2017
@aldanor aldanor force-pushed the feature/numpy-bool branch 2 times, most recently from 93963e6 to 047a11b Compare June 28, 2017 08:54
@dean0x7d
Copy link
Member

Given the separate paths needed for Python 2/3 and it looks like a workaround for PyPy, I feel like the simple PyObject_IsTrue() would be nicer, unless there is a good reason not to.

Tests: the ones that aren't related to numpy should be moved into test_builtin_casters.

@aldanor
Copy link
Member Author

aldanor commented Jun 28, 2017

Heh, PyPy spoilt my plans a bit, yea it looks like it needs a workaround...

I personally feel like from the user's standpoint using PyObject_IsTrue() here actually doesn't follow the path of least surprise (did you know bool(object()) is True? I didn't, and had to look at CPython code to confirm...). Plus, given that implicit conversions are enabled by default, a bool argument will happily accept any and every Python object.

That being said, from the dev standpoint, using PyObject_IsTrue() is indeed the most logical and least messy, and would work the same way on py2/py3/pypy. If we want to do it this way -- I'll update the code + tests. Anyone to confirm?

// Re: tests: ditto, I'll move them before squashing

@jagerman
Copy link
Member

I don't like the use of PyObject_IsTrue for this. I wouldn't expect a bool argument to accept "anything that can be used in an if statement", but rather to accept anything that looks like a boolean value (or emulates it with __bool__). PyObject_IsTrue here is a bit like forcecast: it feels like too much for default behaviour.

@aldanor
Copy link
Member Author

aldanor commented Jun 28, 2017

Does anyone have an idea on how to fix the pypy issue? :)

@jagerman
Copy link
Member

jagerman commented Jun 28, 2017

PyPy's implementation of PyObject_IsTrue is:

value = res != 0;
return true;
}
} else if (hasattr(src, "dtype")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to go before the else if (convert), otherwise it's not going to run if the arguments get into the second (convert-allowed) pass. That second pass can be triggered by some other argument that needs conversion, so anything that runs with convert = false needs to also work with convert = true (even if, as in this case, conversion isn't needed for this argument).

@dean0x7d
Copy link
Member

OK, I have no objections if PyObject_IsTrue is deemed too broad. If the convertible types are being restricted, then I'd vote for @wjakob's solution from #925 (comment): it's a simplification and also more strict (no sequence or mapping types allowed). As a bonus, it doesn't require any workarounds for PyPy.

@aldanor
Copy link
Member Author

aldanor commented Jun 29, 2017

OK, I have no objections if PyObject_IsTrue is deemed too broad. If the convertible types are being restricted, then I'd vote for @wjakob's solution from #925 (comment): it's a simplification and also more strict (no sequence or mapping types allowed). As a bonus, it doesn't require any workarounds for PyPy.

Re: hasattr version -- maybe it could be done that way on PyPy for the lack of better options-- in CPython it would be much, much slower than checking for tp_as_number->nb_bool. I'll try to play around with it today to see what works.

@wjakob
Copy link
Member

wjakob commented Jul 10, 2017

This still seems really complicated to me. Can’t we strip out the NumPy-specific bits (they should be handled by the default case that calls PyObject_IsTrue). The code below AFAIK should not have the "True for arbitrary objects" issue because it only runs when PYBIND11_NONZERO is an attribute.

if (!src) return false;
else if (src.ptr() == Py_True) { value = true; return true; }
else if (src.ptr() == Py_False) { value = false; return true; }
else if (convert && hasattr(src, PYBIND11_NONZERO)) {
    int res = PyObject_IsTrue(src.ptr());
    if (res == 0 || res == 1)
        return (bool) res;
}
return false;

@@ -153,8 +153,10 @@
#define PYBIND11_SLICE_OBJECT PyObject
#define PYBIND11_FROM_STRING PyUnicode_FromString
#define PYBIND11_STR_TYPE ::pybind11::str
#define PYBIND11_NONZERO "__bool__"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename this to BOOL_ATTR or something along those lines to stick to the python-3-derived name here rather than python-2-derived, like we do with the above (e.g. "BYTES").

@wjakob
Copy link
Member

wjakob commented Jul 10, 2017

Calling any kind of pybind11-bound function with scalar data (e.g. booleans instead of arrays of booleans) that perhaps even require implicit conversions is not going to be that fast, so I don't know useful performance tuning is at that level.

@aldanor
Copy link
Member Author

aldanor commented Jul 10, 2017

@jagerman I've integrated your comments, but that's probably as simple as it's going to get... CPython optimization part is now just extra 4 lines of code which isn't too bad -- the rest has to be there anyway, including the none check at the start.

}
return false;
}
if (hasattr(src, "dtype")) {
Copy link
Member

@jagerman jagerman Jul 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the above (i.e. the else if (convert) { ... } block) catch a numpy bool under convert == true? If so, this could be an else if to save the attribute lookup during a convert load when we already know it failed.

Copy link
Member Author

@aldanor aldanor Jul 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed. Fixed.

By the way, I think an even faster way would be to strcmp the type name (which should cost nothing to extract) to 'bool_' and only then check the dtype, etc. But not sure it's worth it here.

}
return false;
}
else if (hasattr(src, "dtype")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when this branch is completely removed? Can't we have the second (converting) pass handle NumPy booleans (which is already optimized to avoid hasattr)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intention is to get this into the no-convert pass, so that it can win the overload resolution if the function is overloaded with some other argument type (e.g. int).

Copy link
Member Author

@aldanor aldanor Jul 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, correct.

// as I've noted above, I think the second hasattr() can actually be avoided in np.bool_ cases via something cheap like strcmp(Py_TYPE(src.ptr())->tp_name, "bool_") prior to the dtype hasattr check (and maybe the following dtype.kind check can then be thrown out, because what else can it be but the np.bool_...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strcmp(tp_name, "bool_") also looks like a pretty nice simplification. In that case the separate np.bool_ logic could be removed completely and just leave:

else if (convert || strcmp(tp_name, "bool_") == 0) {
    ...
}

@jagerman jagerman mentioned this pull request Jul 21, 2017
6 tasks
@jagerman jagerman added this to the v2.2 milestone Jul 21, 2017
@aldanor
Copy link
Member Author

aldanor commented Jul 23, 2017

Ok, I've reduced the numpy.bool_ check to just a strcmp() on the type name, that seems to work.

@jagerman jagerman merged commit e07f758 into pybind:master Jul 23, 2017
@jagerman
Copy link
Member

That simplification is nice. Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

np.bool_ is not autoconverted to C++ bool unlike other numpy numeric types
4 participants