Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support frozenset, tuple as dict keys #3886

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

ecatmur
Copy link
Contributor

@ecatmur ecatmur commented Apr 19, 2022

Description

#3836

Add frozenset as a pybind11 type.
Allow type_caster to be (explicitly) selected and specialized, defaulting to (inherit from) type_caster.
Use type_caster for std::set, std::map etc. key converter.
Specialize type_caster for std::set, std::vector etc.
Add tests.

Suggested changelog entry:

std::map<std::set<int>, int> now converts to/from Dict[FrozenSet[int], int]; likewise std::set<std::vector<int>> to/from Set[Tuple[int, ...]].

ecatmur and others added 4 commits April 19, 2022 18:26
pybind#3836

Add frozenset as a pybind11 type.
Allow type_caster<const T> to be (explicitly) selected and specialized, defaulting to (inherit from) type_caster<T>.
Use type_caster<const K> for std::set, std::map etc. key converter.
Specialize type_caster<const C> for std::set, std::vector etc.
Add tests.
@Skylion007
Copy link
Collaborator

The frozen set casters are a decent idea, but I don't know how I feel about the tuple ones considering we have std::tuple to fulfill that role already. Thoughts @rwgk @henryiii ?

@Skylion007 Skylion007 requested review from rwgk and henryiii April 19, 2022 19:15
@Skylion007
Copy link
Collaborator

The frozen set casters are a decent idea, but I don't know how I feel about the tuple ones considering we have std::tuple to fulfill that role already. Thoughts @rwgk @henryiii ?

@@ -1784,14 +1784,11 @@ class kwargs : public dict {
PYBIND11_OBJECT_DEFAULT(kwargs, dict, PyDict_Check)
};

class set : public object {
class set_base : public object {
protected:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like having add as a public function on frozen-sets. We can still call the C-API directly on the pointer for casters, but other code shouldn't be able to add objects to frozen sets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved non-const methods to protected

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would still need to be public for normal set?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ecatmur Why not just move them to set subclass and use the raw C-API to interact with the frozen set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I get it now. Fixed.

@ecatmur
Copy link
Contributor Author

ecatmur commented Apr 20, 2022

The frozen set casters are a decent idea, but I don't know how I feel about the tuple ones considering we have std::tuple to fulfill that role already.

I think the problem is that historically (and currently) Python's tuple has a bit of a mixed role: it's the container you go for when you need an immutable homogeneous sequence, as well as being the heterogeneous sequence type. For example, in typing you can write Tuple[int, ...] as well as Tuple[int, str].

FWIW, when PEP 351 was rejected (over 15 years ago!) Raymond Hettinger made the argument that list->tuple and set->frozenset are the only two transformations you actually ever need: https://mail.python.org/pipermail/python-dev/2005-October/057586.html That suggests a simpler approach would be to adjust list->tuple and set->frozenset after casting before insertion into dict (or set), avoiding the hairiness of type_caster<type const> and transparently supports all types casting to list and set (including third party stuff) at a small runtime cost. I'll propose that as an alternative.

Copy link
Contributor

@jiwaszki jiwaszki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Skylion007 @ecatmur I will join as the interested party. Thanks for porting my draft into actual codebase!

I tried the branch and get multiple errors during compilation on my Mac:

error: ambiguous partial specializations of 'type_caster' const_name("FrozenSet[", "Set[") + key_conv::name

Thus I was unable to test it further, however I have some observations.

  1. Personally I do not support an idea of the implicit cast from non-const std::set to frozenset. As I mentioned in Convert std::set to frozenset #3836 -- there should be a clear line between set and it's immutable version as it might confuse Python/pybind users. Ideally functions should only accept dicts coming (or getting out) in form of std::map<const std::set<...>, ...>. Is there a way to restrict that?
  2. I have also noticed that tests lack some of essential scenarios. What happens in case when code mix const and non-cost types? What happens when arguments are passed by copy?
  3. Following @Skylion007 concern about tuples as I have similar thoughts. std::tuple is already filling that niche, there is no need to expand in that area. As Python tuple will be converted to std::vector it will not only allow appending, but actually allow to modify it's contents -- I do not think this is an expected behaviour. If a user wants to do it, just use py::tuple and manually transfer the data to the mutable container.

@@ -94,6 +94,38 @@ def test_recursive_casting():
assert z[0].value == 7 and z[1].value == 42


def test_frozen_key(doc):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you added tests for each container, what about frozenset as standalone type? Is it even possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added fcaff44 that tests you can pass frozenset to a C++ function that takes std::set (just as you can pass tuple to a C++ function taking std::vector). Is there's anything else you can do with frozenset on its own?

@ecatmur
Copy link
Contributor Author

ecatmur commented Apr 21, 2022

I tried the branch and get multiple errors during compilation on my Mac:

error: ambiguous partial specializations of 'type_caster' const_name("FrozenSet[", "Set[") + key_conv::name

Oh, sorry. Maybe #3887 would work? It's a simpler approach but the observable behavior should be the same.

Thus I was unable to test it further, however I have some observations.

  1. Personally I do not support an idea of the implicit cast from non-const std::set to frozenset. As I mentioned in Convert std::set to frozenset #3836 -- there should be a clear line between set and it's immutable version as it might confuse Python/pybind users. Ideally functions should only accept dicts coming (or getting out) in form of std::map<const std::set<...>, ...>. Is there a way to restrict that?

There isn't really such a thing as std::map<const std::set<...>, ...> - it isn't possible to get a non-const reference to the key in a std::map (not without an illegal const_cast, anyway) so no-one would bother putting const on the key. The value_type is std::pair<const Key, Value> so the keys are const as long as they're in the map.

So there isn't a conversion from non-const std::set to frozenset, it's from const std::set to frozenset.

  1. I have also noticed that tests lack some of essential scenarios. What happens in case when code mix const and non-cost types? What happens when arguments are passed by copy?

I'm only implementing this for the caster interface, not the binder interface, so everything is being passed by copy already and there's no difference between const and non-const at the top level.

  1. Following @Skylion007 concern about tuples as I have similar thoughts. std::tuple is already filling that niche, there is no need to expand in that area. As Python tuple will be converted to std::vector it will not only allow appending, but actually allow to modify it's contents -- I do not think this is an expected behaviour. If a user wants to do it, just use py::tuple and manually transfer the data to the mutable container.

We already allow passing tuple to C++ functions that take a std::vector (etc.) - there's one or two tests that show that already, and I've just added some more.

If Python code inserts a tuple key into a dict[tuple[str, ...], int] and passes it to C++ code as a std::map<std::vector<std::string>, int> then the C++ code can't modify the std::vectors (since they're const within the map) and if it removes them, inserts modified keys and passes the map back to Python then they'll be completely distinct objects. There's no way for a std::vector to alias a tuple and allows modifying it.

@rwgk
Copy link
Collaborator

rwgk commented Apr 23, 2022

I looked up and down both PRs quite a bit, but didn't get a chance yet to fully appreciate all details. It looks very interesting: not a lot of code, but makes something work that currently doesn't.

Which of the PRs do you prefer? I'd like to focus on the one that you think is best.

Another idea: could you please transfer the additional tests for tuple and frozenset that already work on current HEAD to a separate PR, even if it's a tiny PR? That will make it more clear here what already works and what doesn't. The more immediately improved test coverage is a nice plus.

ecatmur added a commit to ecatmur/pybind11 that referenced this pull request Apr 24, 2022
We already test that tuple can cast to std::vector and std::deque; add tests for std::vector<bool>, std::list and
std::valarray.

Extracted from pybind#3886.
ecatmur added 2 commits April 24, 2022 21:54
For the reverse direction, std::set still casts to set. This is in concordance with the behavior for sequence containers, where e.g. tuple casts to std::vector but std::vector casts to list.

Extracted from pybind#3886.
since this will be part of pybind11 public API
@ecatmur
Copy link
Contributor Author

ecatmur commented Apr 24, 2022

I looked up and down both PRs quite a bit, but didn't get a chance yet to fully appreciate all details. It looks very interesting: not a lot of code, but makes something work that currently doesn't.

Which of the PRs do you prefer? I'd like to focus on the one that you think is best.

#3887. While this one (#3886) is more idiomatic C++, #3887 is more Pythonic and that's appropriate for code that affects the Python side of things - it's automatically creating a Python API, so should stick to types that would appear naturally in Python programs. The clever recursive stuff would work in the STL binder API, perhaps.

Another idea: could you please transfer the additional tests for tuple and frozenset that already work on current HEAD to a separate PR, even if it's a tiny PR? That will make it more clear here what already works and what doesn't. The more immediately improved test coverage is a nice plus.

Sure: #3900 is for tuple tests. #3901 splits out the core frozenset functionality (any_set as a common base type, and std::set cast from any_set).

rwgk pushed a commit that referenced this pull request Apr 24, 2022
We already test that tuple can cast to std::vector and std::deque; add tests for std::vector<bool>, std::list and
std::valarray.

Extracted from #3886.
@Skylion007
Copy link
Collaborator

Let's focus on merging #3901 for now.

ecatmur and others added 6 commits May 1, 2022 12:17
and rename anyset methods
Making frozenset non-default constructible means that we need to adjust pyobject_caster to not require that its value is default constructible, by initializing value to a nil handle.  This also allows writing C++ functions taking anyset, and is arguably a performance improvement, since there is no need to allocate an object that will just be replaced by load.

Add some more tests, including anyset::empty, anyset::size, set::add and set::clear.
rwgk pushed a commit that referenced this pull request May 5, 2022
* Add frozenset, and allow it cast to std::set

For the reverse direction, std::set still casts to set. This is in concordance with the behavior for sequence containers, where e.g. tuple casts to std::vector but std::vector casts to list.

Extracted from #3886.

* Rename set_base to any_set to match Python C API

since this will be part of pybind11 public API

* PR: static_cast, anyset

* Add tests for frozenset

and rename anyset methods

* Remove frozenset default ctor, add tests

Making frozenset non-default constructible means that we need to adjust pyobject_caster to not require that its value is default constructible, by initializing value to a nil handle.  This also allows writing C++ functions taking anyset, and is arguably a performance improvement, since there is no need to allocate an object that will just be replaced by load.

Add some more tests, including anyset::empty, anyset::size, set::add and set::clear.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add rationale to `pyobject_caster` default ctor

* Remove ineffectual protected: access control

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants