Add manual iterator implementations for custom vec iterables #1107

jakelishman · 2024-02-21T15:49:32Z

This adds manual Iterator implementors for the custom vec Iterable pyclasses, which were previously using the Python default iterator implementation that attempts calls to __getitem__ with successive integers until an IndexError is raised.

Doing this means that we no longer have any overhead from converting Python-space integer indices to Rust-space ones, because we just store a pointer to the backing vec internally and yield from that using an index we track ourselves.

Using the same setup code as discussed in #1096, with a manual iteration through afterwards:

import rustworkx as rx

N = 1_000_000
g = rx.PyGraph()
g.add_nodes_from(list(range(N)))
nodes = g.node_indexes()

%timeit for _ in nodes: pass

On main (which includes #1096), the timeit measured the iteration at ~78ms, and with this PR it dropped to ~48ms on my machine.

I didn't add all the new iterator structs to the typing information concretely, instead just referencing them opaquely as Iterator[T], because these objects only implement magic protocol methods and are not intended to be constructable / picklable / nameable from Python space at all (similar to how list_iterator isn't as a builtin). Open to changing that if preferred.

I'll mark this as:

Fix Performance issue with custom_vec_iter_impl! NodeIndices #1090

but feel free to edit this to remove that link if you think there's more places to look for performance here.

This adds manual `Iterator` implementors for the custom vec `Iterable` pyclasses, which were previously using the Python default iterator implementation that attempts calls to `__getitem__` with successive integers until an `IndexError` is raised. Doing this means that we no longer have any overhead from converting Python-space integer indices to Rust-space ones, because we just store a pointer to the backing vec internally and yield from that using an index we track ourselves.

coveralls · 2024-02-21T16:06:16Z

Pull Request Test Coverage Report for Build 7995769315

Details

-18 of 93 (80.65%) changed or added relevant lines in 1 file are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.09%) to 96.517%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/iterators.rs	75	93	80.65%

Totals
Change from base Build 7982100393:	-0.09%
Covered Lines:	16821
Relevant Lines:	17428

💛 - Coveralls

mtreinish

This looks great to me, it's basically following what we did for mapping structs already with keys, values, and items but does it for the vec structs too. The performance improvements are really nice here too.

My only 2 thoughts are do you want to add a test using iter() explicitly in https://github.com/Qiskit/rustworkx/blob/main/tests/test_custom_return_types.py and also maybe a release not documenting the new feature?

jakelishman · 2024-02-21T16:14:18Z

Sure, I can add those. Flicking quickly through the tests now, I suppose there's a question of whether we want to implement more efficient reversed iterators as well. That's generated separately to the sequence-protocol iterator, so what I've done here doesn't affect reversed(g.node_indices()) nor reversed(iter(g.node_indices()) (which continues to not work, which is valid behaviour).

I can throw in a reversed iterator for each of these if you like as well, but it equally might not be worth whatever increase in compilation times it comes with?

jakelishman · 2024-02-21T16:17:21Z

Actually: what kind of iter tests were you thinking? I deliberately made the actual iterator type unnameable from Python, so I can't really isinstance check it or anything like that, and there shouldn't be any real testable behaviour that differs from the implicit sequence-protocol iterator. Any test case that iterates through one of these nodes (which many existing tests do) is already calling iter on them, it's just implicit within CPython.

jakelishman · 2024-02-21T16:20:51Z

Release node added in 8f9a991.

This matches the Python behaviour for the implicit sequence-protocol iterators; they have only `__length_hint__` and not `__len__`.

mtreinish · 2024-02-21T18:11:35Z

Actually: what kind of iter tests were you thinking? I deliberately made the actual iterator type unnameable from Python, so I can't really isinstance check it or anything like that, and there shouldn't be any real testable behaviour that differs from the implicit sequence-protocol iterator. Any test case that iterates through one of these nodes (which many existing tests do) is already calling iter on them, it's just implicit within CPython.

I was just thinking of something that explicitly exercises it, mostly to say we did. I wasn't expecting a type check, I think the existing tests we have for this on mappings do something like:

test = iter(object)
result = list(test)
self.assertEqual(result, expected)

It might just be duplicate coverage as that implicitly happens in other tests, but I didn't see the harm in at least validating it explicitly

mtreinish · 2024-02-21T18:14:00Z

I can throw in a reversed iterator for each of these if you like as well, but it equally might not be worth whatever increase in compilation times it comes with?

That seems worth it to me, personally I'm not too worried about rustworkx compile times. It might slow down development and CI a little bit, but we're not really bottlenecked on that (if it came down to it we could leverage debug mode a lot more and tweak things to optimize more for build speed). But, making sure the library is higher performance for python users is in general a high priority, so having faster reversed() iterator implementations makes sense to me.

jakelishman · 2024-02-21T21:32:45Z

Ok, I've added the reversed iterators in 5f60d9e and explicit tests of both iter and reversed in b68aa5f. I tweaked the iter tests slightly so they compare list(x) == list(iter(x)), which doubles it as a test that the iterable is multiply consumable.

The timings of the reversed iterators are the same as the timings as the forwards iterators: this PR took the million-node NodeIndices iteration (with a reversed() around it) from about 76ms to about 47ms this time when I ran them.

mtreinish

LGTM, thanks for doing this.

mtreinish · 2024-02-21T22:37:25Z

src/iterators.rs

+                    .borrow(py)
+                    .$data
+                    .len()
+                    .saturating_sub(self.index)


Oh that's a good trick to comply with __length_hint__ not returning a negative. Although as a usize I guess without this in release mode we would just overflow which wouldn't result in a ValueError although the hint would be wildly inaccurate (although PEP424 says the return is "not required to be accurate") for an empty or consumed iterator.

I actually wrote this initially for __len__ (which is required to be accurate) and reused it for __length_hint__, but then I remembered / realised that iterators don't typically implement __len__ so just renamed the method.

mtreinish added this to the 0.15.0 milestone Feb 21, 2024

mtreinish reviewed Feb 21, 2024

View reviewed changes

Add release note

8f9a991

Remove __len__ from custom iterators

37d7390

This matches the Python behaviour for the implicit sequence-protocol iterators; they have only `__length_hint__` and not `__len__`.

jakelishman added 3 commits February 21, 2024 18:31

Add reversed iterators

5f60d9e

Add explicit tests

b68aa5f

Fixup release note

ed6e820

mtreinish approved these changes Feb 21, 2024

View reviewed changes

mtreinish merged commit bf3f703 into Qiskit:main Feb 21, 2024
25 checks passed

jakelishman deleted the custom-iter-manual branch February 21, 2024 22:50

IvanIsCoding mentioned this pull request Feb 26, 2024

Generate all connected subgraphs of size k #1104

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add manual iterator implementations for custom vec iterables #1107

Add manual iterator implementations for custom vec iterables #1107

jakelishman commented Feb 21, 2024 •

edited

Loading

coveralls commented Feb 21, 2024 •

edited

Loading

mtreinish left a comment

jakelishman commented Feb 21, 2024

jakelishman commented Feb 21, 2024 •

edited

Loading

jakelishman commented Feb 21, 2024

mtreinish commented Feb 21, 2024 •

edited

Loading

mtreinish commented Feb 21, 2024

jakelishman commented Feb 21, 2024

mtreinish left a comment

mtreinish Feb 21, 2024

jakelishman Feb 21, 2024

Add manual iterator implementations for custom vec iterables #1107

Add manual iterator implementations for custom vec iterables #1107

Conversation

jakelishman commented Feb 21, 2024 • edited Loading

coveralls commented Feb 21, 2024 • edited Loading

Pull Request Test Coverage Report for Build 7995769315

Details

💛 - Coveralls

mtreinish left a comment

Choose a reason for hiding this comment

jakelishman commented Feb 21, 2024

jakelishman commented Feb 21, 2024 • edited Loading

jakelishman commented Feb 21, 2024

mtreinish commented Feb 21, 2024 • edited Loading

mtreinish commented Feb 21, 2024

jakelishman commented Feb 21, 2024

mtreinish left a comment

Choose a reason for hiding this comment

mtreinish Feb 21, 2024

Choose a reason for hiding this comment

jakelishman Feb 21, 2024

Choose a reason for hiding this comment

jakelishman commented Feb 21, 2024 •

edited

Loading

coveralls commented Feb 21, 2024 •

edited

Loading

jakelishman commented Feb 21, 2024 •

edited

Loading

mtreinish commented Feb 21, 2024 •

edited

Loading