
feat: adds iterable support to pagination #59

Closed
wants to merge 11 commits into from

Conversation

tdstein
Collaborator

@tdstein tdstein commented Feb 29, 2024

I'm working on a change to make `Paginator` a mixin which adds iterable support. Currently, the paginator is a property of `Users`, but the end state will be `class Users(Paginator[User])`. This will add support to `Users` for the `iter`, `next`, and `len` builtins.
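The direction described above can be sketched as a small generic mixin. This is a hypothetical illustration of the idea, not the PR's actual implementation; the `fetch_page` and `total` hooks are invented names here.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class Paginator(Generic[T]):
    """Hypothetical mixin: a subclass supplies page fetching, and in
    return gets the iter and len builtins for free."""

    def fetch_page(self, page_number: int) -> List[T]:
        raise NotImplementedError  # e.g. Users would GET v1/users here

    @property
    def total(self) -> int:
        raise NotImplementedError  # Connect responses report a total count

    def __iter__(self) -> Iterator[T]:
        page_number = 1
        while True:
            page = self.fetch_page(page_number)
            if not page:
                return
            yield from page  # pages are fetched lazily, one at a time
            page_number += 1

    def __len__(self) -> int:
        return self.total
```

With `class Users(Paginator[User])`, `for user in users`, `next(iter(users))`, and `len(users)` would all work without a separate paginator property.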

@tdstein
Collaborator Author

tdstein commented Feb 29, 2024

Once this is stable, it might be a good time to add the rest of the paginated API endpoints.

@nealrichardson
Collaborator

Sadly, from my scan of the API docs, there's only one other paginated endpoint currently :(

@tdstein
Collaborator Author

tdstein commented Feb 29, 2024

It looks like there are 3 total. Searching for `page_number` here: https://docs.posit.co/connect/api/swagger.json

users, groups, and bundles


github-actions bot commented Mar 1, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 299   | 263     | 88%      | 80%       | 🟢     |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
|------|----------|--------|
| src/posit/connect/client.py | 100% | 🟢 |
| src/posit/connect/config.py | 100% | 🟢 |
| src/posit/connect/paginator.py | 100% | 🟢 |
| src/posit/connect/urls.py | 100% | 🟢 |
| src/posit/connect/users.py | 100% | 🟢 |
| TOTAL | 100% | 🟢 |

updated for commit: 09b85bd by action🐍

@tdstein tdstein marked this pull request as ready for review March 1, 2024 16:54
Collaborator

@nealrichardson nealrichardson left a comment


A few notes here.

This is cool, but I'm wondering how it helps. To my eyes, it's less immediately obvious how the pagination works in practice with this implementation--the previous one is much less clever. So what do we get for this extra complexity, and is it worth it? It may very well be worth it, I just thought it worth asking.


```python
# The maximum page size supported by the API.
_MAX_PAGE_SIZE = 500
```

Is this not constant across all paginated APIs?

```python
# Since the paginator fetches the users in pages, it will stop fetching subsequent pages once a match is found.
url = urls.append_path(self.config.url, "v1/users")
paginator = Paginator(self.session, url, page_size=page_size)
users = (User(**user) for user in paginator)
```

Reading this, it's not obvious to me that each of these comprehensions isn't actually iterating over all of the results in the paginator. (There at least used to be versions of Python where that wouldn't be true, unless I'm misremembering.)

So for me, that merits (a) an inline comment acknowledging how it works, and (b) a responses-based test that would error if you went beyond a certain page of results. The current test doesn't ensure that, IIRC.
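For what it's worth, generator expressions in current CPython do pull items one at a time, so iteration stops as soon as `next` finds a match. A minimal stand-in (the `CountingPaginator` below is invented for illustration, not part of the SDK) shows the behavior the inline comment describes:

```python
class CountingPaginator:
    """Stand-in for the real Paginator that records page fetches."""

    def __init__(self, pages):
        self.pages = pages
        self.pages_fetched = 0

    def __iter__(self):
        for page in self.pages:
            self.pages_fetched += 1
            yield from page


paginator = CountingPaginator([[1, 2], [3, 4], [5, 6]])
match = next(item for item in paginator if item == 3)

# The third page was never fetched: the generator expression stops
# advancing the underlying iterator once the match is found.
assert match == 3
assert paginator.pages_fetched == 2
```

A responses-based test could assert the same thing against the real paginator by registering only the first two pages and letting any further request fail.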


I added some comments to try and explain the behavior.

@tdstein
Collaborator Author

tdstein commented Mar 4, 2024

> A few notes here.
>
> This is cool, but I'm wondering how it helps. To my eyes, it's less immediately obvious how the pagination works in practice with this implementation--the previous one is much less clever. So what do we get for this extra complexity, and is it worth it? It may very well be worth it, I just thought it worth asking.

Thanks! I want to avoid unnecessary complexity, so happy to iron this out more.

Could you expand on which parts you find complex?

I have a few goals in mind.

  1. Allow idiomatic comprehension on resource collections: `(user for user in client.users)`.
  2. Provide mixin support for other resource types.
  3. Make the comprehension memory efficient by waiting to fetch subsequent pages until they are needed.

As we've been discussing, the memory efficiency is premature since endpoint support is limited. But I hope this can encourage the migration of other endpoints that manage large collections.

```python
    Returns:
        int: The total number of items in the paginator.
    """
    # Connect's paginated APIs include a "total" count in the response.
```


Will there be a different paginator for APIs that use keyset pagination rather than offset pagination? The cookbook describes both types. Keyset pagination is used by the audit logs and content usage APIs and doesn't provide a total count or random access to pages.
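For reference, a keyset paginator would have a different shape: it follows a server-supplied cursor and cannot offer `len` or random page access. The sketch below assumes a hypothetical response layout (`results` plus a `paging.next` link); the real audit-logs schema may differ, so treat the field names as placeholders:

```python
def iter_keyset(session, url, page_size=500):
    """Sketch of keyset pagination: follow the server-supplied next link
    until it is absent. The "results"/"paging.next" response shape is an
    assumption for illustration, not the documented Connect schema."""
    params = {"limit": page_size}
    while url:
        body = session.get(url, params=params).json()
        yield from body.get("results", [])
        # No total count and no page numbers: only the next cursor.
        url = body.get("paging", {}).get("next")
        params = None  # the next link already encodes the cursor
```

Since there is no total, this implementation could support `__iter__` but not `__len__`, which argues for keeping the two paginators as separate classes.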


Thanks for pointing this out. I wasn't aware of this API yet. Yes, we can create two different Paginator implementations. Added #75

@nealrichardson
Collaborator

I've thought about this some more, take this for what it's worth:

  • To me Paginator feels like it should be a lower-level utility to wrap the paginated APIs. Something closer to replacing what https://docs.posit.co/connect/cookbook/pagination/#offset-pagination does directly. An Iterable subclass may use a Paginator, but it reads odd to me that it is a subclass of Paginator. I may be wrong, but I think this is what my "complexity" observation was getting at before.
  • We previously punted on whether the collections should be iterable directly and decided to wait for user feedback (Questions for feedback on initial release #50). I'm not sure we know anything more this week to decide that it's a good idea. (Also not saying it's necessarily a bad idea.)
  • I don't think that it's safe to lazily take pages to feed the iterator. The content API at least is sorted by "recently updated", right? So if I iterate through that, make updates to some or all of the items in it, and fetch pages lazily, the contents of e.g. "page_number=2" may be changing as a result of my action. Users seem to be sorted alphabetically. So if paginated results are sorted on mutable properties, we may get surprising results unless we fetch all before allowing actions that could mutate the server. Even if sorting on immutable properties, the ability to delete entries will make page contents unstable.

So, IMO: we could add iterable support to collections, though I personally wouldn't add it to the paginator class. And I don't think that lazily fetching pages when iterating through all is a safe idea.
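A lower-level helper in that spirit might fetch every page eagerly and return a plain list, so later mutations cannot reorder pages mid-iteration. The `results`/`total` response shape and the `fetch_all_pages` name below are assumptions for illustration, modeled on the cookbook's offset-pagination recipe:

```python
def fetch_all_pages(session, url, page_size=500):
    """Eager offset pagination: fetch every page up front and return a
    plain list, minimizing the window in which server-side changes can
    shuffle page contents. The "results"/"total" response shape is an
    assumption for illustration."""
    results = []
    page_number = 1
    while True:
        params = {"page_number": page_number, "page_size": page_size}
        body = session.get(url, params=params).json()
        page = body.get("results", [])
        results.extend(page)
        # Stop on an empty page or once the reported total is reached.
        if not page or len(results) >= body.get("total", 0):
            return results
        page_number += 1
```

A collection class could then build its iterable support on top of this snapshot rather than subclassing a paginator.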

@tdstein
Collaborator Author

tdstein commented Mar 5, 2024

>   • I don't think that it's safe to lazily take pages to feed the iterator. The content API at least is sorted by "recently updated", right? So if I iterate through that, make updates to some or all of the items in it, and fetch pages lazily, the contents of e.g. "page_number=2" may be changing as a result of my action. Users seem to be sorted alphabetically. So if paginated results are sorted on mutable properties, we may get surprising results unless we fetch all before allowing actions that could mutate the server. Even if sorting on immutable properties, the ability to delete entries will make page contents unstable.

I see this as a server-side implementation issue rather than a client-side one. Without consistent ordering and support for pagination tokens, the underlying dataset can change regardless of the time between page fetches.

@nealrichardson
Collaborator

> >   • I don't think that it's safe to lazily take pages to feed the iterator. The content API at least is sorted by "recently updated", right? So if I iterate through that, make updates to some or all of the items in it, and fetch pages lazily, the contents of e.g. "page_number=2" may be changing as a result of my action. Users seem to be sorted alphabetically. So if paginated results are sorted on mutable properties, we may get surprising results unless we fetch all before allowing actions that could mutate the server. Even if sorting on immutable properties, the ability to delete entries will make page contents unstable.
>
> I see this as a server-side implementation issue rather than a client-side one. Without consistent ordering and support for pagination tokens, the underlying dataset can change regardless of the time between page fetches.

Sure, it's a server-side issue, but supposing we change/fix it there, the SDK still has to support the current API. We can only hope to minimize its impact by fetching all at the ~same time (AFAICT).

@tdstein
Collaborator Author

tdstein commented Mar 6, 2024

I'm going to abandon this idea for the time being. We can revisit if we decide to make resources iterable in the future.

@tdstein tdstein closed this Mar 6, 2024