
feat: adds iterable support to pagination #59

Closed
wants to merge 11 commits into from

Conversation

tdstein
Collaborator

@tdstein tdstein commented Feb 29, 2024

I'm working on a change to make `Paginator` a mixin which adds iterable support. Currently, the paginator is a property of `Users`, but the end state will be `class Users(Paginator[User])`. This will add support to `Users` for the `iter`, `next`, and `len` builtins.
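The direction described above can be sketched as a small generic mixin. This is a hypothetical illustration of the idea, not the PR's actual implementation; the `fetch_page` and `total` hooks are invented names here.

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class Paginator(Generic[T]):
    """Hypothetical mixin: a subclass supplies page fetching, and in
    return gets the iter and len builtins for free."""

    def fetch_page(self, page_number: int) -> List[T]:
        raise NotImplementedError  # e.g. Users would GET v1/users here

    @property
    def total(self) -> int:
        raise NotImplementedError  # Connect responses report a total count

    def __iter__(self) -> Iterator[T]:
        page_number = 1
        while True:
            page = self.fetch_page(page_number)
            if not page:
                return
            yield from page  # pages are fetched lazily, one at a time
            page_number += 1

    def __len__(self) -> int:
        return self.total
```

With `class Users(Paginator[User])`, `for user in users`, `next(iter(users))`, and `len(users)` would all work without a separate paginator property.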

@tdstein
Collaborator Author

tdstein commented Feb 29, 2024

Once this is stable, it might be a good time to add the rest of the paginated API endpoints.

@nealrichardson
Collaborator

Sadly, from my scan of the API docs, there's only one other paginated endpoint currently :(

@tdstein
Collaborator Author

tdstein commented Feb 29, 2024

It looks like there are 3 total. Searching for `page_number` here: https://docs.posit.co/connect/api/swagger.json

users, groups, and bundles


github-actions bot commented Mar 1, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

| Lines | Covered | Coverage | Threshold | Status |
|-------|---------|----------|-----------|--------|
| 299   | 263     | 88%      | 80%       | 🟢     |

New Files

No new covered files...

Modified Files

| File | Coverage | Status |
|------|----------|--------|
| src/posit/connect/client.py | 100% | 🟢 |
| src/posit/connect/config.py | 100% | 🟢 |
| src/posit/connect/paginator.py | 100% | 🟢 |
| src/posit/connect/urls.py | 100% | 🟢 |
| src/posit/connect/users.py | 100% | 🟢 |
| TOTAL | 100% | 🟢 |

updated for commit: 09b85bd by action🐍

@tdstein tdstein marked this pull request as ready for review March 1, 2024 16:54
Collaborator

@nealrichardson nealrichardson left a comment


A few notes here.

This is cool, but I'm wondering how it helps. To my eyes, it's less immediately obvious how the pagination works in practice with this implementation--the previous one is much less clever. So what do we get for this extra complexity, and is it worth it? It may very well be worth it, I just thought it worth asking.


```python
# The maximum page size supported by the API.
_MAX_PAGE_SIZE = 500
```

Is this not constant across all paginated APIs?

```python
# Since the paginator fetches the users in pages, it will stop fetching subsequent pages once a match is found.
url = urls.append_path(self.config.url, "v1/users")
paginator = Paginator(self.session, url, page_size=page_size)
users = (User(**user) for user in paginator)
```

Reading this, it's not obvious to me that each of these comprehensions isn't actually iterating over all of the results in the paginator. (There at least used to be versions of Python where that wouldn't be true, unless I'm misremembering.)

So for me, that merits (a) an inline comment acknowledging how it works, and (b) a responses-based test that would error if you went beyond a certain page of results. The current test doesn't ensure that, IIRC.
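For what it's worth, generator expressions in current CPython do pull items one at a time, so iteration stops as soon as `next` finds a match. A minimal stand-in (the `CountingPaginator` below is invented for illustration, not part of the SDK) shows the behavior the inline comment describes:

```python
class CountingPaginator:
    """Stand-in for the real Paginator that records page fetches."""

    def __init__(self, pages):
        self.pages = pages
        self.pages_fetched = 0

    def __iter__(self):
        for page in self.pages:
            self.pages_fetched += 1
            yield from page


paginator = CountingPaginator([[1, 2], [3, 4], [5, 6]])
match = next(item for item in paginator if item == 3)

# The third page was never fetched: the generator expression stops
# advancing the underlying iterator once the match is found.
assert match == 3
assert paginator.pages_fetched == 2
```

A responses-based test could assert the same thing against the real paginator by registering only the first two pages and letting any further request fail.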


I added some comments to try and explain the behavior.

@tdstein
Collaborator Author

tdstein commented Mar 4, 2024

> A few notes here.
>
> This is cool, but I'm wondering how it helps. To my eyes, it's less immediately obvious how the pagination works in practice with this implementation--the previous one is much less clever. So what do we get for this extra complexity, and is it worth it? It may very well be worth it, I just thought it worth asking.

Thanks! I want to avoid unnecessary complexity, so happy to iron this out more.

Could you expand on which parts you find complex?

I have a few goals in mind.

  1. Allow idiomatic comprehension on resource collections: `(user for user in client.users)`.
  2. Provide mixin support for other resource types.
  3. Make the comprehension memory efficient by waiting to fetch subsequent pages until they are needed.

As we've been discussing, the memory efficiency is premature since endpoint support is limited. But I hope this can encourage the migration of other endpoints that manage large collections.

```python
    Returns:
        int: The total number of items in the paginator.
    """
    # Connect's paginated APIs include a "total" count in the response.
```


Will there be a different paginator for APIs that use keyset pagination rather than offset pagination? The cookbook describes both types. Keyset pagination is used by the audit logs and content usage APIs and doesn't provide a total count or random access to pages.
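For reference, a keyset paginator would have a different shape: it follows a server-supplied cursor and cannot offer `len` or random page access. The sketch below assumes a hypothetical response layout (`results` plus a `paging.next` link); the real audit-logs schema may differ, so treat the field names as placeholders:

```python
def iter_keyset(session, url, page_size=500):
    """Sketch of keyset pagination: follow the server-supplied next link
    until it is absent. The "results"/"paging.next" response shape is an
    assumption for illustration, not the documented Connect schema."""
    params = {"limit": page_size}
    while url:
        body = session.get(url, params=params).json()
        yield from body.get("results", [])
        # No total count and no page numbers: only the next cursor.
        url = body.get("paging", {}).get("next")
        params = None  # the next link already encodes the cursor
```

Since there is no total, this implementation could support `__iter__` but not `__len__`, which argues for keeping the two paginators as separate classes.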


Thanks for pointing this out. I wasn't aware of this API yet. Yes, we can create two different Paginator implementations. Added #75

@nealrichardson
Collaborator

I've thought about this some more, take this for what it's worth:

  • To me Paginator feels like it should be a lower-level utility to wrap the paginated APIs. Something closer to replacing what https://docs.posit.co/connect/cookbook/pagination/#offset-pagination does directly. An Iterable subclass may use a Paginator, but it reads odd to me that it is a subclass of Paginator. I may be wrong, but I think this is what my "complexity" observation was getting at before.
  • We previously punted on whether the collections should be iterable directly and decided to wait for user feedback (Questions for feedback on initial release #50). I'm not sure we know anything more this week to decide that it's a good idea. (Also not saying it's necessarily a bad idea.)
  • I don't think that it's safe to lazily take pages to feed the iterator. The content API at least is sorted by "recently updated", right? So if I iterate through that, make updates to some or all of the items in it, and fetch pages lazily, the contents of e.g. "page_number=2" may be changing as a result of my action. Users seem to be sorted alphabetically. So if paginated results are sorted on mutable properties, we may get surprising results unless we fetch all before allowing actions that could mutate the server. Even if sorting on immutable properties, the ability to delete entries will make page contents unstable.

So, IMO: we could add iterable support to collections, though I personally wouldn't add it to the paginator class. And I don't think that lazily fetching pages when iterating through all is a safe idea.
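A lower-level helper in that spirit might fetch every page eagerly and return a plain list, so later mutations cannot reorder pages mid-iteration. The `results`/`total` response shape and the `fetch_all_pages` name below are assumptions for illustration, modeled on the cookbook's offset-pagination recipe:

```python
def fetch_all_pages(session, url, page_size=500):
    """Eager offset pagination: fetch every page up front and return a
    plain list, minimizing the window in which server-side changes can
    shuffle page contents. The "results"/"total" response shape is an
    assumption for illustration."""
    results = []
    page_number = 1
    while True:
        params = {"page_number": page_number, "page_size": page_size}
        body = session.get(url, params=params).json()
        page = body.get("results", [])
        results.extend(page)
        # Stop on an empty page or once the reported total is reached.
        if not page or len(results) >= body.get("total", 0):
            return results
        page_number += 1
```

A collection class could then build its iterable support on top of this snapshot rather than subclassing a paginator.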

@tdstein
Collaborator Author

tdstein commented Mar 5, 2024

>   • I don't think that it's safe to lazily take pages to feed the iterator. The content API at least is sorted by "recently updated", right? So if I iterate through that, make updates to some or all of the items in it, and fetch pages lazily, the contents of e.g. "page_number=2" may be changing as a result of my action. Users seem to be sorted alphabetically. So if paginated results are sorted on mutable properties, we may get surprising results unless we fetch all before allowing actions that could mutate the server. Even if sorting on immutable properties, the ability to delete entries will make page contents unstable.

I see this as a server-side implementation issue rather than a client-side one. Without consistent ordering and support for pagination tokens, the underlying dataset can change regardless of the time between page fetches.

@nealrichardson
Collaborator

> >   • I don't think that it's safe to lazily take pages to feed the iterator. The content API at least is sorted by "recently updated", right? So if I iterate through that, make updates to some or all of the items in it, and fetch pages lazily, the contents of e.g. "page_number=2" may be changing as a result of my action. Users seem to be sorted alphabetically. So if paginated results are sorted on mutable properties, we may get surprising results unless we fetch all before allowing actions that could mutate the server. Even if sorting on immutable properties, the ability to delete entries will make page contents unstable.
>
> I see this as a server-side implementation issue rather than a client-side one. Without consistent ordering and support for pagination tokens, the underlying dataset can change regardless of the time between page fetches.

Sure, it's a server-side issue, but supposing we change/fix it there, the SDK still has to support the current API. We can only hope to minimize its impact by fetching all at the ~same time (AFAICT).

@tdstein
Collaborator Author

tdstein commented Mar 6, 2024

I'm going to abandon this idea for the time being. We can revisit if we decide to make resources iterable in the future.

@tdstein tdstein closed this Mar 6, 2024