feat: adds iterable support to pagination #59
Conversation
Once this is stable, it might be a good time to add the rest of the paginated API endpoints.
Sadly, from my scanning of the API docs, there's only one other paginated endpoint currently :(
It looks like there are 3 total. Searching for `page_number` here: https://docs.posit.co/connect/api/swagger.json turns up users, groups, and bundles.
A few notes here.
This is cool, but I'm wondering how it helps. To my eyes, it's less immediately obvious how the pagination works in practice with this implementation; the previous one is much less clever. So what do we get for this extra complexity, and is it worth it? It may very well be worth it, I just thought it worth asking.
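For comparison, here is a minimal sketch of offset-based pagination written as a plain generator. The `fetch_page` function and its response shape are hypothetical stand-ins, not the SDK's actual client:

```python
from typing import Any, Dict, Iterator, List

_MAX_PAGE_SIZE = 500  # maximum page size supported by the API (per this PR)


def fetch_page(page_number: int, page_size: int) -> Dict[str, Any]:
    """Hypothetical stand-in for an HTTP call; canned 1200-item dataset."""
    total = 1200
    start = (page_number - 1) * page_size
    results = [{"id": i} for i in range(start, min(start + page_size, total))]
    return {"results": results, "total": total}


def iterate_items(page_size: int = _MAX_PAGE_SIZE) -> Iterator[Dict[str, Any]]:
    """Yield items one page at a time; stop when a short page arrives."""
    page_number = 1
    while True:
        page = fetch_page(page_number, page_size)
        results: List[Dict[str, Any]] = page["results"]
        yield from results
        if len(results) < page_size:
            return
        page_number += 1


items = list(iterate_items())
print(len(items))  # 1200
```

The generator keeps the page-fetching loop in one obvious place, which may be the simpler shape the comment above is asking for.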
```python
# The maximum page size supported by the API.
_MAX_PAGE_SIZE = 500
```
Is this not constant across all paginated APIs?
src/posit/connect/users.py
```python
# Since the paginator fetches the users in pages, it will stop
# fetching subsequent pages once a match is found.
url = urls.append_path(self.config.url, "v1/users")
paginator = Paginator(self.session, url, page_size=page_size)
users = (User(**user) for user in paginator)
```
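To illustrate the short-circuiting behavior in question, here is a toy paginator (a hypothetical stand-in, not the SDK class) that counts page fetches. Chaining `next()` onto the generator stops requesting pages as soon as a match is found:

```python
from typing import Dict, Iterator


class FakePaginator:
    """Yields items page by page, counting how many pages were 'fetched'."""

    def __init__(self, total_pages: int = 10, page_size: int = 3) -> None:
        self.total_pages = total_pages
        self.page_size = page_size
        self.pages_fetched = 0

    def __iter__(self) -> Iterator[Dict[str, int]]:
        for page in range(self.total_pages):
            self.pages_fetched += 1  # simulate one HTTP request per page
            for i in range(self.page_size):
                yield {"id": page * self.page_size + i}


paginator = FakePaginator()
users = ({"user": item["id"]} for item in paginator)  # lazy: nothing fetched yet
match = next(u for u in users if u["user"] == 4)  # id 4 lives on the second page
print(paginator.pages_fetched)  # 2, not 10
```

A `responses`-based test could apply the same idea by registering only the first N pages, so any request beyond them would error.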
Reading this, it's not obvious to me that each of these comprehensions isn't actually iterating over all of the results in the paginator. (There at least used to be versions of Python where that wouldn't be true, unless I'm misremembering.)
So for me, that merits (a) an inline comment acknowledging how it works, and (b) a `responses`-based test that would error if you went beyond a certain page of results. The current test doesn't ensure that, IIRC.
I added some comments to try to explain the behavior.
Thanks! I want to avoid unnecessary complexity, so I'm happy to iron this out more. Could you expand on which parts you find complex? I have a few goals in mind.
As we've been discussing, the memory efficiency is premature since endpoint support is limited. But I hope this can encourage the migration of other endpoints that manage large collections.
```python
    Returns:
        int: The total number of items in the paginator.
    """
    # Connect's paginated APIs include a "total" count in the response.
```
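A rough sketch of how `__len__` could lean on that `"total"` field without walking every page. The `fetch` callable and response shape here are hypothetical, not the SDK's actual wiring:

```python
from typing import Any, Callable, Dict, Optional


class OffsetPaginator:
    """Sketch: length comes from the API's "total" field, cached after one fetch."""

    def __init__(self, fetch: Callable[[int], Dict[str, Any]]) -> None:
        self._fetch = fetch          # hypothetical: callable(page_number) -> response dict
        self._total: Optional[int] = None

    def __len__(self) -> int:
        if self._total is None:
            # Fetch the first page just to read the "total" count.
            first = self._fetch(1)
            self._total = first["total"]
        return self._total


paginator = OffsetPaginator(lambda page: {"results": [], "total": 42})
print(len(paginator))  # 42
```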
Will there be a different paginator for APIs that use keyset pagination rather than offset pagination? The cookbook describes both types. Keyset pagination is used by the audit logs and content usage APIs and doesn't provide a total count or random access to pages.
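For keyset pagination, the iterator would follow a cursor instead of a page number. A sketch, assuming a hypothetical `{"results": ..., "next": ...}` response shape (the real audit-logs schema may differ):

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iterate_keyset(
    fetch: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Any]:
    """Keyset pagination: follow a cursor until the server stops returning one.

    There is no "total" count and no random page access, so this style
    cannot support len() or jumping to an arbitrary page.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch(cursor)
        yield from page["results"]
        cursor = page.get("next")
        if cursor is None:
            return


# Canned three-page dataset keyed by cursor; None means "first page".
pages = {
    None: {"results": [1, 2], "next": "c1"},
    "c1": {"results": [3, 4], "next": "c2"},
    "c2": {"results": [5]},  # no "next" key: last page
}
print(list(iterate_keyset(lambda c: pages[c])))  # [1, 2, 3, 4, 5]
```

The absence of `__len__` here is exactly why two separate implementations make sense.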
Thanks for pointing this out. I wasn't aware of this API yet. Yes, we can create two different `Paginator` implementations. Added #75
I've thought about this some more; take this for what it's worth:
So, IMO: we could add iterable support to collections, though I personally wouldn't add it to the paginator class. And I don't think that lazily fetching pages when iterating through all is a safe idea.
I see this as a server-side implementation issue rather than a client-side one. Without consistent ordering and support for pagination tokens, the underlying dataset can change regardless of the time between page fetches.
Sure, it's a server-side issue, but supposing we change/fix it there, the SDK still has to support the current API. We can only hope to minimize its impact by fetching all at the ~same time (AFAICT).
I'm going to abandon this idea for the time being. We can revisit if we decide to make resources iterable in the future.
I'm working on a change to make `Paginator` a mixin which adds iterable support. Currently, the paginator is a property of `Users`, but the end state will be `class Users(Paginator[User])`. This will add support to `Users` for the `iter`, `next`, and `len` builtins.
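A rough sketch of what that mixin could look like. The `_fetch_page`/`_total` hooks and the canned `Users` subclass are hypothetical, not the final design:

```python
from typing import Generic, Iterator, List, TypeVar

T = TypeVar("T")


class Paginator(Generic[T]):
    """Mixin sketch: subclasses supply _fetch_page/_total; iter and len come free."""

    page_size: int = 500

    def _fetch_page(self, page_number: int) -> List[T]:
        raise NotImplementedError

    def _total(self) -> int:
        raise NotImplementedError

    def __iter__(self) -> Iterator[T]:
        page_number = 1
        while True:
            page = self._fetch_page(page_number)
            yield from page
            if len(page) < self.page_size:
                return
            page_number += 1

    def __len__(self) -> int:
        return self._total()


class Users(Paginator[dict]):
    """Hypothetical concrete class backed by a canned 5-user dataset."""

    page_size = 2

    def _fetch_page(self, page_number: int) -> List[dict]:
        data = [{"id": i} for i in range(5)]
        start = (page_number - 1) * self.page_size
        return data[start : start + self.page_size]

    def _total(self) -> int:
        return 5


users = Users()
print(len(users))                 # 5
print([u["id"] for u in users])   # [0, 1, 2, 3, 4]
```

Because `__iter__` is a generator, `next(iter(users))` also works, covering all three builtins mentioned above.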