-
Notifications
You must be signed in to change notification settings - Fork 211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add pagination functionality to FHIRClient and related unit tests #169
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much! This is a nice idea. A couple quick comments, did not pore over it
fhirclient/client.py
Outdated
next_link = self.get_next_link(bundle) | ||
if next_link: | ||
sanitized_next_link = self.sanitize_next_link(next_link) | ||
return self.execute_pagination_request(sanitized_next_link) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a bad approach, but feels like it's so close to being a really Pythonic iter() pattern. I don't have a strong preference, but curious if you considered that approach. Not sure what the API would look like for that. Maybe something like?
for bundle in client.iter_pages(first_bundle):
Not sure I love that either, but the current use wouldn't be too far off:
bundle = first_bundle
while bundle := client.fetch_next_page(bundle):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are also a lot of ways to do the iteration approach. Off the top of my head:
- We could add
__iter__
and__next__
methods to the Bundle object directly... - We could add FHIRSearch.perform_iter() as a generator maybe that yields Bundle objects...
- Or the above FHIRClient.iter_pages() approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking of which... FHIRSearch.perform_resources()
should probably iterate through the Bundles behind the scenes, yeah?
Would that functionality diminish the need for a separate pagination API?
It currently says it returns a list... But doesn't actually annotate anything. I wonder if it would be an acceptable change to make it a generator, so that all the results don't need to be in memory at once. Probably not. I can see some existing consumers getting bit by that, but I also don't love the current API - especially if we're downloading multiple Bundles at once. Maybe if we want to do the memory-kind approach, we'd need to add a separate API call like perform_resources_iter()
until we break API and can replace perform_resources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for all your ideas and comments. The latest changes are pushed.
Since backward compatibility is a concern, I kept all changes in FHIRClient class. If existing code relies on Bundle objects being iterated over using traditional loops or other methods that expect a list-like behavior, adding iter and next might interfere with these expectations. Users might also have existing methods that do not anticipate Bundle objects to be iterable in this specific way. Please let me know if you disagree and want me to change code further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your current API is good! But I also want to natter a bit about "how would we change this if we were allowing ourselves to break compatibility, then work backward from there" - so that we can work towards the API we want to have in 5.x.
So Bundles largely come up in search operations (though not only). And the main place we deal with them is FHIRSearch, which has two touch points: perform()
(performs a search and returns a Bundle) and perform_resources()
(performs a search and abstracts the Bundle pieces).
I think in my personal dream 5.x scenario:
perform()
becomes a Bundle generator. (Because, to my understanding, any search can yield multiple Bundles, at server discretion - so API wise, there's not really a need for a single-bundleperform
)perform_resources()
silently adds pagination support behind the scenes as a generator.- Some way to navigate Bundle links manually in case you got you Bundle a different way (like a history operation, or just a Bundle read from disk or something).
- Maybe your
iter_pages()
call. - Maybe
next()
,prev()
,first()
, andlast()
calls layered into the Bundle resource.
- Maybe your
For the current 4.x timeline, to avoid breaking API for the above, I might propose we add:
perform_iter()
(generator for Bundles)perform_resources_iter()
(generator for Resources)- And then for 5.x, just replace the normal method names with these two above implementations (and maybe make
_iter
an alias).
- And then for 5.x, just replace the normal method names with these two above implementations (and maybe make
- Update
perform_resources()
to follow Bundle links transparently (and return all resources found as one giant list) - The manual Bundle link solution.
I'm hoping that my perform
ideas aren't too controversial. And we can focus our bike-shedding on the manual link API a bit more. (But I welcome push-back on the perform
stuff too.)
- How much value is there in supporting
prev
,first
, andlast
links? The spec mentions em, but in practice, I'm not too concerned about them. - Where should
iter_pages()
live?- In order to implement the above
perform
changes, those methods only have access to a FHIRServer, not a FHIRClient. So at least some of these helper methods you wrote should maybe move off of FHIRClient and into a shared_utils.py
file or something? - As an aside, I feel like the boundary and scope of responsibility between FHIRServer and FHIRClient is too weak.
- I like the idea of throwing it into Bundle itself... we could add the methods into the jinja template code if the current resource is a Bundle.
- FHIRClient is fine too, but as above, we might need to move some of the support code elsewhere so FHIRSearch can access it.
- In order to implement the above
I'm not asking you to do any of the above work, I'm just nattering about how we want this to look. I can do the perform
changes and move stuff around as we like. I'm just curious about your opinion on it and trying to get consensus on what kind of API we all like. I'll tag in @dogversioning too for API opinions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any qualms with the 4.x *_iter
/5.x cutover approach - I buy 'just make it an iterable' as a reasonable modern python upgrade.
Re: prev, first, last, I would probably advocate for supporting what the spec suggests, regardless of what we think is useful.
I sort of like a shared utils lib vs direct injection, but i can see the case for the latter too.
This is probably almost already at the point where this discussion could go to a ticket - good for visibility here, but that's probably a better long term repository for this information (and a better place to solicit opinions from others, if there are any).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a great plan. I agree that embedding pagination functionality within the Bundle class could streamline the API and make it more intuitive. It seems like this might be a good candidate for a separate ticket to ensure it gets the attention it deserves. I’d be happy to create the ticket and contribute to the changes under your guidance if that works for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I did summarize the 4.x discussion in a ticket, #172
if not parsed_url.netloc: | ||
raise ValueError("Invalid URL domain in `next` link.") | ||
|
||
# Additional sanitization if necessary, e.g., removing dangerous query parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of parameters are you thinking about here? I'm all for being cautious, but just curious. These links are in theory coming from the server (but of course, can be crafted directly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking along the lines of preventing injection attacks. While these links are expected to come from the server, they could be crafted manually. Adding an extra layer of sanitization helps ensure that only safe, expected parameters are processed. However, I'm open to suggestions if this feels like overkill for our use case.
return next_bundle | ||
|
||
except requests.exceptions.HTTPError as e: | ||
# Handle specific HTTP errors as needed, possibly including retry logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting and good note - maybe once this lands, we can make an issue tracking this improvement. Though probably... we'd want to move that into the client/server code transparently. So maybe the ticket could be broader than this one spot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that implementing HTTP error handling and retry logic in a more centralized way across the client/server code could be a valuable improvement. Once this lands, I can create an issue to track this broader enhancement, ensuring that error handling is more consistent and robust throughout the codebase. Thanks for the suggestion!
fhirclient/client.py
Outdated
next_link = self.get_next_link(bundle) | ||
if next_link: | ||
sanitized_next_link = self.sanitize_next_link(next_link) | ||
return self.execute_pagination_request(sanitized_next_link) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your current API is good! But I also want to natter a bit about "how would we change this if we were allowing ourselves to break compatibility, then work backward from there" - so that we can work towards the API we want to have in 5.x.
So Bundles largely come up in search operations (though not only). And the main place we deal with them is FHIRSearch, which has two touch points: perform()
(performs a search and returns a Bundle) and perform_resources()
(performs a search and abstracts the Bundle pieces).
I think in my personal dream 5.x scenario:
perform()
becomes a Bundle generator. (Because, to my understanding, any search can yield multiple Bundles, at server discretion - so API wise, there's not really a need for a single-bundleperform
)perform_resources()
silently adds pagination support behind the scenes as a generator.- Some way to navigate Bundle links manually in case you got you Bundle a different way (like a history operation, or just a Bundle read from disk or something).
- Maybe your
iter_pages()
call. - Maybe
next()
,prev()
,first()
, andlast()
calls layered into the Bundle resource.
- Maybe your
For the current 4.x timeline, to avoid breaking API for the above, I might propose we add:
perform_iter()
(generator for Bundles)perform_resources_iter()
(generator for Resources)- And then for 5.x, just replace the normal method names with these two above implementations (and maybe make
_iter
an alias).
- And then for 5.x, just replace the normal method names with these two above implementations (and maybe make
- Update
perform_resources()
to follow Bundle links transparently (and return all resources found as one giant list) - The manual Bundle link solution.
I'm hoping that my perform
ideas aren't too controversial. And we can focus our bike-shedding on the manual link API a bit more. (But I welcome push-back on the perform
stuff too.)
- How much value is there in supporting
prev
,first
, andlast
links? The spec mentions em, but in practice, I'm not too concerned about them. - Where should
iter_pages()
live?- In order to implement the above
perform
changes, those methods only have access to a FHIRServer, not a FHIRClient. So at least some of these helper methods you wrote should maybe move off of FHIRClient and into a shared_utils.py
file or something? - As an aside, I feel like the boundary and scope of responsibility between FHIRServer and FHIRClient is too weak.
- I like the idea of throwing it into Bundle itself... we could add the methods into the jinja template code if the current resource is a Bundle.
- FHIRClient is fine too, but as above, we might need to move some of the support code elsewhere so FHIRSearch can access it.
- In order to implement the above
I'm not asking you to do any of the above work, I'm just nattering about how we want this to look. I can do the perform
changes and move stuff around as we like. I'm just curious about your opinion on it and trying to get consensus on what kind of API we all like. I'll tag in @dogversioning too for API opinions.
PR Title: Add Pagination to FHIRClient with Unit Tests
Description:
This PR introduces pagination to
FHIRClient
, allowing efficient navigation of large FHIR datasets. Key methods include:fetch_next_page
: Retrieves the next page via thenext
link in the FHIR Bundle.get_next_link
: Extracts thenext
link from the Bundle's links.sanitize_next_link
: Validates and sanitizes thenext
link URL.execute_pagination_request
: Handles the HTTP request and manages errors when fetching the next page.Testing:
client_pagination_test.py
cover pagination, handling missingnext
links, and errors.Notes:
Please review and provide feedback.
Fixes: #108