Add pagination functionality to FHIRClient and related unit tests #169

LanaNYC · 2024-08-06T19:05:14Z

PR Title: Add Pagination to FHIRClient with Unit Tests

Description:

This PR introduces pagination to FHIRClient, allowing efficient navigation of large FHIR datasets. Key methods include:

fetch_next_page: Retrieves the next page via the next link in the FHIR Bundle.
get_next_link: Extracts the next link from the Bundle's links.
sanitize_next_link: Validates and sanitizes the next link URL.
execute_pagination_request: Handles the HTTP request and manages errors when fetching the next page.
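
As a rough illustration of how these pieces fit together, here is a minimal sketch that treats a Bundle as plain parsed JSON (a dict) and takes the HTTP call as a caller-supplied `fetch` function. Both of those are assumptions made for the example; the actual methods in the PR live on FHIRClient and may differ in signature and behavior.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


def get_next_link(bundle: dict) -> Optional[str]:
    """Return the URL of the Bundle link whose relation is "next", if any."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def sanitize_next_link(url: str) -> str:
    """Reject links that are not absolute http(s) URLs (illustrative checks only)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Invalid URL scheme in `next` link.")
    if not parsed.netloc:
        raise ValueError("Invalid URL domain in `next` link.")
    return url


def fetch_next_page(bundle: dict, fetch: Callable[[str], dict]) -> Optional[dict]:
    """Fetch the page after `bundle`, or return None when there is no next link.

    `fetch` is a hypothetical stand-in for the HTTP request step.
    """
    next_link = get_next_link(bundle)
    if next_link:
        return fetch(sanitize_next_link(next_link))
    return None
```

In this sketch a caller would pass something like `lambda url: session.get(url).json()` as `fetch`; error handling for the request itself is left out.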

Testing:

Unit tests in client_pagination_test.py cover pagination, handling missing next links, and errors.

Notes:

Backward-compatible, no breaking changes.

Please review and provide feedback.

Fixes: #108

mikix

Thank you so much! This is a nice idea. A couple of quick comments; I did not pore over it.

fhirclient/client.py

mikix · 2024-08-06T19:58:24Z

+        next_link = self.get_next_link(bundle)
+        if next_link:
+            sanitized_next_link = self.sanitize_next_link(next_link)
+            return self.execute_pagination_request(sanitized_next_link)
+        return None


This isn't a bad approach, but feels like it's so close to being a really Pythonic iter() pattern. I don't have a strong preference, but curious if you considered that approach. Not sure what the API would look like for that. Maybe something like?

for bundle in client.iter_pages(first_bundle):

Not sure I love that either, but the current use wouldn't be too far off:

bundle = first_bundle
while bundle := client.fetch_next_page(bundle):

There are also a lot of ways to do the iteration approach. Off the top of my head:

We could add __iter__ and __next__ methods to the Bundle object directly...

We could add FHIRSearch.perform_iter() as a generator maybe that yields Bundle objects...

Or the above FHIRClient.iter_pages() approach.

Speaking of which... FHIRSearch.perform_resources() should probably iterate through the Bundles behind the scenes, yeah?

Would that functionality diminish the need for a separate pagination API?

It currently says it returns a list, but doesn't actually annotate anything. I wonder if it would be an acceptable change to make it a generator, so that all the results don't need to be in memory at once. Probably not: I can see some existing consumers getting bit by that. But I also don't love the current API, especially if we're downloading multiple Bundles at once. Maybe if we want to do the memory-kind approach, we'd need to add a separate API call like perform_resources_iter() until we break API and can replace perform_resources().
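
To make the compatibility concern concrete, the following plain-Python illustration (no fhirclient involved; `resources_as_list` and `resources_as_generator` are hypothetical stand-ins for the two return styles) shows where code written against a list would trip over a generator.

```python
from typing import Iterator, List


def resources_as_list() -> List[str]:
    """Stand-in for an API that returns every result at once."""
    return ["Patient/1", "Patient/2", "Patient/3"]


def resources_as_generator() -> Iterator[str]:
    """Stand-in for an API that yields results one at a time."""
    yield from ["Patient/1", "Patient/2", "Patient/3"]


as_list = resources_as_list()
as_gen = resources_as_generator()

list_len = len(as_list)  # works: lists know their length

try:
    len(as_gen)  # generators do not support len()
    gen_has_len = True
except TypeError:
    gen_has_len = False

first_pass = list(as_gen)   # consumes the generator
second_pass = list(as_gen)  # already exhausted, so nothing is left
```

Consumers that call `len()`, index into the result, or iterate over it twice are the ones that would break if the return type changed silently.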

Thank you very much for all your ideas and comments. The latest changes are pushed.
Since backward compatibility is a concern, I kept all changes in the FHIRClient class. If existing code relies on Bundle objects being iterated over using traditional loops or other methods that expect list-like behavior, adding __iter__ and __next__ might interfere with these expectations. Users might also have existing methods that do not expect Bundle objects to be iterable in this specific way. Please let me know if you disagree and want me to change the code further.

Your current API is good! But I also want to natter a bit about "how would we change this if we were allowing ourselves to break compatibility, then work backward from there" - so that we can work towards the API we want to have in 5.x.

So Bundles largely come up in search operations (though not only). And the main place we deal with them is FHIRSearch, which has two touch points: perform() (performs a search and returns a Bundle) and perform_resources() (performs a search and abstracts the Bundle pieces).

I think in my personal dream 5.x scenario:

perform() becomes a Bundle generator. (Because, to my understanding, any search can yield multiple Bundles, at server discretion, so API-wise there's not really a need for a single-bundle perform.)

perform_resources() silently adds pagination support behind the scenes as a generator.

Some way to navigate Bundle links manually in case you got your Bundle a different way (like a history operation, or just a Bundle read from disk or something).

Maybe your iter_pages() call.

Maybe next(), prev(), first(), and last() calls layered into the Bundle resource.

For the current 4.x timeline, to avoid breaking API for the above, I might propose we add:

perform_iter() (generator for Bundles)

perform_resources_iter() (generator for Resources)

And then for 5.x, just replace the normal method names with these two above implementations (and maybe make _iter an alias).

Update perform_resources() to follow Bundle links transparently (and return all resources found as one giant list)

The manual Bundle link solution.
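
The layering proposed for 4.x could be sketched roughly as follows. The function names follow the proposal, but everything else is an assumption for illustration: Bundles are plain dicts, the functions are free-standing rather than FHIRSearch methods, they start from an already-fetched first Bundle, and `fetch` is a hypothetical callable that turns a URL into the next Bundle.

```python
from typing import Callable, Iterator, List, Optional

Bundle = dict  # stand-in for the real Bundle model, for illustration only


def _next_url(bundle: Bundle) -> Optional[str]:
    """Return the URL of the "next" link in the Bundle, if present."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def perform_iter(first_bundle: Bundle, fetch: Callable[[str], Bundle]) -> Iterator[Bundle]:
    """Yield Bundles one page at a time, following "next" links lazily."""
    bundle: Optional[Bundle] = first_bundle
    while bundle is not None:
        yield bundle
        url = _next_url(bundle)
        bundle = fetch(url) if url else None


def perform_resources_iter(first_bundle: Bundle, fetch: Callable[[str], Bundle]) -> Iterator[dict]:
    """Yield the resources of every page without holding all pages in memory."""
    for bundle in perform_iter(first_bundle, fetch):
        for entry in bundle.get("entry", []):
            if "resource" in entry:
                yield entry["resource"]


def perform_resources(first_bundle: Bundle, fetch: Callable[[str], Bundle]) -> List[dict]:
    """List-returning variant: same traversal, collected into one list."""
    return list(perform_resources_iter(first_bundle, fetch))
```

With this shape, the list-returning method keeps its existing return type while gaining pagination, and the `_iter` variants are what a later major version could promote to the plain names.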

I'm hoping that my perform ideas aren't too controversial. And we can focus our bike-shedding on the manual link API a bit more. (But I welcome push-back on the perform stuff too.)

How much value is there in supporting prev, first, and last links? The spec mentions them, but in practice, I'm not too concerned about them.

Where should iter_pages() live?

In order to implement the above perform changes, those methods only have access to a FHIRServer, not a FHIRClient. So at least some of these helper methods you wrote should maybe move off of FHIRClient and into a shared _utils.py file or something?

As an aside, I feel like the boundary and scope of responsibility between FHIRServer and FHIRClient is too weak.

I like the idea of throwing it into Bundle itself... we could add the methods into the jinja template code if the current resource is a Bundle.

FHIRClient is fine too, but as above, we might need to move some of the support code elsewhere so FHIRSearch can access it.

I'm not asking you to do any of the above work, I'm just nattering about how we want this to look. I can do the perform changes and move stuff around as we like. I'm just curious about your opinion on it and trying to get consensus on what kind of API we all like. I'll tag in @dogversioning too for API opinions.

I don't have any qualms with the 4.x *_iter/5.x cutover approach - I buy 'just make it an iterable' as a reasonable modern Python upgrade.

Re: prev, first, last, I would probably advocate for supporting what the spec suggests, regardless of what we think is useful.

I sort of like a shared utils lib vs direct injection, but I can see the case for the latter too.

This is probably almost already at the point where this discussion could go to a ticket - good for visibility here, but that's probably a better long term repository for this information (and a better place to solicit opinions from others, if there are any).

That sounds like a great plan. I agree that embedding pagination functionality within the Bundle class could streamline the API and make it more intuitive. It seems like this might be a good candidate for a separate ticket to ensure it gets the attention it deserves. I’d be happy to create the ticket and contribute to the changes under your guidance if that works for you.

FYI: I did summarize the 4.x discussion in a ticket, #172

fhirclient/client.py

mikix · 2024-08-08T13:15:53Z

+        if not parsed_url.netloc:
+            raise ValueError("Invalid URL domain in `next` link.")
+
+        # Additional sanitization if necessary, e.g., removing dangerous query parameters


What kind of parameters are you thinking about here? I'm all for being cautious, but just curious. These links are in theory coming from the server (but of course, can be crafted directly).

I was thinking along the lines of preventing injection attacks. While these links are expected to come from the server, they could be crafted manually. Adding an extra layer of sanitization helps ensure that only safe, expected parameters are processed. However, I'm open to suggestions if this feels like overkill for our use case.
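
One concrete check that is sometimes used for this kind of concern is to confirm that the `next` link points at the same origin as the server the client is already talking to. The sketch below is an illustration of that idea only; it is not what the PR implements, `is_same_origin` is a hypothetical helper, and some deployments (for example, servers behind a proxy) may legitimately return links on a different host.

```python
from urllib.parse import urlparse


def is_same_origin(next_url: str, base_url: str) -> bool:
    """Return True when both URLs share scheme and host[:port].

    Hostnames are compared case-insensitively. An explicit default port
    (e.g. ":443") is NOT treated as equal to an omitted one in this sketch.
    """
    nxt = urlparse(next_url)
    base = urlparse(base_url)
    return (nxt.scheme, nxt.netloc.lower()) == (base.scheme, base.netloc.lower())


same = is_same_origin(
    "https://fhir.example.org/Patient?_page=2",
    "https://fhir.example.org/base",
)
different = is_same_origin(
    "https://other.example.net/Patient?_page=2",
    "https://fhir.example.org/base",
)
```

A check like this would matter most when the request carries credentials, since following a crafted link could otherwise send them to another host.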

mikix · 2024-08-08T13:42:16Z

fhirclient/client.py

+            return next_bundle
+
+        except requests.exceptions.HTTPError as e:
+            # Handle specific HTTP errors as needed, possibly including retry logic


This is an interesting and good note - maybe once this lands, we can make an issue tracking this improvement. Though probably... we'd want to move that into the client/server code transparently. So maybe the ticket could be broader than this one spot.

I agree that implementing HTTP error handling and retry logic in a more centralized way across the client/server code could be a valuable improvement. Once this lands, I can create an issue to track this broader enhancement, ensuring that error handling is more consistent and robust throughout the codebase. Thanks for the suggestion!
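
For reference, centralized retry logic of the kind discussed here is often written as a small wrapper with exponential backoff. The sketch below is generic and hypothetical: `with_retries` is not part of fhirclient, and it retries on the built-in `ConnectionError` by default so that it runs without third-party packages. A caller using requests could pass `requests.exceptions.HTTPError` in `retry_on` instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on the given exceptions with exponential backoff.

    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the wrapper easy to unit test, since a test can record the delays instead of actually waiting.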

mikix reviewed Aug 6, 2024

LanaNYC force-pushed the pagination branch from 1d9f8f6 to f954350 Compare August 7, 2024 21:53

mikix reviewed Aug 8, 2024

LanaNYC force-pushed the pagination branch from f954350 to 030ca54 Compare August 12, 2024 20:19

dogversioning reviewed Aug 13, 2024

dogversioning mentioned this pull request Aug 13, 2024

package level imports causing failures during pip install #171

Open

Add pagination functionality to FHIRClient and related unit tests

5facb52

LanaNYC force-pushed the pagination branch from 030ca54 to 5facb52 Compare August 13, 2024 18:15

dogversioning approved these changes Aug 13, 2024

dogversioning merged commit 0a82692 into smart-on-fhir:main Aug 13, 2024
5 checks passed

dogversioning mentioned this pull request Aug 13, 2024

Iterable support in FHIRSearch #172

Closed

bwalsh mentioned this pull request Aug 13, 2024

Pagination: Evaluate latest PR for smart on fhir dependency FHIR-Aggregator/client#5

Open
