Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pagination functionality to FHIRClient and related unit tests #169

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

LanaNYC
Copy link
Contributor

@LanaNYC LanaNYC commented Aug 6, 2024

PR Title: Add Pagination to FHIRClient with Unit Tests

Description:

This PR introduces pagination to FHIRClient, allowing efficient navigation of large FHIR datasets. Key methods include:

  • fetch_next_page: Retrieves the next page via the next link in the FHIR Bundle.
  • get_next_link: Extracts the next link from the Bundle's links.
  • sanitize_next_link: Validates and sanitizes the next link URL.
  • execute_pagination_request: Handles the HTTP request and manages errors when fetching the next page.

Testing:

  • Unit tests in client_pagination_test.py cover pagination, handling missing next links, and errors.

Notes:

  • Backward-compatible, no breaking changes.

Please review and provide feedback.

Fixes: #108

Copy link
Contributor

@mikix mikix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much! This is a nice idea. A couple quick comments, did not pore over it

fhirclient/client.py Outdated Show resolved Hide resolved
Comment on lines 257 to 263
next_link = self.get_next_link(bundle)
if next_link:
sanitized_next_link = self.sanitize_next_link(next_link)
return self.execute_pagination_request(sanitized_next_link)
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a bad approach, but feels like it's so close to being a really Pythonic iter() pattern. I don't have a strong preference, but curious if you considered that approach. Not sure what the API would look like for that. Maybe something like?

for bundle in client.iter_pages(first_bundle):

Not sure I love that either, but the current use wouldn't be too far off:

bundle = first_bundle
while bundle := client.fetch_next_page(bundle):

Copy link
Contributor

@mikix mikix Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are also a lot of ways to do the iteration approach. Off the top of my head:

  • We could add __iter__ and __next__ methods to the Bundle object directly...
  • We could add FHIRSearch.perform_iter() as a generator maybe that yields Bundle objects...
  • Or the above FHIRClient.iter_pages() approach.

Copy link
Contributor

@mikix mikix Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of which... FHIRSearch.perform_resources() should probably iterate through the Bundles behind the scenes, yeah?

Would that functionality diminish the need for a separate pagination API?

It currently says it returns a list... But doesn't actually annotate anything. I wonder if it would be an acceptable change to make it a generator, so that all the results don't need to be in memory at once. Probably not. I can see some existing consumers getting bit by that, but I also don't love the current API - especially if we're downloading multiple Bundles at once. Maybe if we want to do the memory-kind approach, we'd need to add a separate API call like perform_resources_iter() until we break API and can replace perform_resources

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for all your ideas and comments. The latest changes are pushed.
Since backward compatibility is a concern, I kept all changes in FHIRClient class. If existing code relies on Bundle objects being iterated over using traditional loops or other methods that expect a list-like behavior, adding iter and next might interfere with these expectations. Users might also have existing methods that do not anticipate Bundle objects to be iterable in this specific way. Please let me know if you disagree and want me to change code further.

Copy link
Contributor

@mikix mikix Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your current API is good! But I also want to natter a bit about "how would we change this if we were allowing ourselves to break compatibility, then work backward from there" - so that we can work towards the API we want to have in 5.x.

So Bundles largely come up in search operations (though not only). And the main place we deal with them is FHIRSearch, which has two touch points: perform() (performs a search and returns a Bundle) and perform_resources() (performs a search and abstracts the Bundle pieces).

I think in my personal dream 5.x scenario:

  • perform() becomes a Bundle generator. (Because, to my understanding, any search can yield multiple Bundles, at server discretion - so API wise, there's not really a need for a single-bundle perform)
  • perform_resources() silently adds pagination support behind the scenes as a generator.
  • Some way to navigate Bundle links manually in case you got you Bundle a different way (like a history operation, or just a Bundle read from disk or something).
    • Maybe your iter_pages() call.
    • Maybe next(), prev(), first(), and last() calls layered into the Bundle resource.

For the current 4.x timeline, to avoid breaking API for the above, I might propose we add:

  • perform_iter() (generator for Bundles)
  • perform_resources_iter() (generator for Resources)
    • And then for 5.x, just replace the normal method names with these two above implementations (and maybe make _iter an alias).
  • Update perform_resources() to follow Bundle links transparently (and return all resources found as one giant list)
  • The manual Bundle link solution.

I'm hoping that my perform ideas aren't too controversial. And we can focus our bike-shedding on the manual link API a bit more. (But I welcome push-back on the perform stuff too.)

  • How much value is there in supporting prev, first, and last links? The spec mentions em, but in practice, I'm not too concerned about them.
  • Where should iter_pages() live?
    • In order to implement the above perform changes, those methods only have access to a FHIRServer, not a FHIRClient. So at least some of these helper methods you wrote should maybe move off of FHIRClient and into a shared _utils.py file or something?
    • As an aside, I feel like the boundary and scope of responsibility between FHIRServer and FHIRClient is too weak.
    • I like the idea of throwing it into Bundle itself... we could add the methods into the jinja template code if the current resource is a Bundle.
    • FHIRClient is fine too, but as above, we might need to move some of the support code elsewhere so FHIRSearch can access it.

I'm not asking you to do any of the above work, I'm just nattering about how we want this to look. I can do the perform changes and move stuff around as we like. I'm just curious about your opinion on it and trying to get consensus on what kind of API we all like. I'll tag in @dogversioning too for API opinions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any qualms with the 4.x *_iter/5.x cutover approach - I buy 'just make it an iterable' as a reasonable modern python upgrade.

Re: prev, first, last, I would probably advocate for supporting what the spec suggests, regardless of what we think is useful.

I sort of like a shared utils lib vs direct injection, but i can see the case for the latter too.

This is probably almost already at the point where this discussion could go to a ticket - good for visibility here, but that's probably a better long term repository for this information (and a better place to solicit opinions from others, if there are any).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a great plan. I agree that embedding pagination functionality within the Bundle class could streamline the API and make it more intuitive. It seems like this might be a good candidate for a separate ticket to ensure it gets the attention it deserves. I’d be happy to create the ticket and contribute to the changes under your guidance if that works for you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I did summarize the 4.x discussion in a ticket, #172

fhirclient/client.py Outdated Show resolved Hide resolved
fhirclient/client.py Outdated Show resolved Hide resolved
fhirclient/client.py Outdated Show resolved Hide resolved
if not parsed_url.netloc:
raise ValueError("Invalid URL domain in `next` link.")

# Additional sanitization if necessary, e.g., removing dangerous query parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of parameters are you thinking about here? I'm all for being cautious, but just curious. These links are in theory coming from the server (but of course, can be crafted directly).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking along the lines of preventing injection attacks. While these links are expected to come from the server, they could be crafted manually. Adding an extra layer of sanitization helps ensure that only safe, expected parameters are processed. However, I'm open to suggestions if this feels like overkill for our use case.

return next_bundle

except requests.exceptions.HTTPError as e:
# Handle specific HTTP errors as needed, possibly including retry logic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting and good note - maybe once this lands, we can make an issue tracking this improvement. Though probably... we'd want to move that into the client/server code transparently. So maybe the ticket could be broader than this one spot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that implementing HTTP error handling and retry logic in a more centralized way across the client/server code could be a valuable improvement. Once this lands, I can create an issue to track this broader enhancement, ensuring that error handling is more consistent and robust throughout the codebase. Thanks for the suggestion!

Comment on lines 257 to 263
next_link = self.get_next_link(bundle)
if next_link:
sanitized_next_link = self.sanitize_next_link(next_link)
return self.execute_pagination_request(sanitized_next_link)
return None
Copy link
Contributor

@mikix mikix Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your current API is good! But I also want to natter a bit about "how would we change this if we were allowing ourselves to break compatibility, then work backward from there" - so that we can work towards the API we want to have in 5.x.

So Bundles largely come up in search operations (though not only). And the main place we deal with them is FHIRSearch, which has two touch points: perform() (performs a search and returns a Bundle) and perform_resources() (performs a search and abstracts the Bundle pieces).

I think in my personal dream 5.x scenario:

  • perform() becomes a Bundle generator. (Because, to my understanding, any search can yield multiple Bundles, at server discretion - so API wise, there's not really a need for a single-bundle perform)
  • perform_resources() silently adds pagination support behind the scenes as a generator.
  • Some way to navigate Bundle links manually in case you got you Bundle a different way (like a history operation, or just a Bundle read from disk or something).
    • Maybe your iter_pages() call.
    • Maybe next(), prev(), first(), and last() calls layered into the Bundle resource.

For the current 4.x timeline, to avoid breaking API for the above, I might propose we add:

  • perform_iter() (generator for Bundles)
  • perform_resources_iter() (generator for Resources)
    • And then for 5.x, just replace the normal method names with these two above implementations (and maybe make _iter an alias).
  • Update perform_resources() to follow Bundle links transparently (and return all resources found as one giant list)
  • The manual Bundle link solution.

I'm hoping that my perform ideas aren't too controversial. And we can focus our bike-shedding on the manual link API a bit more. (But I welcome push-back on the perform stuff too.)

  • How much value is there in supporting prev, first, and last links? The spec mentions em, but in practice, I'm not too concerned about them.
  • Where should iter_pages() live?
    • In order to implement the above perform changes, those methods only have access to a FHIRServer, not a FHIRClient. So at least some of these helper methods you wrote should maybe move off of FHIRClient and into a shared _utils.py file or something?
    • As an aside, I feel like the boundary and scope of responsibility between FHIRServer and FHIRClient is too weak.
    • I like the idea of throwing it into Bundle itself... we could add the methods into the jinja template code if the current resource is a Bundle.
    • FHIRClient is fine too, but as above, we might need to move some of the support code elsewhere so FHIRSearch can access it.

I'm not asking you to do any of the above work, I'm just nattering about how we want this to look. I can do the perform changes and move stuff around as we like. I'm just curious about your opinion on it and trying to get consensus on what kind of API we all like. I'll tag in @dogversioning too for API opinions.

fhirclient/client.py Outdated Show resolved Hide resolved
fhirclient/client.py Outdated Show resolved Hide resolved
@dogversioning dogversioning merged commit 0a82692 into smart-on-fhir:main Aug 13, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pagination support
3 participants