Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate household endpoints to new API structure #2038

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

anth-volk
Copy link
Collaborator

@anth-volk anth-volk commented Dec 4, 2024

Fixes #1988.
Fixes #2034.

This PR migrates the household endpoints to the new API structure.


@household_bp.route("/<household_id>", methods=["GET"])
@validate_country
def get_household(country_id: str, household_id: str) -> Response:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion, blocking I don't like that country_id is an implicit requirement that has to be fulfilled when you attach the blueprint by adding it to the prefix. Honestly, I'm surprised that works.

I would expect you to need to explicitly declare the URL parameters you need in the blueprint some way.

  • question: I'm not sure what pyflask intends you to do here... the documentation is pretty thin. This feels like a side effect, not a feature though. Do you have a better reference?
  • suggested immediate solution: With that caveat, I suggest just using the full path here: "/<country_id>/<household_id>" and then attaching the blueprint to the "/" app.
    • This is explicit (you can see where country_id is coming from and can't leave it out) - yay
    • This means you now register overlapping blueprints for anything that needs "country_id". If you accidentally define two with the same structure, pyflask says nothing and just uses whatever gets registered second - boo.
  • discussion for next iteration I wonder if this is telling us we should flatten our REST API structure. Nesting tends to be an anti-pattern.
    • In this case you could have /countries/<country_id> and /households/<household_id>. where household_id is globally unique. After all.. we know which country the household is in, why are we making the customer remember and restate it?
    • This addresses the current problem, but generally this makes for cleaner and unambigous URLs: is the path "/one/two" referring to a country "one" and a household with ID "two", a top level resource "one" and and a sub resource called "two", or a country called "one" for which I want the sub resource called "two
    • Which in turn solves the problem above of overlapping paths (not coincidentally)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see where you're coming from, and you're right, the documentation on this usage is pretty thin. I would not be against migrating to the format you've described. I think that no matter what format we choose, we're operating in a less than ideal environment, since literally every route requires a country ID value.

I also agree that there's probably a benefit to flattening the API structure overall in the long-run. However, this introduces a couple of important questions. For example, we have one or two triple-nested endpoints (e.g., <country_id>/household/<household_id>/over/policy/<policy_id> or something similar), and we'd have to consider a proper implementation for them.

Additionally, this form of migration is obviously breaking, and because the API wasn't designed with a version marker in its route, our only way of migrating is inherently breaking - we must make changes to the API, break it temporarily, then update the front-end app.

Finally, there are cases of routes where we definitely need the country ID. Metadata is obviously one, but that's not difficult to implement within a flattened API structure. The harder-to-implement routes are those enqueuing society-wide calculation.

My proposed solution here would be to shift to the format you've recommended (and also update the other blueprints added recently for AI and economy endpoints), then engage in a longer discussion regarding flattening the structure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR

agree.

Less than ideal paths

Yeah. As structured now, it's awkward.

flattening that example

Yeah I mean you could go a bunch of ways there.

The Computation is the Entity

IMO we're modeling a request to compute a value based on inputs and modeling this as a POST to a computations entity actually ends up being a lot more flexible/useful than treating this as a child of a household and a policy.

Create a computation

POST /computations/
{
  household_id=...
  policy_id=...
}

returns
a link to /computations/[computation_id]

GET the result

GET
/computations/[computation_id]
{
    status="..."
    inputs= {
       household
       policy
       ...
    }
    data={...}
}

Caching
Post simply redirects to an existing computation instead of creating a new one.

In fact the "computation_id" could just be an amalgam of the input values (I.e. [household_id]&[policy_id]) in which case it's easy to map to a cached value.

Searching

GET /computations?country=...
GET /computations?policy_id=...
GET /computations?household_id=...

Pros:

  • Puts the one true place for finding computations of all kinds under /computations
    • on the implementation side I don't need to implement partial queries under household, population, country, etc.
    • on the client side I can ask global questions with one query in one place.
  • Simplifies different ways of asking for the same thing: I can easily support both household ID and direct data as input for household without needing another path, just one POST with flexible input.
  • Can all go into a completely encapsulated blueprint that lives under an unambiguous URL.

Cons:

  • Client slightly more complicated in that we have POST then GET poll instead of just poll.

The household (or the policy, or both) is the entity

you could do /households/[id]/computations and then follow the same model as above

This is less flexible though because

  1. I have to have a saved household with a household_id to run the computation (what if I just want to pass the household data directly?)
  2. If I want to search for computations across households I have to support that with another endpoint under a different resource.
  3. I probably still want to be able to search for computations under the household so now I have to duplicate some of that global search under the household.

Also.. why is the household the entity instead of the policy? I mean it seems reasonable to want to search for all simulations under a policy right? So now do I implement both or pick one, etc?

policyengine_api/routes/household_routes.py Show resolved Hide resolved
policyengine_api/routes/household_routes.py Show resolved Hide resolved

# Ensure that household ID is a number
try:
household_id = int(household_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion/nonblocking I can't find in the flask docs where this is documented (!) but apparently you can use "int:household_id" in the path to do basic validation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that would make life easier. Let me see if I can find in logs, curious to see if there are more advanced features so that we could further cut down on validation code.

)
else:
return Response(
json.dumps(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question why does the metadata endpoint return just a dict while others return {status,message,result}? are we trying to align on one standard, which one and why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we've ever had a standard here - definitely time for one. Let me work up what I think makes sense, gather your feedback, then implement.

)


class TestGetHousehold:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/nonblocking: I have a mild preference for testing the blueprint (I.e. specifically wrap the blueprint in an app and test it) instead of the full app for all tests. Why tightly couple all that stuff if you don't need to?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Previously, we had no blueprints, and some of our earliest tests were merely reactions to particular calculations failing, and thus were essentially targeted e2e's or integration tests. Moving forward, I think that would make a lot of sense.

Comment on lines +61 to +63
mock_row = MagicMock(spec=LegacyRow)
mock_row.__getitem__.side_effect = lambda x: {"id": 1}[x]
mock_database.query().fetchone.return_value = mock_row
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitt/nonblocking I tend to prefer a test support method (I would usually call this a fixture, but that may have specific technical implications here) like given_any_query_will_return(...) instead of mock-object-ese in the test. It makes for a cleaner read and if you change the way we query the DB you just need to change the function once, not every build test we wrote.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to chat with you about that. I'm new to the world of mock objects and fixtures, frankly, so curious to hear more during our meeting Wednesday.

assert b"Label must be a string or None" in response.data


class TestUpdateHousehold:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussion/non-blocking I like the way you have organized these tests with a class per operation and clearly named tests.

Have you looked at pytest-describe or similar ways of removing some of the boilerplate/repetitive aspects of the test syntax (I haven't used that specifically, but I've used similar frameworks in javascript and ruby)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not, but will

Comment on lines +132 to +134
assert response.status_code == 200
assert data["status"] == "ok"
assert data["result"]["household_id"] == "1"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/maybe blocking Does this actually verify an update happened? I'm expecting a verify of some kind confirming the SQL update.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only in the sense that the API returns a 200 when an update occurs, while it returns a 201 when record creation occurs, but you're right, this test lacks checking beyond that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion validate the query unless it's a ton of work to do.



# Service level tests
class TestHouseholdService:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question/nitt/nonblocking

The nitt - For the most part in most code bases there is one file per class under test. Usually it's under a directory matching the class file.

I would for that reason expect this to live in something like tests/.../services/test_household_service.py

I personally find this convention a bit confusing because I need more context to know where to look to find the appropriate test.

The question - do you currently have a convention this fits neatly into and/or do Python projects do that differently?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR open to begin to migrate to a convention, because I agree, this method is basically a free-for-all. That said, looking forward to discussing it with you Wednesday.

@anth-volk anth-volk changed the title Migrate household endpoints to new API structure Migrate household endpoints to new API structure Dec 6, 2024
@anth-volk
Copy link
Collaborator Author

Hey Mike,

Thanks again for your review on this. Here's what I'd like to propose as a roadmap forward for both this PR and the PR migrating policy endpoints in a similar way (currently open as #2025). Feel free to let me know what you think, then I can proceed:

  • Immediate, as part of PR
    • Bug fixes raised as part of review
    • Migrate to specifying full endpoint address upon registration (as suggested by Mike)
      • Update PR to apply this structure to household endpoints
      • Update policy endpoint PR to also apply this structure
      • Also apply this structure to previously migrated endpoints (economy, AI)
    • Look into basic Flask validators and apply where necessary
    • Create generic exception handlers and use those in place of manually specified error responses (assuming we can easily isolate standard errors; if not, move to “In future”)
  • In progress elsewhere
    • Keep existing print statements, and then migrate these to log statements in logging PR (currently in progress, but no PR open)
    • Test improvements (see Improve testing #2021)
  • In near future (i.e., PR following this and policy endpoint refactor)
    • Chat further about improvements to testing
    • Create a clearer specification regarding keys like “status”, “message”, “result” – may require coordination with relevant stakeholders (e.g., Nikhil)
    • Look into implementing Marshmallow schema validation
  • In longer-term future
    • Discuss how to flatten API structure, especially multi-nested endpoints

@mikesmit
Copy link
Collaborator

mikesmit commented Dec 7, 2024

I approve of your plan. I have added some more discussion above, but I don't want to block this work on it.

@anth-volk anth-volk force-pushed the fix/1988-migrate-household branch from eb430a1 to a3a78fc Compare December 11, 2024 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants