Migrate `household` endpoints to new API structure #2038

anth-volk · 2024-12-04T00:14:22Z

Fixes #1988.
Fixes #2034.

This PR migrates the household endpoints to the new API structure.

mikesmit · 2024-12-04T21:26:58Z

policyengine_api/routes/household_routes.py

+
+@household_bp.route("/<household_id>", methods=["GET"])
+@validate_country
+def get_household(country_id: str, household_id: str) -> Response:


suggestion, blocking I don't like that country_id is an implicit requirement that has to be fulfilled when you attach the blueprint by adding it to the prefix. Honestly, I'm surprised that works.

I would expect you to need to explicitly declare the URL parameters you need in the blueprint some way.

question: I'm not sure what pyflask intends you to do here... the documentation is pretty thin. This feels like a side effect, not a feature though. Do you have a better reference?

suggested immediate solution: With that caveat, I suggest just using the full path here: "/<country_id>/<household_id>" and then attaching the blueprint to the "/" app.

This is explicit (you can see where country_id is coming from and can't leave it out) - yay

This means you now register overlapping blueprints for anything that needs "country_id". If you accidentally define two with the same structure, pyflask says nothing and just uses whatever gets registered second - boo.

discussion for next iteration I wonder if this is telling us we should flatten our REST API structure. Nesting tends to be an anti-pattern.

In this case you could have /countries/<country_id> and /households/<household_id>. where household_id is globally unique. After all.. we know which country the household is in, why are we making the customer remember and restate it?

This addresses the current problem, but generally this makes for cleaner and unambigous URLs: is the path "/one/two" referring to a country "one" and a household with ID "two", a top level resource "one" and and a sub resource called "two", or a country called "one" for which I want the sub resource called "two

Which in turn solves the problem above of overlapping paths (not coincidentally)

I see where you're coming from, and you're right, the documentation on this usage is pretty thin. I would not be against migrating to the format you've described. I think that no matter what format we choose, we're operating in a less than ideal environment, since literally every route requires a country ID value.

I also agree that there's probably a benefit to flattening the API structure overall in the long-run. However, this introduces a couple of important questions. For example, we have one or two triple-nested endpoints (e.g., <country_id>/household/<household_id>/over/policy/<policy_id> or something similar), and we'd have to consider a proper implementation for them.

Additionally, this form of migration is obviously breaking, and because the API wasn't designed with a version marker in its route, our only way of migrating is inherently breaking - we must make changes to the API, break it temporarily, then update the front-end app.

Finally, there are cases of routes where we definitely need the country ID. Metadata is obviously one, but that's not difficult to implement within a flattened API structure. The harder-to-implement routes are those enqueuing society-wide calculation.

My proposed solution here would be to shift to the format you've recommended (and also update the other blueprints added recently for AI and economy endpoints), then engage in a longer discussion regarding flattening the structure.

TLDR

agree.

Less than ideal paths

Yeah. As structured now, it's awkward.

flattening that example

Yeah I mean you could go a bunch of ways there.

The Computation is the Entity

IMO we're modeling a request to compute a value based on inputs and modeling this as a POST to a computations entity actually ends up being a lot more flexible/useful than treating this as a child of a household and a policy.

Create a computation

POST /computations/ { household_id=... policy_id=... }

returns
a link to /computations/[computation_id]

GET the result

GET /computations/[computation_id] { status="..." inputs= { household policy ... } data={...} }

Caching
Post simply redirects to an existing computation instead of creating a new one.

In fact the "computation_id" could just be an amalgam of the input values (I.e. [household_id]&[policy_id]) in which case it's easy to map to a cached value.

Searching

GET /computations?country=... GET /computations?policy_id=... GET /computations?household_id=...

Pros:

Puts the one true place for finding computations of all kinds under /computations

on the implementation side I don't need to implement partial queries under household, population, country, etc.

on the client side I can ask global questions with one query in one place.

Simplifies different ways of asking for the same thing: I can easily support both household ID and direct data as input for household without needing another path, just one POST with flexible input.

Can all go into a completely encapsulated blueprint that lives under an unambiguous URL.

Cons:

Client slightly more complicated in that we have POST then GET poll instead of just poll.

The household (or the policy, or both) is the entity

you could do /households/[id]/computations and then follow the same model as above

This is less flexible though because

I have to have a saved household with a household_id to run the computation (what if I just want to pass the household data directly?)

If I want to search for computations across households I have to support that with another endpoint under a different resource.

I probably still want to be able to search for computations under the household so now I have to duplicate some of that global search under the household.

Also.. why is the household the entity instead of the policy? I mean it seems reasonable to want to search for all simulations under a policy right? So now do I implement both or pick one, etc?

policyengine_api/routes/household_routes.py

mikesmit · 2024-12-04T21:39:12Z

policyengine_api/routes/household_routes.py

+
+    # Ensure that household ID is a number
+    try:
+        household_id = int(household_id)


suggestion/nonblocking I can't find in the flask docs where this is documented (!) but apparently you can use "int:household_id" in the path to do basic validation.

Wow, that would make life easier. Let me see if I can find in logs, curious to see if there are more advanced features so that we could further cut down on validation code.

mikesmit · 2024-12-04T21:42:03Z

policyengine_api/routes/household_routes.py

+            )
+        else:
+            return Response(
+                json.dumps(


question why does the metadata endpoint return just a dict while others return {status,message,result}? are we trying to align on one standard, which one and why?

I don't think we've ever had a standard here - definitely time for one. Let me work up what I think makes sense, gather your feedback, then implement.

mikesmit · 2024-12-04T22:08:41Z