Food database changes #333

Open: wants to merge 1 commit into base: main

Conversation

lukashroch
Collaborator

@lukashroch lukashroch commented Jan 13, 2025

Food database redesign changes - proposal based on workshop.

Local/Global records and inheritance

  • dropped prototype inheritance
  • data back-filled in migration
  • global/local records merged to "local" single record isolated to locale
  • unique food codes kept - still needs to be unique within locale
  • foods_local_lists data dropped, since everything is local now (not needed any more?). We should probably introduce a hidden flag, as for categories (I think I saw a ticket for that too)
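
To make the back-fill step concrete, here is a minimal sketch, assuming a local record may leave fields unset (null) and inherit them from the record with the same code in its prototype locale. The types and names (`LocalFood`, `PrototypeMap`, `name`, `portionSize`) are hypothetical and not the actual schema or migration code. The sketch only fills inherited fields and does not create records that exist solely in the prototype locale.

```typescript
// Hypothetical shapes; the real migration works on database tables.
interface LocalFood {
  localeId: string;
  code: string;
  name: string | null; // null = inherited from the prototype locale
  portionSize: string | null; // null = inherited from the prototype locale
}

// Maps a locale to its prototype locale (assumed single-parent chain).
type PrototypeMap = Record<string, string | undefined>;

function backfill(foods: LocalFood[], prototypeOf: PrototypeMap): LocalFood[] {
  const byKey = new Map<string, LocalFood>();
  for (const f of foods) byKey.set(JSON.stringify([f.localeId, f.code]), f);

  return foods.map((food) => {
    const resolved: LocalFood = { ...food };
    const seen = new Set<string>([food.localeId]); // guards against cycles
    let parent = prototypeOf[food.localeId];
    // Walk up the prototype chain, filling only the fields that are still null.
    while (parent !== undefined && !seen.has(parent)) {
      seen.add(parent);
      const proto = byKey.get(JSON.stringify([parent, food.code]));
      if (proto) {
        resolved.name = resolved.name ?? proto.name;
        resolved.portionSize = resolved.portionSize ?? proto.portionSize;
      }
      parent = prototypeOf[parent];
    }
    return resolved;
  });
}
```

After this step every record carries its own values, so the prototype link is no longer needed when reading a food.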

Local category structure

  • category structure converted to local one - any locale can build own one now

Food attributes

  • dropped and migrated into the tags
  • category traversing can be broken using exclude_tags
  • same as before -> same-as-before tag
  • ready meals -> ready-meal tag
  • use_in_recipes 0 enum -> use-anywhere tag
  • use_in_recipes 1 enum -> use-as-food tag
  • use_in_recipes 2 enum -> use-as-ingredient tag
  • reasonable amount -> DROPPED (not used at all?)
  • TODO: logic to resolve tags based on the category tree is not done yet
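
The attribute-to-tag mapping above could be sketched roughly as follows. The tag names come from the list; the `LegacyAttributes` shape and its field names are assumptions for illustration, not the actual table columns.

```typescript
// Hypothetical legacy attribute record; null means "not set on this record".
interface LegacyAttributes {
  sameAsBeforeOption: boolean | null;
  readyMealOption: boolean | null;
  useInRecipes: 0 | 1 | 2 | null;
}

// Index = old use_in_recipes enum value, value = new tag (from the list above).
const USE_IN_RECIPES_TAGS = ["use-anywhere", "use-as-food", "use-as-ingredient"] as const;

function attributesToTags(attrs: LegacyAttributes): string[] {
  const tags: string[] = [];
  if (attrs.sameAsBeforeOption) tags.push("same-as-before");
  if (attrs.readyMealOption) tags.push("ready-meal");
  if (attrs.useInRecipes !== null) tags.push(USE_IN_RECIPES_TAGS[attrs.useInRecipes]);
  return tags; // "reasonable amount" is intentionally not mapped (dropped)
}
```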

Brands

  • migrated to use internal local records

Foods group

  • TODO: should we make these locale-specific too? They were global...

System data

  • ranking / submission data still use codes

API

  • admin API endpoints use internal numerical IDs
  • Packager API - can we use internal numerical IDs as well? @mechkg I haven't looked much into the packager API yet.
  • foods.service / local-foods.service (ditto for categories) are largely the same; we might merge them at some point?
  • survey frontend API - keeps using locale/food/category codes; the issue is that survey schemes, conditions etc. mostly use food/category codes
  • alternatively, introduce a simple endpoint to query the category/food header to get the ID.

Future work ?

  • introduce composite food list per survey -> be able to pick multiple locales to form food list
  • will need work on food index -> have ranked list and resolve duplicates

@lukashroch lukashroch requested a review from mechkg January 13, 2025 15:10
@mechkg
Contributor

mechkg commented Jan 13, 2025

Thank you for writing this up @lukashroch!

This looks great so far.

I wrote this down to make sure we're on the same page about the intent behind these changes. Please let me know if there is anything that you see differently or disagree with.

My understanding of the relationship between category and food codes/internal ids is that the codes should still serve as a persistent identifier that can be used to reference foods and categories in a portable manner, across instances and as part of a self-contained food database package.

The internal ids, on the other hand, are simple numerical ids that are efficient but only valid within a single Intake24 instance, so in essence the internal id is just a surrogate id for the locale_id/food_code pair, and the two are generally interchangeable.

The current code system (before these changes) has the following downsides:

  • The food and category codes must be globally unique, which makes using existing codes, such as those coming from an external database (for example, the French Albane database) impossible in the general case,
  • Versioning is frustrating because the only way to do it at the moment is to prefix the code with the year or version string,
  • The codes are limited to 8 characters making it difficult to produce a unique code from an existing code,
  • The codes were initially limited to a single locale/food list, and were meant to be somewhat human-readable, but this property has been lost.

Since after these changes we can always scope the food/category codes to a locale, I think we should relax the code uniqueness constraint to only require uniqueness within a locale (eventually the composable "food list", whatever we decide to call it), and we should also make the character limit more generous (at least 36 characters to accommodate UUIDs). This way we can include the original food and category ids from external sources (such as the French Albane food database) without having to jump through hoops to make them globally unique and compatible with the Intake24 code system, and also make copies of food lists when required for versioning purposes (NDNS years) or extensions (UK + Gusto, UK + gluten-free foods, UK + ethnic minority foods etc.) without having to change any codes.

If we do this, associated food prompts and any other features that refer to foods or categories (like the sandwich builder) will need to be updated to use internal ids to keep the references simple (otherwise they'd have to be a locale/code pair). The prompt condition editor will also need to be updated to use a friendly UI for searching/picking categories; at the moment it only accepts a code string that has to be typed manually.

The packager API relies heavily on the food/category code concept because it needs to identify foods/categories by their persistent codes to be able to update records without breaking any internal references. It will also need a workaround to import existing packages with separate local/global data as we might need to import some data from old packages at some point (like Danish and Portuguese databases).

So it looks like for some operations there will need to be two API paths, the default one using the internal id and the one using the category/food codes (.../locale/by-code/...).

Ranking/food index code should probably still use codes, because initialising, updating or exporting the ranking data will need to use the persistent codes anyway.

Attributes -> tags conversion looks good, I sent a request to the DA team re the reasonable amount values. If they decide to keep it, we could move them to categories as a nullable field (and drop from foods for simplicity).

Agree that food_local_lists no longer makes sense. If we decide to decouple locales from food lists like we discussed, the hidden food flags will have to be either at the study or study schema level. I think the schema level makes more sense because some prompts are set up with a specific food list/category structure in mind.

@lukashroch
Collaborator Author

Great, I think this all aligns with what I have done so far in the PR!

> My understanding of the relationship between category and food codes/internal ids is that the codes should still serve as a persistent identifier that can be used to reference foods and categories in a portable manner, across instances and as part of a self-contained food database package.
>
> The internal ids, on the other hand, are simple numerical ids that are efficient but only valid within a single Intake24 instance, so in essence the internal id is just a surrogate id for the locale_id/food_code pair, and the two are generally interchangeable.

Agreed, id is the PK sequence and the locale_id/food_code pair is a unique constraint.
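
As an illustration of that relationship, a minimal in-memory sketch: a sequential surrogate id plus a uniqueness check on the locale/code pair, with look-ups available by either. `FoodStore` and its methods are hypothetical names; in the real system this would be a primary key and a unique constraint in the database.

```typescript
interface Food {
  id: number; // surrogate key, only meaningful within one instance
  localeId: string;
  code: string; // persistent, portable identifier within the locale
  name: string;
}

class FoodStore {
  private nextId = 1;
  private byId = new Map<number, Food>();
  private byCode = new Map<string, number>(); // [localeId, code] -> id

  private static key(localeId: string, code: string): string {
    return JSON.stringify([localeId, code]);
  }

  insert(localeId: string, code: string, name: string): Food {
    const key = FoodStore.key(localeId, code);
    // Emulates the unique constraint on the locale_id/food_code pair.
    if (this.byCode.has(key)) throw new Error(`Duplicate food code ${key}`);
    const food: Food = { id: this.nextId++, localeId, code, name };
    this.byId.set(food.id, food);
    this.byCode.set(key, food.id);
    return food;
  }

  findById(id: number): Food | undefined {
    return this.byId.get(id);
  }

  findByCode(localeId: string, code: string): Food | undefined {
    const id = this.byCode.get(FoodStore.key(localeId, code));
    return id === undefined ? undefined : this.byId.get(id);
  }
}
```

The same code can exist in two locales, but inserting it twice into one locale fails.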

> Since after these changes we can always scope the food/category codes to a locale, I think we should relax the code uniqueness constraint to only require uniqueness within a locale (eventually the composable "food list", whatever we decide to call it), and we should also make the character limit more generous (at least 36 characters to accommodate UUIDs). This way we can include the original food and category ids from external sources (such as the French Albane food database) without having to jump through hoops to make them globally unique and compatible with the Intake24 code system, and also make copies of food lists when required for versioning purposes (NDNS years) or extensions (UK + Gusto, UK + gluten-free foods, UK + ethnic minority foods etc.) without having to change any codes.

  • yep, uniqueness is on locale_id/food_code pair
  • good point about the length. I have already increased it to 32 chars, but it makes sense to increase it further to accommodate common unique ID patterns like UUIDs, so I will update it to 36.
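
For reference, a quick check of why 32 is not quite enough: the canonical text form of a UUID is 32 hex digits plus 4 hyphens, i.e. 36 characters. The constant and function names below are assumptions, not the actual validation code.

```typescript
// Assumed limit, matching the 36 characters proposed above.
const MAX_CODE_LENGTH = 36;

function isValidCode(code: string): boolean {
  return code.length > 0 && code.length <= MAX_CODE_LENGTH;
}

// Example UUID in canonical hyphenated form (8-4-4-4-12 hex digits).
const exampleUuid = "123e4567-e89b-12d3-a456-426614174000";
const fitsNewLimit = isValidCode(exampleUuid); // 36 characters, so it fits
const fitsOldLimit = exampleUuid.length <= 32; // would not have fitted before
```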

> If we do this, associated food prompts and any other features that refer to foods or categories (like the sandwich builder) will need to be updated to use internal ids to keep the references simple (otherwise they'd have to be a locale/code pair). The prompt condition editor will also need to be updated to use a friendly UI for searching/picking categories; at the moment it only accepts a code string that has to be typed manually.

I wasn't sure about this: whether to introduce internal ids into the schemes/AFPs, where codes appear in conditions and in various references for the food search. So far I haven't touched this and have kept the codes, including in the AFPs. My initial thinking was that if we introduce composable food lists, we might still need localeCode/foodCode|categoryCode pairs for resolution based on locale priorities and identical food codes, so it might make sense to keep them as codes? But I haven't thought the composable food lists concept through much yet.
A UI for food/category look-up in the prompt-mgr should be easy to implement: it's already used in the food explorer for categories, and the select-resource component provides the flexibility to use any resource look-up.
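
To make the "resolution based on locale priorities" idea a bit more concrete, here is a rough sketch, assuming a composable food list is just an ordered list of locale codes and that, when the same food code appears in several of them, the earliest locale wins. None of this exists yet; `FoodHeader` and `composeFoodList` are hypothetical names.

```typescript
interface FoodHeader {
  localeCode: string;
  foodCode: string;
  name: string;
}

// localePriority: locales that form the list, highest priority first.
function composeFoodList(foods: FoodHeader[], localePriority: string[]): FoodHeader[] {
  const rank = new Map<string, number>();
  localePriority.forEach((code, index) => {
    if (!rank.has(code)) rank.set(code, index);
  });

  const winners = new Map<string, FoodHeader>(); // foodCode -> chosen record
  for (const food of foods) {
    const foodRank = rank.get(food.localeCode);
    if (foodRank === undefined) continue; // locale is not part of this list
    const current = winners.get(food.foodCode);
    const currentRank = current ? rank.get(current.localeCode) : undefined;
    if (currentRank === undefined || foodRank < currentRank) {
      winners.set(food.foodCode, food);
    }
  }
  return Array.from(winners.values());
}
```

With this kind of resolution the locale/code pair stays the natural reference, which is one argument for keeping codes in the schemes.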

> The packager API relies heavily on the food/category code concept because it needs to identify foods/categories by their persistent codes to be able to update records without breaking any internal references. It will also need a workaround to import existing packages with separate local/global data as we might need to import some data from old packages at some point (like Danish and Portuguese databases).
>
> So it looks like for some operations there will need to be two API paths, the default one using the internal id and the one using the category/food codes (.../locale/by-code/...).

Makes sense, I think we could do a bit of API clean-up as it's inconsistent and do something like this to have both id and code-pair endpoints:

# survey
/categories/:categoryId
/foods/:foodId
/locales/:localeCode/categories/:categoryCode
/locales/:localeCode/foods/:foodCode

# admin
/admin/categories/:categoryId
/admin/foods/:foodId
/admin/locales/:localeCode/categories/:categoryCode
/admin/locales/:localeCode/foods/:foodCode
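
A small sketch of how both path shapes for foods could be reduced to one reference type before hitting the service layer. This assumes the survey paths exactly as listed above; `FoodRef` and `parseFoodPath` are hypothetical and not part of the current API code.

```typescript
// A food reference is either an internal id or a locale/code pair.
type FoodRef =
  | { kind: "id"; foodId: number }
  | { kind: "code"; localeCode: string; foodCode: string };

function parseFoodPath(path: string): FoodRef | null {
  // /foods/:foodId (numeric internal id)
  const idText = /^\/foods\/(\d+)$/.exec(path)?.[1];
  if (idText !== undefined) return { kind: "id", foodId: Number(idText) };

  // /locales/:localeCode/foods/:foodCode (persistent code pair)
  const match = /^\/locales\/([^/]+)\/foods\/([^/]+)$/.exec(path);
  const localeCode = match?.[1];
  const foodCode = match?.[2];
  if (localeCode !== undefined && foodCode !== undefined) {
    return { kind: "code", localeCode, foodCode };
  }
  return null; // not a food path
}
```

The service could then resolve a `code` reference to the internal id once and share the rest of the logic between the two endpoints.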

> Ranking/food index code should probably still use codes, because initialising, updating or exporting the ranking data will need to use the persistent codes anyway.

Agreed

> Attributes -> tags conversion looks good, I sent a request to the DA team re the reasonable amount values. If they decide to keep it, we could move them to categories as a nullable field (and drop from foods for simplicity).

Agreed

> Agree that food_local_lists no longer makes sense. If we decide to decouple locales from food lists like we discussed, the hidden food flags will have to be either at the study or study schema level. I think the schema level makes more sense because some prompts are set up with a specific food list/category structure in mind.

Agreed

I'm not sure what to do with food groups, but I think they are used a bit for reference, just not with localized food names (food_group_locals only has en-GB data). So I guess I could do the same as with the rest: create the missing records for each locale and make them locale-specific lists, same as brands (although those are not used in V4 yet either!)

If you get a chance, could you please run / review the foods migration file so far? It should already include quite a lot of what we discussed here, and there is also code to back-fill locales with prototypes. Could I ask you to review this more closely, to be sure I haven't messed up the prototype resolution :-)

@lukashroch lukashroch force-pushed the fdb-changes branch 3 times, most recently from afd84f4 to 1285ba3 Compare January 16, 2025 19:48
@lukashroch lukashroch force-pushed the fdb-changes branch 3 times, most recently from b659998 to 8beb761 Compare January 23, 2025 11:40