# Revising how feature splits work in Marxan Cloud v2 - February 2024

This document outlines the changes needed to make the feature split
functionality work in the v2 release of Marxan Cloud, taking into account the
breaking change (introduced in the same v2 release) whereby feature amounts per
planning unit are stored alongside other spatial data, rather than always being
computed on the fly, via spatial intersection, whenever needed.

## Linking of split features to subsets of features_data

For each feature obtained via split, we need to store a list of unique
identifiers of the `(geodb)features_data` rows that match the subset of the
parent feature identified by the K/V pair set for the split feature (for
example, when splitting a feature by a `habitat` attribute, each resulting
split feature is identified by one `habitat=<value>` pair).

Although `(geodb)features_data.id` would be a natural candidate for these ids,
in practice it would complicate how we copy features over when cloning
projects: we would need to rewrite the stored list of matching ids in each
split feature being copied, to match the new `(geodb)features_data.id` values
generated on insert of the cloned parent features from which the splits were
obtained.

We already do some kind of magic around similar predicaments throughout the
feature exporters/importers (basically creating temporary ids on export, which
we then reference on import), but it may be much more effective (as well as
simpler) at this stage to assign a stable id to each `(geodb)features_data`
row, and then reference this from within the list of ids stored with each split
feature.

As for the linking of split features to `features_data`, this could be done via
a join table, but in this case it may be simpler to store this as an array of
ids within a single column in the `(apidb)features` table itself, not least
because the join-table alternative would not really provide any benefits from a
referential integrity point of view: `features_data` lives in geodb, and
foreign keys cannot span the two databases.

Exporting and importing features metadata would also be more complicated if
using a join table.
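
As a minimal sketch of what this could look like on the apidb side (the column
name is taken from the exporters/importers section below; the exact type is an
assumption):

```sql
-- Sketch only: an array-of-uuids column on (apidb)features, holding the
-- stable ids of the (geodb)features_data rows that make up each feature.
alter table features
  add column feature_data_stable_ids uuid[];
```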

While this plan is specifically about the split functionality, this linking of
features to `features_data` via arrays of stable ids could be done for _all_
features (plain ones, those obtained via splits, and those obtained via
stratification while this operation was enabled on any given instance): this
way, the same strategy can be used when querying features created in any of the
three possible ways listed above.

In the case of the current global platform-wide feature, this would mean storing
a very large array of ids in the feature metadata, because of the large number
of geometries/`features_data` rows for this specific feature; likewise for
user-uploaded features with large numbers of geometries. This may be a fair
tradeoff, however, especially as it applies to a very limited number of
features, while avoiding the need to query `features_data` in different ways
depending on how a feature was created.

## Updating existing data

DB migrations will be needed to set stable ids for existing `features_data` rows.

A self-standing script (such as the Python ones previously created to update
existing data across apidb and geodb) will be needed to link existing split
features to the correct subsets of `features_data`.

### Creating stable ids for existing `features_data`

This could be a simple

```sql
update features_data set stable_id = id;
```

as we won't need to enforce any particular values at this stage when backporting
the new column on existing features. Making the column `not null default
gen_random_uuid()` would also work.
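
Putting the two options together, the geodb migration could look like this
minimal sketch (the exact DDL is an assumption):

```sql
-- Sketch only: adding the column with a default fills existing rows on the
-- spot and gives future rows a fresh stable id automatically.
alter table features_data
  add column stable_id uuid not null default gen_random_uuid();

-- Optionally, per the simpler option above, align the stable ids of existing
-- rows with their current ids.
update features_data set stable_id = id;
```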

### Linking existing split features to `features_data`

Once the above step is done, we can run a migration that queries the relevant
stable ids via the K/V pair stored as part of each split feature, and then sets
the array of matching stable ids on the feature.

The script may use the same queries already used to calculate subsets within
`SplitOperation` (and `StratificationOperation`, if wanting to do this for
features that may be obtained via stratification as well).
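
Per split feature, this essentially boils down to a query along these lines (a
sketch, assuming the per-geometry attributes of each `features_data` row live
in a jsonb `properties` column, and that `:parent_feature_id`, `:split_key` and
`:split_value` come from the split feature's metadata; all three names are
assumptions):

```sql
-- Sketch only: gather the stable ids of the parent feature's geometries
-- whose properties match the split feature's K/V pair.
select array_agg(fd.stable_id)
from features_data fd
where fd.feature_id = :parent_feature_id
  and fd.properties ->> :split_key = :split_value;
```

The resulting array would then be written to the split feature's
`feature_data_stable_ids` column in apidb.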

## Updating the split operation

The main issue preventing the split operation from working correctly without
the changes outlined in this document is that `SplitOperation` is passed the id
of the split feature for the step in which it tries to compute the amounts per
PU of the split feature; however, since `features_data` geometries are only
ever linked to "parent" (i.e. whole, not split) features, we end up with no
geometries, hence no amounts per planning unit, and hence not even any min/max
of amounts per PU.

So this needs to be changed to (see the sketch after this list):

- first, query the subset of `features_data` that matches the requested K/V
  pair
- store the list of stable ids of these `features_data` rows alongside the
  feature being created
- use the list of `features_data` rows derived above when calculating feature
  amounts per PU
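
As a sketch of the last step, the amount-per-PU computation can restrict the
spatial intersection to the stored subset (the planning unit table and geometry
column names are assumptions):

```sql
-- Sketch only: amounts per planning unit for a split feature, intersecting
-- only the features_data rows listed in the feature's stable id array.
select pu.id as planning_unit_id,
       sum(st_area(st_intersection(pu.the_geom, fd.the_geom))) as amount
from planning_units pu
join features_data fd
  on st_intersects(pu.the_geom, fd.the_geom)
where fd.stable_id = any(:feature_data_stable_ids)
group by pu.id;
```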

## Updating piece exporters and importers

Piece exporters and importers will need to:

- export and import the new `(geodb)features_data.stable_id` column
- export and import the new `(apidb)feature_data_stable_ids` column
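
In practice this means carrying one extra column through each side of the round
trip; for instance, on the geodb side, something along these lines (all column
names other than `stable_id` are assumptions):

```sql
-- Sketch only: a feature piece export now also selects stable_id, so that
-- importers can re-create the link without rewriting any id lists.
select fd.stable_id, fd.the_geom, fd.properties
from features_data fd
where fd.feature_id = :feature_id;
```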

## Updating queries for tiles of features

These will need to use the list of relevant `features_data` rows as stored
alongside each feature.
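
For example, a tile query could filter on the stored array rather than on a
feature id (a sketch based on typical PostGIS MVT queries; parameter names are
assumptions):

```sql
-- Sketch only: geometries for one tile of a feature, filtered via the
-- feature's stored list of features_data stable ids.
-- (SRID alignment between the_geom and the tile envelope is glossed over.)
select st_asmvtgeom(fd.the_geom, st_tileenvelope(:z, :x, :y)) as geom
from features_data fd
where fd.stable_id = any(:feature_data_stable_ids)
  and fd.the_geom && st_tileenvelope(:z, :x, :y);
```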
