Streamlined site assessment process #50
base: main
Conversation
Applied KD-tree clustering to filter out pixels which are close to each other and not worthwhile re-assessing. I have also adjusted requirements to be a bit tighter: previously, assessment of a pixel by rotating a polygon around it was abandoned if the score was < 33%; that has been raised to 70%. We were also rotating the polygon twice in 15 degree steps (in either direction); we now only do this once. Together these changes reduce the number of assessment operations by orders of magnitude.
I have also re-enabled multi-threading so each potential pixel is assessed in parallel. The effect of these changes is that site suitability assessment takes ~70 seconds (compared to ~12 mins previously) for Cairns-Cooktown. The worst case seems to be the Far Northern region, where assessment took 3 mins. From previous discussions with @arlowhite, this region could previously take upwards of 45-50 mins. Note that the criteria ranges used for demonstration (shown below) purposely did not specify ranges for all criteria, so this somewhat represents a "worst case".
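For illustration only, a minimal Python sketch of assessing each candidate pixel in parallel with an early-abandon threshold. The assessment function, candidate list, and scoring here are hypothetical stand-ins, not the PR's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

SCORE_THRESHOLD = 0.70  # raised from the previous 0.33

def assess_pixel(pixel):
    """Hypothetical stand-in for the per-pixel polygon assessment.

    Returns (pixel, score) if the location meets the threshold, otherwise
    None so the location is abandoned.
    """
    x, y = pixel
    score = ((x * 31 + y * 17) % 100) / 100.0  # placeholder score only
    return (pixel, score) if score >= SCORE_THRESHOLD else None

candidate_pixels = [(i, j) for i in range(100) for j in range(100)]

# Assess every candidate pixel in parallel; in the real code the heavy
# geometry work dominates, which is what makes parallelism worthwhile.
with ThreadPoolExecutor() as pool:
    results = [r for r in pool.map(assess_pixel, candidate_pixels) if r is not None]
```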
Previously the buffering distance wasn't applied to max_count. Additionally, moved to max_offset to allow for full rotation of the search box without going outside of the rel_pix bounds, and adjusted handling for the case where there are no rel_pix for a search box (a standard geometry is returned so a valid DataFrame is output, but it is filtered out at later steps).
Add max_count documentation and capping at 1
Argument names updated for readability
Reduced number of arguments by moving dependent work inside the function
Moved repeated calculations outside the for loop
Use preallocated array (`loc_constraint`) to avoid reallocation of potentially 10s of thousands of values (see the sketch below)
Assign results directly to result arrays
Update raster-based method as well
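A rough illustration of the preallocation point above; all names other than `loc_constraint` are hypothetical, and the real criteria logic is more involved:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical per-pixel criterion values for a handful of candidate locations,
# each with the same number of relevant pixels in its search window.
n_locations, n_pix = 10, 50_000
pixel_values = rng.uniform(0.0, 6.0, size=(n_locations, n_pix))
lower, upper = 0.0, 4.0

# Allocate the constraint buffer and the result array once, then write into
# them, rather than building fresh arrays on every iteration.
loc_constraint = np.empty(n_pix, dtype=bool)
suitability = np.empty(n_locations)

for i in range(n_locations):
    np.logical_and(pixel_values[i] >= lower, pixel_values[i] <= upper, out=loc_constraint)
    suitability[i] = loc_constraint.mean()  # result assigned directly, no append
```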
This approach is faster and uses ~5x less memory
Fix typos in docstring
Standardize column names to `lons` and `lats` (I actually prefer the singular over the plural, but this is how it is in the lookup tables)
The idea is to remove unnecessary buffer areas around valid pixels. However, this needs to be done in the pre-processing step, not on data load. I am much too tired to do it right now, so committing this so the idea isn't lost.
Previously a "search geometry" would be created and moved/rotated about each pixel location to be assessed. i.e., [n pixels]^2 * [r rotations] operations We now instead pre-generate the rotated polygons and move these to each pixel location to be assessed. i.e., [r rotations] + [n pixels] operations
Apply KD-tree clustering to identify "groups" of suitable pixels and only assess the "center" pixel of each group. Further constrain the search by adopting a higher requirement (polygons must have suitable pixels in >= 70% of their area, compared to 33% previously). This could be further constrained by raising the default suitability score (the proportion of area around the target pixel meeting the suitability criteria) from 95% to 99%.
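One simple way to do this kind of grouping with a KD-tree, using scipy; the clustering radius and the exact way a group's "center" pixel is chosen are assumptions here, not necessarily what the PR does:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Hypothetical projected coordinates of pixels that passed the initial filter.
coords = rng.uniform(0, 1_000, size=(500, 2))
radius = 50.0  # hypothetical clustering radius

tree = cKDTree(coords)
keep = np.ones(len(coords), dtype=bool)

# Greedy thinning: once a pixel is kept, drop its near neighbours so only one
# representative per local group is assessed.
for i in range(len(coords)):
    if not keep[i]:
        continue
    for j in tree.query_ball_point(coords[i], r=radius):
        if j != i:
            keep[j] = False

representative_pixels = coords[keep]
```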
Supersedes changes in the perf-changes branch. Changes to the assessment process for computational efficiency.
Settings:
Example endpoint: http://127.0.0.1:8000/suitability/site-suitability/Cairns-Cooktown/slopes?Depth=-4.0:-2.0&Slope=0.0:40.0&Rugosity=0.0:6.0&SuitabilityThreshold=95&xdist=450&ydist=50
Originally took 8-12 seconds for the site suitability assessment step alone (1406 potential locations).
Now down to ~6.5 seconds.
A different strategy is still needed for very large numbers of potential locations.