[api] improve insert performance when persisting features data from CSV uploads [MRXN23-256] #1413

yulia-bel · 2023-08-01T15:37:06Z

CSV feature upload performance improvement (setting CTE as NOT MATERIALIZED)

Overview

This PR adds logging and a small performance improvement to the bulk insert query. According to tests and log analysis, the fastest approach for now is to mark the CTE used for features_data inserts as NOT MATERIALIZED and to insert values in chunks of 1000.
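
For illustration, here is a minimal TypeScript sketch of the two ideas described above: splitting the parsed CSV rows into chunks of 1000 and wrapping each chunk's VALUES list in a CTE marked NOT MATERIALIZED (PostgreSQL 12+ syntax). This is not the code from this PR. The row shape, the helper names and the feature_id/puid/amount columns are hypothetical stand-ins; the real features_data schema and its geometry handling are not reproduced here. Values are passed as $n bind parameters rather than interpolated into the SQL string.

```typescript
// Hypothetical row shape for one CSV-derived features_data record.
// Column names are illustrative only, not the actual schema.
interface FeatureDataRow {
  featureId: string;
  puid: number;
  amount: number;
}

interface ParameterizedQuery {
  text: string;
  values: Array<string | number>;
}

// Chunk size taken from the PR description.
const CHUNK_SIZE = 1000;

// Split an array into consecutive slices of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one multi-row INSERT whose VALUES list lives in a CTE marked
// NOT MATERIALIZED. Each row contributes three $n placeholders.
function buildFeaturesDataInsert(rows: readonly FeatureDataRow[]): ParameterizedQuery {
  if (rows.length === 0) {
    throw new RangeError("cannot build an INSERT for zero rows");
  }
  const values: Array<string | number> = [];
  const tuples = rows.map((row, i) => {
    const base = i * 3;
    values.push(row.featureId, row.puid, row.amount);
    // `$${n}` renders a literal "$" followed by the placeholder index.
    return `($${base + 1}::uuid, $${base + 2}::int, $${base + 3}::float8)`;
  });
  const text =
    "WITH incoming (feature_id, puid, amount) AS NOT MATERIALIZED (\n" +
    `  VALUES ${tuples.join(",\n         ")}\n` +
    ")\n" +
    "INSERT INTO features_data (feature_id, puid, amount)\n" +
    "SELECT feature_id, puid, amount FROM incoming";
  return { text, values };
}

// One parameterized query per chunk of at most CHUNK_SIZE rows.
function buildChunkedInserts(rows: readonly FeatureDataRow[]): ParameterizedQuery[] {
  return chunk(rows, CHUNK_SIZE).map(buildFeaturesDataInsert);
}
```

With 2500 rows this yields three queries (1000, 1000 and 500 rows). Each one could then be executed with a driver call that accepts a SQL string plus a parameter array, for example node-postgres `client.query(text, values)` or TypeORM `queryRunner.query(text, values)`, ideally inside a single transaction so a failed chunk does not leave a partial upload behind.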

Some Postgres analysis results:

Possible further solutions: inserting data with async jobs, or refactoring how geometries are handled for features data uploaded via CSV.

Designs

Link to the related design prototypes (if applicable)

Testing instructions

Please explain how to test the PR: ID of a dataset, steps to reach the feature, etc.

Feature relevant tickets

Link to the related task manager tickets

Checklist before submitting

Meaningful commits and code rebased on develop.
If this PR adds a feature that should be tested for regressions when deploying to staging/production, please add brief testing instructions to the deploy checklist (docs/deployment-checklist.md)
Update CHANGELOG file

vercel · 2023-08-01T15:37:14Z

The latest updates on your projects.

Name	Status	Preview	Updated (UTC)
marxan	✅ Ready (Inspect)	Visit Preview	Aug 1, 2023 3:37pm

hotzevzl

While we think about a more resilient solution (resilient, that is, given the many moving parts), this does what it says on the tin.

I'm not too happy that this doesn't scale (as discussed, that's likely because we'd need a different query setup altogether), but again, it's very much good enough for now.

yulia-bel added 2 commits July 31, 2023 20:00

Add logs and cte optimization to csv features upload

3423576

Update logs for csv features upload

bc6ac63

yulia-bel requested a review from hotzevzl August 1, 2023 15:37

hotzevzl approved these changes Aug 1, 2023

yulia-bel changed the title ~~(api) Csv feature upload performance improve (setting CTE as NOT MATERIALIZAED)~~ [api] improve insert performance when persisting features data from CSV uploads [MRXN23-256] Aug 1, 2023

yulia-bel merged commit 0e93c69 into develop Aug 1, 2023
52 checks passed

yulia-bel deleted the fix/api/MRXN23-256-csv-feature-upload-performance branch August 1, 2023 19:26