[api] improve insert performance when persisting features data from CSV uploads [MRXN23-256] #1413

Merged

Conversation

yulia-bel

CSV feature upload performance improvement (setting CTE as NOT MATERIALIZED)

Overview

Adds logging and a small performance improvement for the bulk insert query. According to tests and log analysis, the fastest approach for now is to mark the CTE used for features_data inserts as NOT MATERIALIZED and to insert in chunks of 1000 values.
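For illustration only, here is a minimal sketch of the approach described above: split the rows into chunks of 1000 and build one parameterized insert per chunk, with the CTE marked NOT MATERIALIZED. This is not the code from this PR. The `FeatureRow` shape, the `feature_id` and `amount` columns, the casts, and the helper names `chunk` and `buildInsert` are all assumptions made for the example; only the `features_data` table name, the NOT MATERIALIZED hint, and the chunk size come from the description.

```typescript
// Hypothetical row shape: the real features_data columns are not shown in this PR.
interface FeatureRow {
  featureId: string;
  amount: number;
}

// Chunk size reported as fastest in the PR description.
const CHUNK_SIZE = 1000;

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one parameterized statement for a chunk. The VALUES list lives in a
// CTE marked NOT MATERIALIZED (PostgreSQL 12+), so the planner is allowed to
// fold it into the INSERT ... SELECT instead of computing it separately.
function buildInsert(
  rows: readonly FeatureRow[],
): { text: string; values: (string | number)[] } {
  const values: (string | number)[] = [];
  const tuples = rows.map((row, i) => {
    values.push(row.featureId, row.amount);
    // Two placeholders per row: $1, $2 for the first row, $3, $4 for the next.
    return `($${2 * i + 1}, $${2 * i + 2})`;
  });
  const text =
    `WITH to_insert (feature_id, amount) AS NOT MATERIALIZED (\n` +
    `  VALUES ${tuples.join(", ")}\n` +
    `)\n` +
    `INSERT INTO features_data (feature_id, amount)\n` +
    `SELECT feature_id::uuid, amount::float8 FROM to_insert`;
  return { text, values };
}

// Produce one statement per chunk; running them is left to the database client.
function buildChunkedInserts(rows: readonly FeatureRow[]) {
  return chunk(rows, CHUNK_SIZE).map((part) => buildInsert(part));
}
```

Each returned `{ text, values }` pair can be passed to whichever client the API uses for raw queries. Keeping statement construction separate from execution makes the chunk size easy to tune and the generated SQL easy to log.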

Some Postgres analysis results:
Screenshot from 2023-08-01 16-19-45

Possible further improvements: inserting data with async jobs, and refactoring geometry handling for features data uploaded via CSV.


Checklist before submitting

  • Meaningful commits and code rebased on develop.
  • If this PR adds feature that should be tested for regressions when
    deploying to staging/production, please add brief testing instructions
    to the deploy checklist (docs/deployment-checklist.md)
  • Update CHANGELOG file

@yulia-bel yulia-bel requested a review from hotzevzl August 1, 2023 15:37

@hotzevzl hotzevzl left a comment


while we think about a more resilient (as in, given the many moving parts) solution, this does what it says on the tin.

not too happy about the way this does not scale (as discussed, it's likely because we'd need a different query setup altogether), but again, very much good enough for now.

@yulia-bel yulia-bel changed the title (api) Csv feature upload performance improve (setting CTE as NOT MATERIALIZAED) [api] improve insert performance when persisting features data from CSV uploads [MRXN23-256] Aug 1, 2023
@yulia-bel yulia-bel merged commit 0e93c69 into develop Aug 1, 2023
52 checks passed
@yulia-bel yulia-bel deleted the fix/api/MRXN23-256-csv-feature-upload-performance branch August 1, 2023 19:26