-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PERF] Refacto run dao sql query #2685
[PERF] Refacto run dao sql query #2685
Conversation
✅ Deploy Preview for peppy-sprite-186812 canceled.
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2685 +/- ##
=========================================
Coverage 84.05% 84.05%
Complexity 1379 1379
=========================================
Files 248 248
Lines 6297 6297
Branches 286 286
=========================================
Hits 5293 5293
Misses 851 851
Partials 153 153 ☔ View full report in Codecov by Sentry. |
)) AS input_versions | ||
FROM runs_input_mapping im | ||
INNER JOIN dataset_versions dv ON im.dataset_version_uuid = dv.uuid | ||
WHERE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this where .. in (SELECT ...)
necessary to gain performance boost?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion yes because with this condition we only JSON_AGG (which is the bottleneck here) on the uuid that matters and not on the whole table.
For a given namespace and job name and a db.t4g.medium (vCPU: 2, RAM: 4 GB) machine:
The old query takes: 1 minute 14 seconds
The new query takes: 1.919 seconds
The new query without filter: More than 10 minutes (probably more, i stop it at ten minutes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Could you then add some comment describing that it's done for this purpose.
It may be tempting for anyone in future to refactor this code and get rid of this. In such case, our tests will not notice the performance degradation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done ! ;)
Signed-off-by: sophiely <[email protected]>
e440afe
to
ab4f42d
Compare
Signed-off-by: sophiely <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇
Problem
Closes: 2686
The SQL query run for listing all runs is quite slow and often result in a time out.
Solution
The logic is quite the same as before, the heavy operation is all the 4 JSON_AGG so we just need to filter first and apply this operation only on the relevent rows
run_uuid IN (SELECT uuid FROM runs_view WHERE job_uuid IN (SELECT uuid FROM filtered_jobs))
.For a given namespace and job name and a db.t4g.medium (vCPU: 2, RAM: 4 GB) machine:
One-line summary:
Checklist
CHANGELOG.md
(Depending on the change, this may not be necessary)..sql
database schema migration according to Flyway's naming convention (if relevant)