-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix performance regression in all releases-views #1237
Conversation
I'm not sure which way to go now, probably a decision for the maintainers. Easiest approach is reducing the results for the stars-views. |
Can we make |
I'll check it |
It actually is already |
Nulls last is also used for the release time, Also some other indexes were missing |
for example one on I'll finish everything up, then you'll see |
@Nemo157 could be ready for another round of checks :) |
Very briefly skimming it looks good, I'm too 😪 to properly review the tests today though, I'll take a look tomorrow if jyn514 doesn't before then. |
"create indexes for crates, github_repos and releases", | ||
// upgrade | ||
" | ||
CREATE INDEX crates_latest_version_idx ON crates (latest_version_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest_version_id
is denormalized data, I think. We should be able to recalculate it from release_time
, although I guess if we ever fix #708 that would have to change.
No need to change it here, mostly just musing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked, latest_version_id
is set in db::add_package::add_package_into_database
.
So we should be fine.
If we ever change that for #708 (which seems to be not only a technical change, but a change in the meaning of the lists), we will either drop the field (which let's the tests here fail), or fill the field with the (then) correct version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In then, we previously used the field in the WHERE
part, so that doesn't change :)
CREATE INDEX releases_github_repo_idx ON releases (github_repo); | ||
CREATE INDEX github_repos_stars_idx ON github_repos(stars DESC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to measure how much disk space these take up? Anything less than ~500 MB is probably fine if it speeds up queries this much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
full index size for the tables locally (all indexes)
cratesfyi=# \e
┌──────────────┬───────────────┬──────────────┐
│ name │ relation_size │ indexes_size │
├──────────────┼───────────────┼──────────────┤
│ crates │ 6488 kB │ 8184 kB │
│ releases │ 39 MB │ 19 MB │
│ github_repos │ 29 MB │ 19 MB │
└──────────────┴───────────────┴──────────────┘
if you want to check production (quickly hacked together, likely there is a nicer way, probably also by index)
select
name,
pg_size_pretty(pg_relation_size(name)) as relation_size,
pg_size_pretty(pg_indexes_size(name)) as indexes_size
FROM
(
select 'crates' as name UNION ALL
select 'releases' as name UNION ALL
select 'github_repos' as name
) as ii
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name | relation_size | indexes_size
--------------+---------------+--------------
crates | 2872 kB | 4336 kB
releases | 398 MB | 23 MB
github_repos | 4392 kB | 1584 kB
(3 rows)
Yeah that seems fine.
LGTM |
All pages that show data from
web::releases::get_releases
are insanely slow (>1s).This PR fixes this.
Local tests and explain looks good now. I could only test with faked data locally, but likely the behaviour will be the same on prod.
The behaviour is the same for the views by release-time, for the views by github-stars we don't show releases any more that are on non-github repos.
I also restructured the related tests a bit, I'm happy to revert if you don't like it. In my mind it fits better when tests only test a single thing with a visible input-data, execution, assertion structure.
old query plan, releases by github stars
new query plan, release by github stars
old query plan, release by release-time
new query plan, release by release-time