Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an option to disable dimension sorting in View.fromCube #95

Closed
bprusinowski opened this issue May 11, 2023 · 8 comments · Fixed by #96 or #99
Closed

Add an option to disable dimension sorting in View.fromCube #95

bprusinowski opened this issue May 11, 2023 · 8 comments · Fixed by #96 or #99
Assignees

Comments

@bprusinowski
Copy link
Contributor

Hey!

Recently I've been working on performance improvements in Visualize.admin and I've been looking at the performance of the data fetching. I've noticed that in some cases it takes much longer to create the View (View.fromCube) than to fetch the data itself – see the Comments section of this PR.

It turned out that it's "slow" because the dimensions are sorted by name when the View is created. I wanted to ask if it's necessary to sort them or if it's possible to add an option to enable / disable sorting?

I've made some tests and it seems that there was no impact on the generated observations query when adding such parameter (additionally, in Visualize we sort the observations and dimensions anyway in the front-end part). I would be happy to open a PR if this change makes sense :)

@Rdataflow
Copy link

@dabo505 can you put this issue on the fast track? we depend on this performance improvement for the next release of visualize...

@tpluscode
Copy link
Contributor

Hi @bprusinowski. Could you share example queries which are actually impacted? With and without the dimensions being sorted. I'm surprised that it should have any effect on the performance of querying the view

@bprusinowski
Copy link
Contributor Author

Hi @tpluscode, the performance of the query is not affected, but we need to create the View before we fetch the observations – and by default, the dimensions are sorted, which takes around 0.5s in the query used for testing. So this change would not improve the performance of the query, but performance of creating a View :)

@tpluscode
Copy link
Contributor

By all means, do not hesitate to send a PR. Maybe with some additional examples because I do not yet fully understand the impact and where the actual time penalty occurs

@bprusinowski
Copy link
Contributor Author

Sure, this is the benchmark I've made to pin-point the issue.

With sorting enabled (see that --sort took 549 ms and in total it took 556 ms). Generally it fluctuates around 500-600 ms in the example used for testing (this chart https://int.visualize.admin.ch/en/v/hKkUyBpGpWPl?dataSource=Int).

with sorting

With sorting disabled (see that in total it took 8 ms).

without sorting

Could you grant me the appropriate rights to open a PR in the repository? Thanks!

@tpluscode
Copy link
Contributor

Could you grant me the appropriate rights to open a PR in the repository? Thanks!

Actually, I don't have the necessary permission either. Please fork and PR from there

@bprusinowski
Copy link
Contributor Author

I've opened a PR (#96) 👍

@tpluscode
Copy link
Contributor

Released in v1.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants