heavy performance regression in preview #1298

Closed
Rdataflow opened this issue Dec 11, 2023 · 11 comments
Labels
bug Something isn't working

Comments

@Rdataflow
Contributor

Describe the bug
Following the merge of #1285, heavy performance regressions are observed.
Comparing the response times for this cube shows a serious regression.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://test.visualize.admin.ch/en/browse?previous=%7B%22order%22%3A%22SCORE%22%2C%22search%22%3A%22imp%22%2C%22includeDrafts%22%3Atrue%7D&dataset=https%3A%2F%2Fenvironment.ld.admin.ch%2Ffoen%2Ffab_ahst_exp_imp%2F1&dataSource=Int&flag__debug=true
  2. Observe that the preview takes ~60s to load.
  3. Open the debug panel and inspect the preview GQL query to see the slow SPARQL query.

Expected behavior
The preview should load as fast as on INT, i.e. in about 1s.

Environment (please complete the following information):

  • Visualize environment and version: TEST - after merge of #1285

Additional context

@Rdataflow Rdataflow added the bug Something isn't working label Dec 11, 2023
@Rdataflow
Contributor Author

FTR: fixed by db optimize on INT by VSHN

@Rdataflow
Contributor Author

FTR: same story again on TEST v9.2.1 now with a new dataset.

db optimize last night didn't help for this case though. 😟

@sosiology sosiology reopened this Jan 11, 2024
@bprusinowski
Collaborator

Hi @Rdataflow, I just opened two links you shared (Visualize TEST and LINDAS TEST) and observed the following loading times:

  • Visualize: 21s,
  • SPARQL: 19s.

It looks like the performance is the same for both sources. Can you confirm this is also the case on your side?

@Rdataflow
Contributor Author

We need to reopen: the issue has resurfaced with a new cube.

@bprusinowski
Collaborator

Thanks for clarifying @Rdataflow. It looks like the reason for #1285 was to fix some broken cube previews. When checking the broken cube preview from #1285, a cartesian product is still generated when #pragma join.hash off is applied.

Technically I could just hard-code to always take the first ten observations, even though the SPARQL query should already return only that. Is there no hope for a proper fix on the database level to avoid workarounds on our side (by fixing the performance we introduce another bug that's not really on our side)?
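
To illustrate the workaround mentioned above, here is a minimal sketch of truncating the result on the client side. It is not the actual Visualize code: the `Observation` type, the `takePreviewObservations` helper and the `PREVIEW_LIMIT` constant are hypothetical names made up for this example, and the limit of ten is taken only from "the first ten observations" above.

```typescript
// Hypothetical shape of a preview observation; the real type in the project may differ.
type Observation = Record<string, string | number | null>;

// Assumed preview size, based on "the first ten observations" in this comment.
const PREVIEW_LIMIT = 10;

/**
 * Defensive truncation: keep at most `limit` rows, even if the endpoint
 * returned more rows than the query asked for.
 */
function takePreviewObservations(
  observations: Observation[],
  limit: number = PREVIEW_LIMIT
): Observation[] {
  return observations.slice(0, limit);
}

// Usage: simulate an endpoint that returned far more rows than requested.
const rows: Observation[] = Array.from({ length: 1000 }, (_, i) => ({ id: i }));
const preview = takePreviewObservations(rows);
console.log(preview.length); // 10
```

This only hides the symptom: the endpoint still computes and transfers the oversized result, so it does nothing for the query time itself.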

@bprusinowski bprusinowski self-assigned this Jan 18, 2024
@Rdataflow
Contributor Author

@bprusinowski can you share some example queries where you get more results than expected?

@bprusinowski
Collaborator

Sure, this is a query that made us introduce #1285 in the first place: https://s.zazuko.com/6AcHrD

I am only aware of this one example at the moment, but can't guarantee there aren't more cases like this 👀

@Rdataflow
Contributor Author

I asked VSHN to clarify how to fix the observed 1000 results...

@bprusinowski
Collaborator

Great, thank you! 💯

@Rdataflow
Contributor Author

@bprusinowski

cc @ClaudioDiGallo

@bprusinowski
Collaborator

Closing as we have rewritten the preview query completely.
