heavy performance regression in preview #1298

Rdataflow · 2023-12-11T12:42:27Z

Describe the bug
Following the merge of #1285, we observe heavy performance regressions.
Comparing the response times for this cube shows a serious regression.

To Reproduce
Steps to reproduce the behavior:

1. Go to https://test.visualize.admin.ch/en/browse?previous=%7B%22order%22%3A%22SCORE%22%2C%22search%22%3A%22imp%22%2C%22includeDrafts%22%3Atrue%7D&dataset=https%3A%2F%2Fenvironment.ld.admin.ch%2Ffoen%2Ffab_ahst_exp_imp%2F1&dataSource=Int&flag__debug=true
2. Observe that the preview takes ~60 s to load.
3. Open the debug panel and inspect the preview GQL query to see the slow SPARQL query.

Expected behavior
The preview should load as fast as on INT, i.e. in about 1 s.

Environment

Visualize environment and version: TEST, after the merge of #1285

Additional context

slow SPARQL query: https://s.zazuko.com/yx4AHz

Rdataflow · 2023-12-12T07:25:55Z

FTR: fixed on INT after VSHN ran db optimize on the database.

Rdataflow · 2024-01-11T09:00:49Z

FTR: the same problem occurs again on TEST v9.2.1, now with a new dataset.

Running db optimize last night didn't help in this case, though. 😟

bprusinowski · 2024-01-17T11:31:14Z

Hi @Rdataflow, I just opened the two links you shared (Visualize TEST and LINDAS TEST) and observed the following loading times:

Visualize: 21 s
SPARQL: 19 s

It looks like the performance is the same for both sources. Can you confirm this is also the case on your side?

Rdataflow · 2024-01-17T16:58:47Z

We need to reopen this: the issue has resurfaced with a new cube.

See https://test.visualize.admin.ch/browse?dataset=https%3A%2F%2Fenvironment.ld.admin.ch%2Ffoen%2Ffab_ahst_exp_imp_pipeline%2F1&dataSource=Test
The cause is the missing #pragma join.hash off.
This time, db optimize didn't help.
x-ref https://control.vshn.net/tickets/SBAR-1059

bprusinowski · 2024-01-18T12:58:05Z

Thanks for clarifying, @Rdataflow. It looks like #1285 was made to fix some broken cube previews. When I check the broken cube preview from #1285, applying #pragma join.hash off still generates a cartesian product.

Technically, I could hard-code the preview to always take the first ten observations, even though the SPARQL query should already return only ten. Is there no hope for a proper fix at the database level, so that we can avoid workarounds on our side? Fixing the performance this way would reintroduce another bug that isn't really on our side.
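A minimal TypeScript sketch of that hard-coded workaround, assuming the preview receives its observations as a plain array. `capPreviewObservations`, `PREVIEW_LIMIT` and the `overflowed` flag are hypothetical names, not existing Visualize code; the flag is an added idea so that an unexpected result size (such as the cartesian product) gets reported instead of silently truncated.

```typescript
// Hypothetical names: not taken from the Visualize codebase.
const PREVIEW_LIMIT = 10;

interface CappedObservations<T> {
  observations: T[];
  /** true if the endpoint returned more rows than the preview asked for */
  overflowed: boolean;
}

/**
 * Client-side safety net: keep only the first `limit` rows, even though the
 * SPARQL query's own LIMIT should already guarantee that.
 */
function capPreviewObservations<T>(
  rows: readonly T[],
  limit: number = PREVIEW_LIMIT,
): CappedObservations<T> {
  if (!Number.isInteger(limit) || limit < 0) {
    throw new RangeError(`limit must be a non-negative integer, got ${limit}`);
  }
  return {
    observations: rows.slice(0, limit),
    // Flag the mismatch (e.g. a join that degenerated into a cartesian
    // product) instead of hiding it.
    overflowed: rows.length > limit,
  };
}

// Usage: 1000 rows back from a query that should have returned 10.
const rows = Array.from({ length: 1000 }, (_, i) => ({ id: i }));
const { observations, overflowed } = capPreviewObservations(rows);
if (overflowed) {
  console.warn(`Preview query returned ${rows.length} rows; showing ${observations.length}.`);
}
```

Truncating only hides the symptom: the database still computes the oversized result, so this does nothing for the query time itself.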

Rdataflow · 2024-01-18T13:20:58Z

@bprusinowski, can you share some example queries that return more results than expected?

bprusinowski · 2024-01-18T13:28:10Z

Sure, this is the query that made us introduce #1285 in the first place: https://s.zazuko.com/6AcHrD

I'm only aware of this one example at the moment, but I can't guarantee there aren't more cases like this. 👀

Rdataflow · 2024-01-18T15:05:31Z

I asked VSHN how to fix the 1000 results observed with that query...

bprusinowski · 2024-01-18T15:15:55Z

Great, thank you! 💯

Rdataflow · 2024-01-30T16:50:51Z

@bprusinowski

Plan A: the preview query is being reworked anyway, per https://gitlab.ldbar.ch/bafu/visualize/-/issues/579
Plan B: if it isn't, then as per https://control.vshn.net/tickets/SBAR-1059 we need to reintroduce #pragma join.hash off for previews...

cc @ClaudioDiGallo
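Plan B could be sketched in TypeScript as a small step that adds the hint to preview queries only. `withJoinHashOff` and `finalizeQuery` are hypothetical names; the hint text comes from this thread, but placing it right after the opening brace of the first WHERE block is an assumption about where the LINDAS triple store picks it up, so check the store's query-hint documentation before relying on it.

```typescript
// The hint string is taken from the discussion; its placement (right after
// the opening brace of the first WHERE block) is an assumption.
const JOIN_HASH_OFF = "#pragma join.hash off";

/** Hypothetical helper: injects the join hint into a SPARQL query string. */
function withJoinHashOff(query: string): string {
  // Find the first "WHERE {" (case-insensitive, any whitespace before the brace).
  const match = /\bWHERE\s*\{/i.exec(query);
  if (match === null) {
    throw new Error("withJoinHashOff: no 'WHERE {' block found in query");
  }
  const insertAt = match.index + match[0].length;
  // '#' starts a SPARQL comment that runs to the end of the line, so the hint
  // gets its own line; otherwise it would comment out the rest of the query.
  return `${query.slice(0, insertAt)}\n  ${JOIN_HASH_OFF}\n${query.slice(insertAt)}`;
}

interface FinalizeOptions {
  /** true when the query feeds the dataset preview */
  isPreview: boolean;
}

/** Hypothetical hook: only preview queries get the hint (Plan B). */
function finalizeQuery(query: string, { isPreview }: FinalizeOptions): string {
  return isPreview ? withJoinHashOff(query) : query;
}

// Usage
const previewQuery = "SELECT * WHERE { ?s ?p ?o } LIMIT 10";
console.log(finalizeQuery(previewQuery, { isPreview: true }));
```

Because the hint is an ordinary SPARQL comment, an endpoint that doesn't recognize it simply ignores it. Note that, per this thread, applying the hint is also what produced the cartesian product in the preview from #1285, so Plan B would likely need the hard-coded limit of ten observations discussed earlier as well.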

bprusinowski · 2024-03-05T09:06:55Z

Closing as we have rewritten the preview query completely.

Rdataflow added the bug label ("Something isn't working") Dec 11, 2023

Rdataflow closed this as completed Dec 12, 2023

sosiology reopened this Jan 11, 2024

bprusinowski self-assigned this Jan 18, 2024

bprusinowski closed this as completed Mar 5, 2024
