possible filter and components query needlessly fires twice - and takes a very long time #1658

Closed
1 task
Rdataflow opened this issue Jul 9, 2024 · 5 comments · Fixed by #1890
Labels
bug Something isn't working

Comments

@Rdataflow
Contributor

Rdataflow commented Jul 9, 2024

To Do

  • exclude non-key dimensions from the most expensive query fired when initializing a chart from cube

@bprusinowski very similar results are fetched twice - most of which aren't even needed (non-key) - thanks for taking a look

Describe the bug
The rework in #1487 has a small glitch that ends up making a big difference on the milk cubes.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://int.visualize.admin.ch/browse?dataset=https%3A%2F%2Fagriculture.ld.admin.ch%2Ffoag%2Fcube%2FMilkDairyProducts%2FConsumption_Price_Month&dataSource=Int-uncached
  2. Enable Debug mode
  3. Create visualization
  4. See queryplan and GraphQL waterfall

Expected behavior

  • possible filters are fetched once
  • overall cubeComponents are fetched once using a speedy query

Actual behavior

  • possible filters are fetched twice
  • overall cubeComponents are fetched twice; the second query is very slow (it asks for information about every single non-key dimension, which is expensive and takes > 20 s)
  • OTOH the chart already appears perfectly a few seconds from the start
  • so it's just the left panel (and not the filters) which fires that expensive re-query
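
To illustrate the "fetched once" expectation, here is a minimal TypeScript sketch of in-flight request deduplication. It is not the project's code: `requestKey`, `dedupe` and the `inFlight` map are hypothetical names, and the sketch assumes that two requests count as identical when their operation name and JSON-serialized variables match.

```typescript
// Hypothetical helper, not part of the project: share one in-flight promise
// per (operationName, variables) pair so identical queries are sent only once.
type Fetcher<T> = () => Promise<T>;

const inFlight = new Map<string, Promise<unknown>>();

// Assumption: variables serialize with a stable key order, so equal
// variables always produce the same key.
function requestKey(operationName: string, variables: unknown): string {
  return `${operationName}:${JSON.stringify(variables)}`;
}

function dedupe<T>(key: string, fetcher: Fetcher<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing !== undefined) {
    // An identical request is already running: reuse its promise.
    return existing as Promise<T>;
  }
  const promise = fetcher();
  inFlight.set(key, promise);
  // Forget the entry once the request settles, whether it succeeded or failed.
  const cleanup = (): void => {
    inFlight.delete(key);
  };
  promise.then(cleanup, cleanup);
  return promise;
}
```

With such a helper, two components that ask for the same `DataCubeComponents` variables while the first request is still running would share a single network round trip. Requests with different filters would still be sent separately, so this does not address the slow second query on its own.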

Screenshots or video
image

Environment (please complete the following information):

  • Visualize environment and version: INT 4.7.2

Additional context
The query that needlessly re-requests information for all dimensions:
get.components.expensive.txt

curl 'https://int.visualize.admin.ch/api/graphql' -X POST -H 'content-type: application/json' --data-raw '{"operationName":"DataCubeComponents","variables":{"locale":"de","sourceType":"sparql","sourceUrl":"https://lindas.admin.ch/query","cubeFilter":{"iri":"https://agriculture.ld.admin.ch/foag/cube/MilkDairyProducts/Consumption_Price_Month","filters":{"https://agriculture.ld.admin.ch/foag/dimension/product":{"type":"single","value":"https://agriculture.ld.admin.ch/foag/product/193"},"https://agriculture.ld.admin.ch/foag/dimension/value-chain-detail":{"type":"single","value":"https://agriculture.ld.admin.ch/foag/value-chain-detail/18"},"https://agriculture.ld.admin.ch/foag/dimension/key-indicator-type":{"type":"single","value":"https://agriculture.ld.admin.ch/foag/key-indicator-type/1"},"https://agriculture.ld.admin.ch/foag/dimension/production-system":{"type":"single","value":"https://agriculture.ld.admin.ch/foag/production-system/3"}},"loadValues":true}},"query":"query DataCubeComponents($sourceType: String!, $sourceUrl: String!, $locale: String!, $cubeFilter: DataCubeComponentFilter!) {\n  dataCubeComponents(\n    sourceType: $sourceType\n    sourceUrl: $sourceUrl\n    locale: $locale\n    cubeFilter: $cubeFilter\n  )\n}\n"}'
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/cost-component> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/currency> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/data-method> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/data-source> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/date> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/foreign-trade> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/key-indicator-type> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/market> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/product> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/product-group> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/production-system> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/product-origin> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/product-properties> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/product-subgroup> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/sales-region> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/unit> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/usage> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/value-chain> } 
VALUES ?dimensionIri { <https://agriculture.ld.admin.ch/foag/dimension/value-chain-detail> }
@Rdataflow added the bug label Jul 9, 2024
@Rdataflow changed the title from "components query needlessly fires twice - and takes a very long time" to "possible filter and components query needlessly fires twice - and takes a very long time" Jul 9, 2024
@bprusinowski self-assigned this Aug 28, 2024
@bprusinowski
Collaborator

Hey @Rdataflow, thanks for identifying and reporting the issue. I am not 100% sure it's a regression introduced by the refactor of the query, as there are some specific reasons we need the flow described as problematic above:

  • first query fetches all dimensions with values, including non-key dimensions,
  • we need this information to be able to create initial chart config and a list of possible chart types,
  • we need to access dimension values to derive the initial filters and check if we need to fire possibleFiltersQuery in case default filters result in no observations,
  • after the chart is initialized, we no longer need to send an unfiltered components query; instead we send filtered ones, specifically for the left panel.
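
The initialization flow in the list above can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: the `Dimension` shape, `deriveInitialFilters` and `shouldFetchPossibleFilters` are hypothetical names, picking the first value of each key dimension is an assumed default, and the sketch ignores that some dimensions would be mapped to chart fields instead of being filtered. Only the `{ type: "single", value }` filter shape is taken from the request payload in the issue.

```typescript
// Hypothetical types for illustration; not the project's real data model.
interface DimensionValue {
  value: string;
}

interface Dimension {
  iri: string;
  isKeyDimension: boolean;
  values: DimensionValue[];
}

// Filter shape as it appears in the DataCubeComponents request variables.
type SingleFilter = { type: "single"; value: string };
type Filters = Record<string, SingleFilter>;

// Assumed default: filter each key dimension to its first known value.
function deriveInitialFilters(dimensions: Dimension[]): Filters {
  const filters: Filters = {};
  for (const dim of dimensions) {
    const first = dim.values[0];
    if (!dim.isKeyDimension || first === undefined) {
      continue; // non-key dimensions and empty dimensions get no filter
    }
    filters[dim.iri] = { type: "single", value: first.value };
  }
  return filters;
}

// The possible filters query is only needed when the default filters
// select no observations at all.
function shouldFetchPossibleFilters(observationCount: number): boolean {
  return observationCount === 0;
}
```

The point of the sketch is the ordering: dimension values must be known before default filters can be derived, and the possible filters query is a fallback for the case where those defaults yield an empty chart.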

To sum up, it looks like it's the left panel that has some duplicated logic, but in fact the first queries are sent when initializing the chart from the cube and are not related to the queries we send from the left filter panel. I wouldn't treat this as a bug, as we introduced this behavior to make sure we show a "correct" chart as soon as it loads, and to avoid showing a no-data screen initially, followed by a reload of queries only afterwards.

I modified the logic a bit to re-use the preview query in #1697. The downside is that it always fires the possible filters query, unlike the conditional firing in the old logic. Let me know if that explains and improves the situation :)

@bprusinowski
Collaborator

bprusinowski commented Sep 4, 2024

After discussing with @Rdataflow, we'll not merge #1697, but rather exclude non-key dimensions from the most expensive query fired when initializing a chart from cube. The problem with #1697 is that we can no longer easily determine "preferred dimension values", which means that e.g. when opening an NFI: Change cube, we do not select Schweiz anymore (as the top-root hierarchy value), but rather Fribourg.
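
As a rough illustration of excluding non-key dimensions, the following TypeScript sketch builds the per-dimension `VALUES ?dimensionIri { ... }` clauses seen in the expensive query, but only for key dimensions. The `CubeDimension` shape and `buildValuesClauses` are hypothetical names, and which dimensions are key dimensions is an assumption that would have to come from the cube metadata.

```typescript
// Hypothetical shape; the isKeyDimension flag is assumed to be known
// from the cube metadata before the expensive query is built.
interface CubeDimension {
  iri: string;
  isKeyDimension: boolean;
}

// Emit one VALUES clause per key dimension and skip everything else.
function buildValuesClauses(dimensions: CubeDimension[]): string[] {
  return dimensions
    .filter((dimension) => dimension.isKeyDimension)
    .map((dimension) => `VALUES ?dimensionIri { <${dimension.iri}> }`);
}

// Illustrative input only: the key/non-key flags here are made up.
const exampleDimensions: CubeDimension[] = [
  { iri: "https://agriculture.ld.admin.ch/foag/dimension/product", isKeyDimension: true },
  { iri: "https://agriculture.ld.admin.ch/foag/dimension/unit", isKeyDimension: false },
];

const exampleClauses = buildValuesClauses(exampleDimensions);
```

Under these assumptions the number of per-dimension lookups shrinks from every dimension in the cube to the key dimensions only, which is the reduction the To Do item asks for.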

@sosiology
Contributor

@bprusinowski can this issue be closed (as the PR will not be merged), or should we keep it open?

@bprusinowski
Collaborator

Hi @sosiology, I think there's still one thing we should improve in connection with this issue (excluding non-key dimensions from the most expensive query). I'd keep it open for now 👍

@sosiology
Contributor

got it! Thanks @bprusinowski

3 participants