Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Duplicated hierarchies #1260

Merged
merged 16 commits into from
Nov 13, 2023
Merged

fix: Duplicated hierarchies #1260

merged 16 commits into from
Nov 13, 2023

Conversation

bprusinowski
Copy link
Collaborator

@bprusinowski bprusinowski commented Nov 9, 2023

Fixes #1257.
Closes #1263.

This PR:

  • fixes hierarchies querying by only fetching the hierarchies based on cube's shape (and retrieving correct hierarchy names),
  • removes fetching of legacy hierarchies (hasHierarchy), as there is only one such hierarchy on LINDAS,
  • improves data fetching performance by not fetching shape for raw cubes (as a newer one might be needed anyway).

Context

It looks like there's too many hierarchies fetched in the example chart from #1257 (https://visualize.admin.ch/de/v/IYdcAI0tH0Io?dataSource=Prod).

There are two issues visible in this chart:

  • (apparent) duplicated hierarchies,
  • hierarchy that (most probably) shouldn't be fetched at all.
double.selection.mov
Screenshot 2023-11-09 at 22 36 08

(Apparent) duplicated hierarchies

First issue occurs due to retrieving name from hierarchyPointer.out(ns.cubeMeta.nextInHierarchy) instead of hierarchyPointer directly. @ptbrowne do you remember the context here (it looks like this has been done to avoid duplication)? After looking at docs and investigating the data on LINDAS, I think we should base directly on hierarchyPointer – if we do it, the name of hierarchies are unique (Schweiz - Produktionsregion - Wirtschaftsregion and Schweiz - Produktionsregion).

Hierarchy that (most probably) shouldn't be fetched at all

Second one is more weird – generally, when we fetch a cube, two queries are fired, one for cube and the other one for cube shape. It looks like this behavior results in more hierarchies that we're actually interested in, as (I think, also based on kronmar's query), we should only fetch hierarchies that are specified in the cube's shape.

I didn't investigate enough to say for sure why these additional hierarchies are being fetched along with cube, but if we just fetch cube shape, it retrieves correct hierarchies.

It's a bit unfortunate, as basically we need to fetch shape twice, once initially, and afterwards again when fetching hierarchies, as it looks we can't distinguish if a given triple came from cube or shape. Some optimisation might be possible, but I think it would need to be somehow connected with changes to cube-view-query library, to be able to tell if a given triple came from one or the other query.

I introduced an ExtendedCube class to allow us to store shape triples inside shapePtr, thus removing a need of fetching the shape twice. It might be useful to introduce such improvement directly to cube-view-query library, or to better understand why there is hierarchy mismatch when fetching cube and shape.

cc @Rdataflow

...we should retrieve the name of actual hierarchy, not nextInHierarchy
...as there are only ~3 legacy hierarchies on LINDAS PROD, it makes sense to remove them.
...as otherwise we also include hierachies retrieved through DESCRIBEing the cube, which are not part of the shape.
Copy link

vercel bot commented Nov 9, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
visualization-tool ✅ Ready (Inspect) Visit Preview 💬 Add feedback Nov 13, 2023 1:09pm

Copy link
Collaborator

@lloydrichards lloydrichards left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't see anything out of the ordinary, but also don't have much experience with the cube queries.

app/rdf/query-hierarchies.ts Show resolved Hide resolved
@Rdataflow
Copy link
Contributor

@bprusinowski LGTM 👍

Copy link
Collaborator

@ptbrowne ptbrowne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM, I think it's good to have a splitted issue with the future of ExtendedCube. Thanks !

const cube = await loaders.cube.load(iri);
const rawCube = await loaders.cube.load(iri);
// Currently we always default to the latest cube.
const cube = await getLatestCube(rawCube);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the splitted issue, getLatestCube could be put on ExtendedCube.

import rdf from "rdf-ext";
import DatasetExt from "rdf-ext/lib/Dataset";

export class ExtendedCube extends Cube {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cna you add a docstring describing the general goal ? So that when we see a method that should be there, we can be sure that this class is the right one ?

}
return [];

return uniqBy(hierarchies, (hierarchy) => getName(hierarchy, locale));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the splitted issue : getDimensionHierarchies could be on ExtendedCube. This way we will hide the complexity of the shapePtr there.

@bprusinowski bprusinowski merged commit 92e7da9 into main Nov 13, 2023
3 of 4 checks passed
@bprusinowski bprusinowski deleted the fix/hierarchies-querying branch November 13, 2023 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not fetch shape when getting raw cube embedded charts broken
4 participants