Cube attribute: published/draft -> versioned cubes #34

Closed
jstcki opened this issue Sep 10, 2020 · 32 comments

@jstcki

jstcki commented Sep 10, 2020

A cube-level metadata attribute describing whether a cube is "published" or "draft"/"preview":

  • So users of visualize.admin.ch won't see work-in-progress data.
  • So the data owner will see work-in-progress data (e.g. on a different domain), so they can try it out.
@ktk
Member

ktk commented Sep 17, 2020

  • Attached to the cube:Cube
  • Check whether there is a schema.org or DC Terms property for this

@ktk
Member

ktk commented Oct 13, 2020

@ktk
Member

ktk commented Oct 20, 2020

schema:creativeWorkStatus it is.

@ktk
Member

ktk commented Oct 20, 2020

See commit zazuko/rdf-cube-schema-viz@8b0e0a1 (it mentions the wrong issue, sorry about that).

@danielmittag

danielmittag commented Oct 26, 2020

FYI: Only published datasets will be written into the Lindas DB (as part of the publication process). Therefore, this flag might either be irrelevant for ITX or be used to establish a preview mechanism.

https://gitlab.ldbar.ch/bafu/umweltdatenkiosk-planning/-/wikis/Technical-decisions#triplestore-architecture

@jstcki
Author

jstcki commented Oct 26, 2020

@danielmittag Yes, preview is definitely a use-case, as I described in this issue. Even if we only end up with "published" cubes in the MVP, it's useful to specify this attribute.

@l00mi
Contributor

l00mi commented Dec 14, 2020

Merging the concerns of zazuko/rdf-cube-schema-viz#2 with this.

@l00mi
Contributor

l00mi commented Dec 14, 2020

The scope (whether the dataset/cube should be shown in visualize.admin.ch) depends on zazuko/cube-creator#319.

@l00mi
Contributor

l00mi commented Dec 14, 2020

Furthermore, we now have versions in cubes (dependent on zazuko/cube-creator#318); they will also be explicitly marked with the version number.

Once we have examples for this, we need a query for @herrstucki that lists the latest version of each cube to be shown in visualize.admin.ch.

l00mi changed the title from "Cube attribute: published/draft" to "Cube attribute: published/draft -> versioned cubes" on Dec 14, 2020
@jstcki
Author

jstcki commented Dec 14, 2020

@l00mi

  • Will the versioning cover distinguishing between draft/final state? If so, how? One might publish versions without wanting to change the "main" one.
  • Will each cube version have its own IRI?
    • How can a consumer stay up-to-date with future versions without having to update the IRI to the cube? Is it possible to reference the "main" cube instead of a specific version?
    • Alternatively, one may want to reference a specific version that never changes.

Maybe it should be similar to how swisstopo links from the "main" entry of a municipality to its different versions via dc:hasVersion (e.g. https://ld.geo.admin.ch/boundaries/municipality/2553?format=html).

If this is all described somewhere already, please let me know :)

Re: example queries: will I need to do this with "raw" SPARQL or is this something that's possible with https://github.com/zazuko/rdf-cube-view-query (AFAIK it's currently not possible to filter cubes with it)?

jstcki assigned l00mi and danielmittag and unassigned ktk on Dec 16, 2020
@jstcki
Author

jstcki commented Dec 16, 2020

After today's discussion, the main initial use case for the Visualization Tool is to hold a reference to a cube that automatically resolves to the latest version.

For example, something similar to npm's tagging system, where publishing a new version updates the latest tag to point to that version: https://pipelines-integ.lindas.admin.ch/cube/xyz:latest

Locking a chart to a specific version also needs to be possible, but is of lower priority, e.g. https://pipelines-integ.lindas.admin.ch/cube/xyz:32
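To make the floating-vs-pinned distinction concrete, here is a minimal sketch of how a client could resolve such references. The `cubeId:tag` format is taken from the example URLs above; the in-memory `versions` registry and the `resolveCubeRef` helper are hypothetical and not part of any LINDAS or Cube Creator API.

```javascript
// Hypothetical registry: cube id -> published version numbers.
const versions = {
  xyz: [30, 31, 32, 33],
};

// Resolve "xyz:latest" (floating) or "xyz:32" (pinned) to a concrete version.
function resolveCubeRef(ref, registry) {
  const sep = ref.lastIndexOf(":");
  if (sep === -1) throw new Error(`missing tag in ${ref}`);
  const id = ref.slice(0, sep);
  const tag = ref.slice(sep + 1);
  const published = registry[id];
  if (!published || published.length === 0) {
    throw new Error(`unknown cube ${id}`);
  }
  // "latest" always follows the highest published version.
  if (tag === "latest") {
    return { id, version: Math.max(...published) };
  }
  // A numeric tag pins the chart to exactly that version.
  const pinned = Number(tag);
  if (!Number.isInteger(pinned) || !published.includes(pinned)) {
    throw new Error(`unknown version ${tag} for cube ${id}`);
  }
  return { id, version: pinned };
}

console.log(resolveCubeRef("xyz:latest", versions)); // { id: 'xyz', version: 33 }
console.log(resolveCubeRef("xyz:32", versions)); // { id: 'xyz', version: 32 }
```

As with npm tags, `latest` moves whenever a new version is published, while a numeric tag never changes, which is what locking a chart needs.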

@l00mi
Contributor

l00mi commented Dec 22, 2020

Unfortunately, it is not easily possible to completely replace the ....:latest cube with a new one (including all of its substructure).

We have discussed three possible variations for connecting the different versions of the same cube and navigating between them:

  • Similar to the proposed solution, but add only a "pointer" triple that always points to the latest cube:
    <https://environment.ld.admin.ch/dataset/cube/latest> cube:latestCube <https://environment.ld.admin.ch/dataset/cube/ver2> .

  • Add a triple pointing from the latest version to the second-latest version:
    <https://environment.ld.admin.ch/dataset/cube/ver2> cube:predecessorCube <https://environment.ld.admin.ch/dataset/cube/ver1> .

  • Define a meta construct that all cubes/datasets point to:
    <https://environment.ld.admin.ch/dataset/cube/ver1> cube:parent <https://environment.ld.admin.ch/dataset/cube> .
    <https://environment.ld.admin.ch/dataset/cube/ver2> cube:parent <https://environment.ld.admin.ch/dataset/cube> .
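The three variants answer different questions. The following plain-JavaScript sketch runs each of them against a toy in-memory triple list; the predicate names are the proposals above (not an existing vocabulary), and the helper functions are hypothetical.

```javascript
const base = "https://environment.ld.admin.ch/dataset/cube";

// Toy triple store: each triple is [subject, predicate, object].
const triples = [
  // Variant 1: a single pointer triple, rewritten on every publication.
  [`${base}/latest`, "cube:latestCube", `${base}/ver2`],
  // Variant 2: each version points back to its predecessor.
  [`${base}/ver2`, "cube:predecessorCube", `${base}/ver1`],
  // Variant 3: every version points to a common parent.
  [`${base}/ver1`, "cube:parent", base],
  [`${base}/ver2`, "cube:parent", base],
];

// All objects o of triples (s, p, o).
const objects = (s, p) =>
  triples.filter(([ts, tp]) => ts === s && tp === p).map(([, , o]) => o);

// All subjects s of triples (s, p, o).
const subjects = (p, o) =>
  triples.filter(([, tp, to]) => tp === p && to === o).map(([s]) => s);

// Variant 1: a single lookup answers "which version is current?".
function latestViaPointer() {
  return objects(`${base}/latest`, "cube:latestCube")[0];
}

// Variant 2: walk the predecessor chain, newest first (guards against cycles).
function historyViaPredecessors(newest) {
  const history = [];
  let current = newest;
  while (current !== undefined && !history.includes(current)) {
    history.push(current);
    current = objects(current, "cube:predecessorCube")[0];
  }
  return history;
}

// Variant 3: all versions sharing a parent, in no particular order.
function versionsViaParent(parent) {
  return subjects("cube:parent", parent);
}

console.log(latestViaPointer());
console.log(historyViaPredecessors(`${base}/ver2`));
console.log(versionsViaParent(base));
```

Roughly: variant 1 finds the current version with one lookup but needs its pointer rewritten on every publication, variant 2 yields an ordered history but needs a known starting point, and variant 3 groups all versions cheaply but carries no ordering on its own, which is one reason to combine them.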
    

These possibilities can also be combined. It also depends a bit on what works best for https://github.com/zazuko/rdf-cube-view-query, @bergos? Any preference, @herrstucki?

Finally, we had a short brainstorm on the problem of structural changes inside a cube. Our first take is that we provide all metadata regarding changes in the structure, in the variables used, or in min./max. values. That said, it is best for the consuming application to decide which changes it can migrate to, depending on the use case it implements. (In the case of visualize.admin.ch, which changes are still acceptable might even depend on the kind of chart chosen.) A more practical solution for the beginning might be defined together with the customer.

@l00mi
Contributor

l00mi commented Jan 19, 2021

@herrstucki @bergos ping?

@l00mi
Contributor

l00mi commented Jan 23, 2021

Final first implementation:

Draft/published status is attached to the cube, in its role as a schema:Dataset, via:
<cube> schema:creativeWorkStatus <https://ld.admin.ch/definedTerm/creativeWorkStatus/draft>, or respectively, <cube> schema:creativeWorkStatus <https://ld.admin.ch/definedTerm/creativeWorkStatus/published>

A cube is put in the scope of an application by adding schema:workExample:
For visualize.admin.ch e.g. <cube> schema:workExample <https://ld.admin.ch/application/visualize>
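As a rough illustration of how a consumer could combine these two attributes, the sketch below filters cube descriptions held as plain objects. The status and application IRIs come from this comment; the `example.org` cube IRIs, the object layout and the `visibleIn` helper are assumptions made for the example. The `includeDrafts` option hints at the preview use case discussed earlier.

```javascript
// Status and application IRIs as given above.
const PUBLISHED = "https://ld.admin.ch/definedTerm/creativeWorkStatus/published";
const DRAFT = "https://ld.admin.ch/definedTerm/creativeWorkStatus/draft";
const VISUALIZE = "https://ld.admin.ch/application/visualize";

// Hypothetical cube descriptions; predicates kept as prefixed names for brevity.
const cubes = [
  { iri: "https://example.org/cube/a", "schema:creativeWorkStatus": [PUBLISHED], "schema:workExample": [VISUALIZE] },
  { iri: "https://example.org/cube/b", "schema:creativeWorkStatus": [DRAFT], "schema:workExample": [VISUALIZE] },
  { iri: "https://example.org/cube/c", "schema:creativeWorkStatus": [PUBLISHED], "schema:workExample": [] },
];

// Public deployments show a cube only if it is in scope AND published;
// a preview deployment could pass includeDrafts to also show drafts.
function visibleIn(app, list, { includeDrafts = false } = {}) {
  return list.filter((cube) => {
    const inScope = (cube["schema:workExample"] || []).includes(app);
    const published = (cube["schema:creativeWorkStatus"] || []).includes(PUBLISHED);
    return inScope && (includeDrafts || published);
  });
}

console.log(visibleIn(VISUALIZE, cubes).map((c) => c.iri));
// [ 'https://example.org/cube/a' ]
console.log(visibleIn(VISUALIZE, cubes, { includeDrafts: true }).map((c) => c.iri));
// [ 'https://example.org/cube/a', 'https://example.org/cube/b' ]
```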

Versioned Cubes:

  • Cubes have a schema:validThrough if deprecated.
  • Cubes in the same version line are connected through a parent schema:CreativeWork using schema:hasPart, as follows:
<https://environment.ld.admin.ch/foen/ubd28> a schema:CreativeWork ;
     schema:hasPart <https://environment.ld.admin.ch/foen/ubd28/1> ;
     schema:hasPart <https://environment.ld.admin.ch/foen/ubd28/2> ;
.

@l00mi
Contributor

l00mi commented Feb 3, 2021

@bergos for Versioned Cubes we need:

  • a way to get the "parent" identifier.
  • a way to get all connected cubes through a "parent" identifier (ideally with a filter)

@bergos

bergos commented Feb 3, 2021

I propose extending the DESCRIBE query for the cube to follow schema:hasPart, adding clownface's .in() to the API objects (including Cube), and adding a filter that follows schema:hasPart in the inverse direction. The API should then be able to handle the following use cases:

Find the version history of a cube:

const versionHistory = cube.in(ns.schema.hasPart).term

Find all cubes attached to the version history:

const cubes = await source.cubes({
  filters: [
    Cube.filter.isPartOf(versionHistory)
  ]
})

Find the latest cube for a specific version history (this still returns an array, but it should have length 1 if the data contains no errors):

const cubes = await source.cubes({
  filters: [
    Cube.filter.isPartOf(versionHistory),
    Cube.filter.noValidThrough()
  ]
})
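To make the semantics of the proposed filters concrete, here is a plain-JavaScript sketch over a toy triple list, using the ubd28 IRIs from the example above. It does not use rdf-cube-view-query; `versionHistoryOf`, `isPartOf`, `noValidThrough` and `selectCubes` are hypothetical stand-ins that only mirror what `cube.in(...)`, `Cube.filter.isPartOf` and `Cube.filter.noValidThrough` are described to do, and the validThrough date is made up.

```javascript
const historyIri = "https://environment.ld.admin.ch/foen/ubd28";

// Toy triples, [subject, predicate, object].
const triples = [
  [historyIri, "schema:hasPart", `${historyIri}/1`],
  [historyIri, "schema:hasPart", `${historyIri}/2`],
  // Version 1 was deprecated when version 2 was published (made-up date).
  [`${historyIri}/1`, "schema:validThrough", "2021-01-01"],
];

// Like cube.in(ns.schema.hasPart): follow hasPart backwards to the history.
function versionHistoryOf(cube) {
  return triples
    .filter(([, p, o]) => p === "schema:hasPart" && o === cube)
    .map(([s]) => s);
}

// Like Cube.filter.isPartOf(h): keep cubes that h points to via hasPart.
const isPartOf = (h) => (cube) =>
  triples.some(([s, p, o]) => s === h && p === "schema:hasPart" && o === cube);

// Like Cube.filter.noValidThrough(): keep cubes without a validThrough triple.
const noValidThrough = () => (cube) =>
  !triples.some(([s, p]) => s === cube && p === "schema:validThrough");

// All filters must match (logical AND), as in source.cubes({ filters }).
function selectCubes(candidates, filters) {
  return candidates.filter((cube) => filters.every((f) => f(cube)));
}

const allCubes = [`${historyIri}/1`, `${historyIri}/2`];
const [versionHistory] = versionHistoryOf(`${historyIri}/1`);

console.log(selectCubes(allCubes, [isPartOf(versionHistory)]));
console.log(selectCubes(allCubes, [isPartOf(versionHistory), noValidThrough()]));
```

In rdf-cube-view-query the filtering presumably happens inside the SPARQL query rather than in memory; the sketch only mirrors the intended results.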

@l00mi
Contributor

l00mi commented Feb 3, 2021

@herrstucki would that solve the versioning part for you?

@bergos

bergos commented Feb 5, 2021

I created PR zazuko/rdf-cube-view-query#28 for the features mentioned above.

@jstcki
Author

jstcki commented Feb 8, 2021

The example looks good to me!

However, I don't fully understand @bergos' example above:

const versionHistory = cube.in(ns.schema.hasPart).term

Is cube here really a cube or a CreativeWork, as @l00mi explained above? The cube itself doesn't have a hasPart property, hence the need for filters.isPartOf. Correct?

@bergos

bergos commented Feb 8, 2021

@herrstucki

isPartOf is the inverse property of hasPart. In the example below, both triples have the same meaning. Having both is allowed, but most of the time only one of them is present. If the triples only state one direction, you must be able to traverse them in either direction. That's done with .out() (from the subject via the predicate to the object) or .in() (from the object via the predicate back to the subject). Previously you only used .out() to get things attached to the cube. Now you are using .in() to reach the version history, which points to the cube. The Cube.filter.isPartOf filter actually uses hasPart, just in the object -> subject direction. As there is an official inverse property of hasPart, I used that name; otherwise I might have named it hasPartInverse.

<versionHistory> <hasPart> <cube/1>.
<cube/1> <isPartOf> <versionHistory>. # virtual, does not exist in the store
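The difference between the two directions can be shown with a tiny, hypothetical stand-in for a clownface pointer over the single stored triple. Unlike real clownface, which returns one pointer over possibly many terms, this toy version returns an array of pointers; only the `.out()`/`.in()` direction semantics are meant to match.

```javascript
// The only triple actually stored (the isPartOf triple is virtual).
const store = [["<versionHistory>", "<hasPart>", "<cube/1>"]];

// Minimal, hypothetical clownface-like pointer.
function pointer(term) {
  return {
    term,
    // .out(p): from subject `term` along p to the objects.
    out: (p) =>
      store.filter(([s, pp]) => s === term && pp === p).map(([, , o]) => pointer(o)),
    // .in(p): from object `term` back along p to the subjects.
    in: (p) =>
      store.filter(([, pp, o]) => o === term && pp === p).map(([s]) => pointer(s)),
  };
}

const cube = pointer("<cube/1>");
// Nothing hangs off the cube via hasPart in the forward direction...
console.log(cube.out("<hasPart>").map((n) => n.term)); // []
// ...but following hasPart backwards reaches the version history.
console.log(cube.in("<hasPart>").map((n) => n.term)); // [ '<versionHistory>' ]
// The virtual isPartOf predicate is never stored, so .out() finds nothing.
console.log(cube.out("<isPartOf>").map((n) => n.term)); // []
```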

@l00mi
Contributor

l00mi commented Feb 8, 2021

@bergos https://github.com/zazuko/rdf-cube-schema-viz/issues/1#issuecomment-772507646 could be used almost 1:1 in the docs of the rdf-cube-view-query lib. Do you want to paste it there?

@jstcki
Author

jstcki commented Feb 8, 2021

Ahh, didn't know about the difference between .in and .out. Thanks for the explanation @bergos!

@l00mi
Contributor

l00mi commented Feb 8, 2021

Good to close?

@jstcki
Author

jstcki commented Feb 9, 2021

@l00mi if the concept is documented in the README, then yes 😁

@l00mi
Contributor

l00mi commented Jul 7, 2021

Also add the description of dimension versions from visualize-admin/visualization-tool#75 (comment).

@l00mi
Contributor

l00mi commented Jul 16, 2021

@herrstucki please be aware that https://ld.admin.ch/definedTerm/creativeWorkStatus/ will change to https://ld.admin.ch/vocabulary/creativeWorkStatus/.

Therefore

  • <cube> schema:creativeWorkStatus <https://ld.admin.ch/definedTerm/creativeWorkStatus/Draft>
  • <cube> schema:creativeWorkStatus <https://ld.admin.ch/definedTerm/creativeWorkStatus/Published>

will change to

  • <cube> schema:creativeWorkStatus <https://ld.admin.ch/vocabulary/creativeWorkStatus/Draft>
  • <cube> schema:creativeWorkStatus <https://ld.admin.ch/vocabulary/creativeWorkStatus/Published>

This change will roll out gradually on INT once zazuko/cube-creator#793 is created. I therefore propose querying for both in the meantime; before going to PROD, we will ensure that only the second, final version is active.
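Querying for both variants during the transition can be as simple as accepting either namespace. This is a sketch only: `creativeWorkStatus` and `isPublished` are hypothetical helpers, and they match the capitalized local names `Draft`/`Published` listed in this comment exactly, so the lowercase forms from the earlier comment would not match.

```javascript
// Old and new namespaces for the status terms, as announced above.
const OLD_NS = "https://ld.admin.ch/definedTerm/creativeWorkStatus/";
const NEW_NS = "https://ld.admin.ch/vocabulary/creativeWorkStatus/";

// Map either IRI variant to "Draft"/"Published", or null if unrecognized.
function creativeWorkStatus(iri) {
  for (const ns of [NEW_NS, OLD_NS]) {
    if (iri.startsWith(ns)) {
      const local = iri.slice(ns.length);
      if (local === "Draft" || local === "Published") return local;
    }
  }
  return null;
}

const isPublished = (iri) => creativeWorkStatus(iri) === "Published";

console.log(isPublished(`${OLD_NS}Published`)); // true
console.log(isPublished(`${NEW_NS}Published`)); // true
console.log(isPublished(`${NEW_NS}Draft`)); // false
```

In SPARQL the same idea would be a `VALUES` clause listing both the old and the new `Published` IRI; once PROD only uses the new namespace, `OLD_NS` can simply be dropped.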

@l00mi
Contributor

l00mi commented Sep 10, 2021

@herrstucki I added this section: https://github.com/zazuko/rdf-cube-schema-viz/blob/master/README.md#version-history-of-cubes.

Two things:

  • We will switch from schema:validThrough to schema:expires, which is the correct property in this case. The semantics stay the same: once a cube has schema:expires attached, it will be dismissed.

  • I need more input on:

    It is expected that the cubes in the same history line do not change the count of dimensions. All the other descriptions can change.

    What else can not change so that the graph drawings in visualize.admin.ch can seamlessly update?

@jstcki
Author

jstcki commented Nov 23, 2021

What else can not change so that the graph drawings in visualize.admin.ch can seamlessly update?

I find it hard to come up with an exhaustive list but broadly speaking:

  1. No structural/shape dimension metadata must change (i.e. dimensions should only be allowed to update description/name but not datatypes, scaleOfMeasure, dataKind etc.)
  2. No dimensions must be added/removed
  3. No observations must be removed, only added – not sure about this one … both could be problematic or not (worst case is that a chart shows no more data which might be intended/fine)
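The rules above could be turned into an automated check roughly like this. The sketch compares two toy versions of a cube; the object shape, the dimension field names and their values are assumptions, and rule 3 is reported only as a warning because the thread leaves it open. Rule numbers refer to the list above.

```javascript
// Structural fields that must not change between versions (rule 1).
const STRUCTURAL_FIELDS = ["datatype", "scaleOfMeasure", "dataKind"];

function compatibilityIssues(prev, next) {
  const issues = [];
  const prevDims = Object.keys(prev.dimensions);
  const nextDims = Object.keys(next.dimensions);

  // Rule 2: no dimensions added or removed.
  for (const d of prevDims) {
    if (!(d in next.dimensions)) issues.push({ rule: 2, message: `dimension removed: ${d}` });
  }
  for (const d of nextDims) {
    if (!(d in prev.dimensions)) issues.push({ rule: 2, message: `dimension added: ${d}` });
  }

  // Rule 1: structural metadata unchanged; name/description may change.
  for (const d of prevDims.filter((x) => x in next.dimensions)) {
    for (const field of STRUCTURAL_FIELDS) {
      if (prev.dimensions[d][field] !== next.dimensions[d][field]) {
        issues.push({ rule: 1, message: `${field} changed on ${d}` });
      }
    }
  }

  // Rule 3 (open question): removed observations are only a warning.
  const removed = [...prev.observations].filter((o) => !next.observations.has(o));
  if (removed.length > 0) {
    issues.push({ rule: 3, warning: true, message: `${removed.length} observation(s) removed` });
  }
  return issues;
}

// Toy versions: v2 renames a label (allowed), changes a datatype (rule 1)
// and drops one observation (rule 3).
const v1 = {
  dimensions: {
    year: { name: "Year", datatype: "xsd:gYear", scaleOfMeasure: "Interval", dataKind: "Time" },
    value: { name: "Value", datatype: "xsd:decimal", scaleOfMeasure: "Ratio", dataKind: null },
  },
  observations: new Set(["obs/2019", "obs/2020"]),
};
const v2 = {
  dimensions: {
    year: { name: "Jahr", datatype: "xsd:gYear", scaleOfMeasure: "Interval", dataKind: "Time" },
    value: { name: "Value", datatype: "xsd:integer", scaleOfMeasure: "Ratio", dataKind: null },
  },
  observations: new Set(["obs/2020", "obs/2021"]),
};

console.log(compatibilityIssues(v1, v2));
```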

/cc @ptbrowne

@l00mi
Contributor

l00mi commented Nov 23, 2021

@herrstucki Thank you. I think this is enough to understand the problems that arise from structural changes in new versions. We will not be able to strictly check these anyway.

@semanticfire will integrate this with ongoing documentation effort.

@ptbrowne

We had a small discussion about this with @herrstucki.

  • I think dimensions could be only added
  • I think it's fair game that observations could be removed

We will not be able to strictly check these anyway.

I am not sure what to make of that. If this is only covered in the documentation, some charts will break. I think it should be the Cube Creator's responsibility to automatically expire the cube if any major change has been made to it.

This problem is very similar to semantic versioning for libraries. It would be good to list all these cases and assign a severity to each. As a starter:

  1. Updating/removing a dimension -> major version bump
  2. Adding observations -> minor version bump
  3. Changing the label of a dimension -> patch bump

As we don't have this kind of "semantic versioning" right now, I think "major version bump" here means "deprecating the cube". Charts should automatically update to newer versions until they reach the next deprecation flag on a cube version.
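The starter list maps naturally onto a severity classification. The change kinds, `requiredBump` and `requiresDeprecation` below are hypothetical names, and treating unknown changes as major is a conservative choice of this sketch, not something agreed in the thread.

```javascript
// Ordering of severities, lowest to highest.
const SEVERITY = { patch: 0, minor: 1, major: 2 };

// Hypothetical change kinds, mapped as in the starter list above.
// Unknown kinds are treated as major (a conservative assumption).
function classify(change) {
  switch (change.kind) {
    case "dimensionUpdated":
    case "dimensionRemoved":
      return "major";
    case "observationsAdded":
      return "minor";
    case "dimensionLabelChanged":
      return "patch";
    default:
      return "major";
  }
}

// A publication needs the most severe bump among all of its changes.
function requiredBump(changes) {
  return changes
    .map(classify)
    .reduce((worst, level) => (SEVERITY[level] > SEVERITY[worst] ? level : worst), "patch");
}

// Without real semantic versions, a "major" bump would correspond to
// deprecating the cube, as suggested in the comment above.
const requiresDeprecation = (changes) => requiredBump(changes) === "major";

console.log(requiredBump([{ kind: "dimensionLabelChanged" }])); // patch
console.log(requiredBump([{ kind: "observationsAdded" }, { kind: "dimensionLabelChanged" }])); // minor
console.log(requiresDeprecation([{ kind: "dimensionRemoved" }])); // true
```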

@l00mi
Contributor

l00mi commented Nov 24, 2021

We can absolutely add checks that prevent publishing a cube whose changes are not allowed. That is what I was asking for. If you can provide a list of checks, we can tackle this.

That applies to the Cube Creator only. Other data providers might still create invalid cubes, so always keep safeguards in your implementation.

l00mi transferred this issue from zazuko/rdf-cube-schema-viz on Dec 1, 2021
@l00mi
Contributor

l00mi commented Jan 26, 2022

The general functionality is described at https://zazuko.github.io/rdf-cube-schema/#version-history-of-cubes

The LINDAS / visualize.admin.ch specific information can be found here https://github.com/zazuko/cube-creator/wiki/LINDAS-Specifics#needed-attributes-that-a-cube-shows-up-on-visualizeadminch.

And finally for now we have a remark on the allowed changes for compatibility of the same version line here https://github.com/zazuko/cube-creator/wiki/4.-Publication#to-check-before-a-publication.

l00mi closed this as completed on Jan 26, 2022

6 participants