Compose "full" example for components schema
See src/examples/datalad-dataset-components/ContainerSE-DatasetWFiles.yaml

It works in principle, but I was not yet able to get the description of
an AnnexedFileSE's `distribution` to work (see comments inside).

This is strange, because this part is more or less taken verbatim from
the `datalad-dataset-version` schema, and
`src/examples/datalad-dataset-version/DataladDatasetVersionSE-full.yaml`
shows it to be working properly.

As elaborated in the comments there, the JSON schema code generator omits
a class (`Distribution`) that is required for validation, although the
generated schema still references it.
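
The failure mode described above, a `$ref` that points at a definition the generator never emitted, can be detected before running validation. The following is a minimal sketch, not part of this commit: `find_dangling_refs` is a hypothetical helper, the schema dict is hand-written to mimic the problem, and only local `#/...` pointers through nested objects are followed (array indices are not handled).

```python
def find_dangling_refs(schema: dict) -> list[str]:
    """Return every local `$ref` in `schema` that has no matching target."""
    dangling = []

    def resolve(pointer: str) -> bool:
        # Walk a JSON pointer such as '/$defs/Distribution' from the root.
        node = schema
        for token in pointer.lstrip("/").split("/"):
            # Undo JSON pointer escaping (~1 is '/', ~0 is '~').
            token = token.replace("~1", "/").replace("~0", "~")
            if isinstance(node, dict) and token in node:
                node = node[token]
            else:
                return False
        return True

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/") and not resolve(ref[1:]):
                dangling.append(ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return dangling


# Hypothetical schema fragment: `Distribution` is referenced but not defined.
schema = {
    "$defs": {
        "AnnexedFileSE": {
            "properties": {"distribution": {"$ref": "#/$defs/Distribution"}}
        }
    }
}
print(find_dangling_refs(schema))
```

Running such a check on the generated JSON schema would at least turn the silent omission into an explicit report of the missing definition.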
mih committed Feb 18, 2024
1 parent 750fbf9 commit d8bb1f1
Showing 6 changed files with 168 additions and 2 deletions.
@@ -0,0 +1,44 @@
components:
# version-less dataset
- meta_id: datalad:0b76362c-aa27-11ee-be29-b3b123281259
meta_type: dlccs:DataladDatasetSE
uuid: 0b76362c-aa27-11ee-be29-b3b123281259
# dataset version
- meta_id: gitsha:558275f650574389dcbbf7cd8ab5046482473fc8
meta_type: dlccs:DataladDatasetVersionSE
is_version_of: datalad:0b76362c-aa27-11ee-be29-b3b123281259
has_annex_remote:
- annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf
has_part:
# could link an entire tree as one part, which would have parts
# of its own
# such a tree would also need to support `qualified_part`
- gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
- gitsha:56094a33cf330fef5b375aa813fc4dc07147729f
qualified_part:
- at_location: subdir/README.txt
relation: gitsha:56094a33cf330fef5b375aa813fc4dc07147729f
- at_location: subdir/data.bin
relation: gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
# git blob
- meta_id: gitsha:56094a33cf330fef5b375aa813fc4dc07147729f
meta_type: dlccs:GitBlobSE
gitsha: 56094a33cf330fef5b375aa813fc4dc07147729f
# annexed file
- meta_id: gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
meta_type: dlccs:AnnexedFileSE
gitsha: b94ef1797f7bfc1ac979be122e1b538bbb0d1d58
# TODO we cannot have the following yet (not even `byte_size` alone).
# For an unknown reason the JSON schema generator does not put the
# Distribution class into its output (and reports no error
# either). Validation then fails with
# jsonschema.exceptions._WrappedReferencingError: PointerToNowhere: '/$defs/Distribution'...
#distribution:
# byte_size: 3425
# qualified_access:
# - access_id: MD5E-s3425--32a617360d10e3dcbfdd0885e8d64ab8.txt
# relation: annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf
# annex remote
- meta_id: annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf
meta_type: dlccs:AnnexRemoteSE
uuid: 7e0bf3e7-7d46-4093-813e-b4009826c3bf
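
The example above links its components purely by `meta_id` (`is_version_of`, `has_annex_remote`, `has_part`, and the `relation` of each `qualified_part`). A minimal sketch of a referential-integrity check over these links follows. It is an illustration only: the records are transcribed by hand into Python dicts (keeping just the linking fields), and `unresolved_links` is a hypothetical helper, not something the schema tooling provides.

```python
DATASET = "datalad:0b76362c-aa27-11ee-be29-b3b123281259"
VERSION = "gitsha:558275f650574389dcbbf7cd8ab5046482473fc8"
BLOB = "gitsha:56094a33cf330fef5b375aa813fc4dc07147729f"
ANNEXED = "gitsha:b94ef1797f7bfc1ac979be122e1b538bbb0d1d58"
ANNEX = "annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf"

components = [
    {"meta_id": DATASET},
    {
        "meta_id": VERSION,
        "is_version_of": DATASET,
        "has_annex_remote": [ANNEX],
        "has_part": [ANNEXED, BLOB],
        "qualified_part": [
            {"at_location": "subdir/README.txt", "relation": BLOB},
            {"at_location": "subdir/data.bin", "relation": ANNEXED},
        ],
    },
    {"meta_id": BLOB},
    {"meta_id": ANNEXED},
    {"meta_id": ANNEX},
]


def unresolved_links(components: list[dict]) -> list[tuple[str, str]]:
    """Return (source meta_id, target meta_id) pairs whose target is absent."""
    known = {c["meta_id"] for c in components}
    missing = []
    for c in components:
        targets = []
        if "is_version_of" in c:
            targets.append(c["is_version_of"])
        targets += c.get("has_annex_remote", [])
        targets += c.get("has_part", [])
        targets += [q["relation"] for q in c.get("qualified_part", [])]
        missing += [(c["meta_id"], t) for t in targets if t not in known]
    return missing


print(unresolved_links(components))
```

For the records as given, every link target is present in the same container, so the check reports nothing.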
14 changes: 14 additions & 0 deletions src/linkml/ontology/datalad.yaml
@@ -21,9 +21,23 @@ classes:
is_a: Dataset
description: >-
A version (i.e., commit) of a Datalad dataset.
slots:
- has_annex_remote
slot_usage:
is_version_of:
range: DataladDataset
todos:
- This class has the `has_annex_remote` slot primarily for historical
reasons. It makes sense to have it, but it is a conceptual conflict.
An annex remote is not registered for a specific dataset version,
but for a whole repository (in the git-annex branch). This makes it
version-less (and it could even span multiple datasets hosted in
different branches). Moreover, there is no concept of different
special remote configurations per version. It makes sense to migrate
this information to a different place. A candidate would be
`DataladDataset`. However, strictly speaking this is not a requirement
and special remotes are valid outside the realm of a datalad dataset.
It may be necessary to model something like a git-annex repository.

DataladDataset:
class_uri: dlco:DataladDataset
11 changes: 11 additions & 0 deletions src/linkml/ontology/git-annex.yaml
@@ -37,6 +37,9 @@ classes:
slot_usage:
uuid:
required: true
todos:
- Add support for remote-specific (not key-specific) parameters.
`DataService` only has `endpoint_url`.

QualifiedAnnexAccess:
is_a: QualifiedAccess
@@ -47,6 +50,14 @@ classes:
slot_usage:
relation:
range: AnnexRemote
todos:
- >
We already support the case of an alternative identifier
(via `QualifiedAccess`). However, for `AnnexRemote` we might also need
to support expressing the state of a key at a remote
(see `SET/GETSTATE` at
https://git-annex.branchable.com/design/external_special_remote_protocol).
A state is more or less an additional/arbitrary set of parameters.
AnnexedFile:
mixin: true
2 changes: 2 additions & 0 deletions src/linkml/ontology/git.yaml
@@ -38,6 +38,8 @@ classes:
- GitTracked
description: >-
A `File` that is tracked with Git.
todos:
- Rename to `GitBlob`

#QualifiedGitTrackedPart:
# mixin: true
98 changes: 96 additions & 2 deletions src/linkml/schemas/datalad-dataset-components.yaml
@@ -79,6 +79,8 @@ classes:
description: >-
Representation for any resource tracked by Git, thereby having a unique
`gitsha`-based identifier.
comments:
- This is the base class of any entity that is directly tracked by Git.
slot_usage:
meta_id:
description: >-
@@ -87,19 +89,83 @@ classes:
equals_expression: "gitsha:{gitsha}"
pattern: "^gitsha:[0-9a-f]{40}$"

QualifiedGitTrackedPartSE:
class_uri: dlccs:QualifiedGitTrackedPartSE
mixins:
- QualifiedPart
description: >-
Schema element for a `QualifiedPart`. Every part is represented by
a `GitTrackedSE`.
slot_usage:
relation:
range: GitTrackedSE

AnnexDistributionSE:
class_uri: dlccs:AnnexDistributionSE
mixins:
- AnnexDistribution
description: >-
Schema element for an `AnnexDistribution`.
slot_usage:
qualified_access:
inlined: true
inlined_as_list: true
multivalued: true
range: QualifiedAnnexAccessSE

QualifiedAnnexAccessSE:
class_uri: dlccs:QualifiedAnnexAccessSE
mixins:
- QualifiedAnnexAccess
description: >-
Schema element for a `QualifiedAnnexAccess`.
slot_usage:
relation:
range: AnnexRemoteSE

AnnexRemoteSE:
class_uri: dlccs:AnnexRemoteSE
is_a: ComponentSE
mixins:
- AnnexRemote
description: >-
Schema element for an `AnnexRemote`.
slots:
- meta_id
slot_usage:
meta_id:
equals_expression: "annex:{uuid}"

DataladDatasetVersionSE:
class_uri: dlccs:DataladDatasetVersionSE
is_a: GitTrackedSE
description: >-
TODO
mixins:
- DataladDatasetVersion
description: >-
TODO
todos:
- Add the `Commit` interface via a mixin
slot_usage:
has_annex_remote:
multivalued: true
inlined: false
range: AnnexRemoteSE
todos:
- see TODO in DataladDatasetVersion re this slot
has_part:
inlined: false
multivalued: true
range: GitTrackedSE
is_version_of:
inlined: false
range: DataladDatasetSE
meta_id:
equals_expression: "gitsha:{gitsha}"
qualified_part:
inlined: true
inlined_as_list: true
multivalued: true
range: QualifiedGitTrackedPartSE

DataladDatasetSE:
class_uri: dlccs:DataladDatasetSE
@@ -115,3 +181,31 @@ classes:
slot_usage:
meta_id:
equals_expression: "datalad:{uuid}"

GitBlobSE:
class_uri: dlccs:GitBlobSE
is_a: GitTrackedSE
mixins:
- FileInGit
description: >-
Schema element for a `FileInGit`.
AnnexedFileSE:
class_uri: dlccs:AnnexedFileSE
is_a: GitTrackedSE
mixins:
- AnnexedFile
description: >-
Schema element for an `AnnexedFile`.
slot_usage:
distribution:
inlined: true
multivalued: false
range: AnnexDistributionSE
notes:
- This is not multivalued, because the distribution of an annexed
file is an annex key, a bit-identical blob. The only thing that
we can have multiple of is the set of remote locations where this
key is available. Even when we have a URL key and the actual content
may be unknown (yet), the system is not made to switch between
distributions without filename changes.
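
The `meta_id` constraints in this schema (`equals_expression` values such as `"gitsha:{gitsha}"`, `"annex:{uuid}"`, and `"datalad:{uuid}"`, plus the `^gitsha:[0-9a-f]{40}$` pattern) can be imitated in plain Python to sanity-check example records. This is a sketch under assumptions: it does not reproduce how LinkML itself evaluates `equals_expression`, the `TEMPLATES` table and `meta_id_is_consistent` are hypothetical names, and the mapping assumes that `GitBlobSE` and `AnnexedFileSE` inherit the `meta_id` usage of `GitTrackedSE`.

```python
import re

# meta_type -> meta_id template, transcribed by hand from the schema.
TEMPLATES = {
    "dlccs:DataladDatasetSE": "datalad:{uuid}",
    "dlccs:AnnexRemoteSE": "annex:{uuid}",
    "dlccs:DataladDatasetVersionSE": "gitsha:{gitsha}",
    "dlccs:GitBlobSE": "gitsha:{gitsha}",
    "dlccs:AnnexedFileSE": "gitsha:{gitsha}",
}
GITSHA_ID = re.compile(r"^gitsha:[0-9a-f]{40}$")


def meta_id_is_consistent(record: dict) -> bool:
    """Check that `meta_id` matches the template for the record's type."""
    template = TEMPLATES[record["meta_type"]]
    try:
        expected = template.format(**record)
    except KeyError:
        # The slot the template depends on (uuid or gitsha) is missing.
        return False
    if record["meta_id"] != expected:
        return False
    if template.startswith("gitsha:"):
        return GITSHA_ID.match(record["meta_id"]) is not None
    return True


blob = {
    "meta_id": "gitsha:56094a33cf330fef5b375aa813fc4dc07147729f",
    "meta_type": "dlccs:GitBlobSE",
    "gitsha": "56094a33cf330fef5b375aa813fc4dc07147729f",
}
remote = {
    "meta_id": "annex:7e0bf3e7-7d46-4093-813e-b4009826c3bf",
    "meta_type": "dlccs:AnnexRemoteSE",
    "uuid": "7e0bf3e7-7d46-4093-813e-b4009826c3bf",
}
print(meta_id_is_consistent(blob), meta_id_is_consistent(remote))
```

A record that carries a `meta_id` but not the slot its template refers to is reported as inconsistent by this sketch; whether the actual validator treats that case the same way is not established here.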
@@ -3,6 +3,7 @@ target_class: ContainerSE
data_sources:
- src/examples/datalad-dataset-components/ContainerSE-DataladDataset-minimal.yaml
- src/examples/datalad-dataset-components/ContainerSE-DataladDatasetVersion-linkage.yaml
- src/examples/datalad-dataset-components/ContainerSE-DatasetWFiles.yaml
plugins:
JsonschemaValidationPlugin:
closed: true