Unify handling of labels in place of CURIEs #82

Open
cmungall opened this issue Aug 23, 2024 · 3 comments
Comments

@cmungall
Contributor

The intent of KGCL was to allow human-readable labels wherever IDs are used. As in the Protégé Manchester syntax renderer, these would be enclosed in single quotes.

E.g add obsolete 'my bad term'

(original doc)

This was always intended as a surface syntax feature: just as the OWLAPI doesn't need to support quoted strings where it accepts URIs, neither does KGCL. The idea was that mappers could handle this before and after serializing.

Note the idea was that this should be done with caution - for guaranteed interpretability the same snapshot of the ontology should be used for label rendering.

The current implementation is a bit inconsistent. In the data model, some slots have shadow slots such as about_node_representation or subject_type. In some cases the renderer looks at these and uses them:

    if type(kgcl_instance) is NodeAnnotationChange:
        subject = render_entity(
            kgcl_instance.about_node, kgcl_instance.about_node_representation
        )
        predicate = render_entity(
            kgcl_instance.annotation_property, kgcl_instance.annotation_property_type
        )
        old_object = render_entity(
            kgcl_instance.old_value, kgcl_instance.old_value_type
        )
        new_object = render_entity(
            kgcl_instance.new_value, kgcl_instance.new_value_type
        )

In other cases it's hardwired to always use the URI/CURIE:

    if type(kgcl_instance) is NodeDeletion:
        subject = render_entity(kgcl_instance.about_node, "uri")
        return "delete " + subject

I think these additional shadow slots pollute the model; we should remove them.

We can go back to the original idea of doing this at the time of DSL rendering. But there may also be use cases for preserving the "deferred dereferencing" in YAML/JSON serializations, and in the object model.

This would involve weakening the range constraint to string and allowing:

type: NodeObsoletion
about_node: "'my bad term'"

I think this is a bad decision from the point of view of KGCL behaving like a representation of diffs on the side of the ontology. But if KGCL is a language for representing things from the side of the user (more like the UI model in Protege) then this is defensible.

@gouttegd

Personally, I believe allowing referencing terms by their label rather than by their ID is a bad idea. IDs exist for a reason. They are stable and non-ambiguous, contrary to labels.

But if we are to allow that, I agree that it should be a “surface” feature of the DSL.

So when parsing (going from the DSL representation to KGCL objects), resolving the labels should be done at parsing time, and once parsing is done, all that is left is the identifier of the resolved entity – the fact that the entity had been referenced by its label in the original instruction does not need to appear anywhere in the model.

Conversely, when rendering (going from KGCL objects to DSL syntax, e.g. when displaying a diff), it's the responsibility of the renderer to maybe turn the identifiers present in the model into labels, if this is desired.
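
The split described above can be sketched as follows. This is only an illustration under stated assumptions: `LabelIndex`, `resolve` and `render` are hypothetical names, not part of KGCL or of any existing implementation, and the index is assumed to be built from a single ontology snapshot. Escaping of quote characters is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class LabelIndex:
    """Hypothetical ID -> label lookup built from one ontology snapshot."""

    id_to_label: dict[str, str] = field(default_factory=dict)

    def ids_for(self, label: str) -> list[str]:
        """Return every identifier whose label equals `label`."""
        return [eid for eid, lab in self.id_to_label.items() if lab == label]


def resolve(token: str, index: LabelIndex) -> str:
    """Parsing side: turn a DSL token into an identifier.

    Single-quoted tokens are treated as labels; anything else is returned as-is.
    """
    if len(token) >= 2 and token[0] == token[-1] == "'":
        matches = index.ids_for(token[1:-1])
        if len(matches) != 1:
            # Labels can be missing or ambiguous, unlike identifiers.
            raise ValueError(f"label {token} matches {len(matches)} entities")
        return matches[0]
    return token


def render(entity_id: str, index: LabelIndex, use_labels: bool = False) -> str:
    """Rendering side: optionally replace an identifier with its quoted label."""
    label = index.id_to_label.get(entity_id)
    if use_labels and label is not None:
        return f"'{label}'"
    return entity_id


index = LabelIndex({"EX:1": "my bad term"})
node_id = resolve("'my bad term'", index)  # the object model only ever holds EX:1
print("obsolete " + render(node_id, index))
print("obsolete " + render(node_id, index, use_labels=True))
```

With this arrangement the object model needs no shadow slots: whether a label was used in the input, or should be used in the output, is a decision local to the parser and the renderer.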

Random thoughts:

  • Quoting the label should be mandatory, even if it is a single word. Allowing unquoted single-word labels would mean that we have to use some dubious heuristics to “guess” whether a word is a CURIE, an IRI, or a label, and I don’t think it is worth the trouble.
  • Allowing quoted labels makes it even more important to answer the question about escaping raised in #40 (“What is the intended mechanism to deal with quote characters in values?”). Currently, the Lark grammar (and therefore, I presume, the Python implementation) does not allow escaping, meaning it is impossible to use a label that happens to contain a quote. KGCL-Java supports C-style escape sequences (making it possible to say, e.g., obsolete 'Wheeler\'s organ').
  • Should we allow the use of language tags to refer to a label in a specific language (e.g. obsolete 'my bad term'@en)? I’d personally say no; I don’t think it is worth the trouble. If a term has several labels in several languages, we try all available labels regardless of the language.
  • What about synonyms? Should we allow referring to a term by one of its synonyms? I can imagine that people would like to be able to do that (though it would be a bad idea in my opinion).
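
To illustrate the first two points, here is a rough sketch of how a token could be classified once quoting is mandatory and backslash escapes are allowed. The regular expression and the `classify` helper are assumptions made for this example; they are not taken from the Lark grammar or from KGCL-Java, and the unescaping only strips the backslash rather than interpreting sequences such as \n.

```python
import re

# A single-quoted string in which a backslash escapes the next character.
QUOTED_LABEL = re.compile(r"'((?:[^'\\]|\\.)*)'")


def unescape(body: str) -> str:
    # Drop the backslash of each escape sequence, keeping the escaped character.
    return re.sub(r"\\(.)", r"\1", body)


def classify(token: str) -> tuple[str, str]:
    """Return ("label", text) for a quoted token and ("id", token) otherwise."""
    match = QUOTED_LABEL.fullmatch(token)
    if match:
        return "label", unescape(match.group(1))
    # No heuristics: anything unquoted is taken to be a CURIE or an IRI.
    return "id", token


print(classify(r"'Wheeler\'s organ'"))
print(classify("'nucleus'"))
print(classify("GO:0005634"))
```

Because an unquoted word is never treated as a label, no guessing is needed to tell labels apart from CURIEs or IRIs.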

But there may also be use cases for preserving the "deferred dereferencing" in yaml/json serializations, and in the object model.

Here I don’t follow. What would those use cases be? Allowing labels directly in the object model is the opposite of making the use of labels a “surface feature” only. This would force all code dealing with KGCL to handle the possibility that any ID could in fact be a label that must be resolved before use, instead of isolating that case in KGCL parsers/renderers.

@gouttegd

Support for using labels when identifiers would be expected is tentatively implemented in KGCL-Java in a separate branch. This is the “surface feature” implementation, where everything is done in the parser.

@gouttegd

gouttegd commented Sep 1, 2024

It just occurred to me that if we allow the use of labels when identifiers are expected, we can then solve the problem in #56 (allowing the creation of a new class without having to know its ID in advance) with a much more intuitive syntax.

Instead of doing this (as proposed in #56, and currently supported in KGCL-Java):

create class AUTOID:1 "Mammal"
create class AUTOID:2 "Dog"
create edge AUTOID:2 rdfs:subClassOf AUTOID:1

we could do something like this:

create class "Mammal"
create class "Dog"
create edge "Dog" rdfs:subClassOf "Mammal"

All that would be required is to amend the syntax of the create class command to allow it to be used with a label only instead of both an ID and a label (that is, instead of create class THE:ID "the label", we also allow create class "the label"). Then upon encountering an ID-less create class command, we automatically mint an ID (as described in #56) and whenever the same label is used in subsequent commands (as in the create edge command in the example above), we resolve the label into the newly minted ID.
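
A minimal sketch of that rewriting step, assuming a naive whitespace/double-quote tokenizer and a hypothetical `expand` helper (neither is part of any KGCL implementation). It only tracks labels minted within the same script, does not consult an existing ontology, and would also rewrite a double-quoted text value that happens to equal a minted label, so a real parser would need to know which argument positions expect an identifier.

```python
import itertools
import re

TOKEN = re.compile(r'"[^"]*"|\S+')  # a double-quoted string or a bare word


def expand(lines, prefix="AUTOID"):
    """Rewrite label-only commands into the ID-based form proposed in #56."""
    counter = itertools.count(1)
    minted = {}  # label -> ID minted earlier in this script
    out = []
    for line in lines:
        tokens = TOKEN.findall(line)
        if tokens[:2] == ["create", "class"]:
            if len(tokens) == 3 and tokens[2].startswith('"'):
                # ID-less form: mint an ID and remember its label.
                label = tokens[2][1:-1]
                minted[label] = f"{prefix}:{next(counter)}"
                tokens = ["create", "class", minted[label], tokens[2]]
            # The two-argument form (ID and label) is left untouched.
        else:
            # A quoted token naming a freshly created class becomes its ID.
            tokens = [
                minted.get(t[1:-1], t) if t.startswith('"') else t
                for t in tokens
            ]
        out.append(" ".join(tokens))
    return out


script = [
    'create class "Mammal"',
    'create class "Dog"',
    'create edge "Dog" rdfs:subClassOf "Mammal"',
]
print("\n".join(expand(script)))
```

Run on the label-only example above, this produces the three AUTOID commands from the first example, so the rest of the #56 machinery could stay unchanged.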


2 participants