Fix schema for edge changes to clarify how to uniquely identify an edge #7

Closed
cmungall opened this issue Jul 17, 2021 · 0 comments

The node change hierarchy makes use of an about field to indicate which node is being modified. Nodes have primary keys so it's easy to do things like:

c = NewSynonym(id='chg12345', about='ANAT:HindLimb', new_value='hindlimb')

It looks like I attempted to do something similar for edges, using an about field that points to an edge. However, edges don't have a single primary key, and we want to be able to refer to an edge by its subject-predicate-object (SPO) triple.

So the current model does allow this:

c = PredicateChange(id='chg12345', about=Edge(subject='FOO:1', predicate=..., object=...), new_value='hindlimb')

This has a natural YAML, JSON, JSON-LD, and RDF serialization. However, because of the nesting it does NOT have a natural CSV serialization, and a core use case is being able to specify a set of changes in a spreadsheet.

option 1: denormalize the core model

For edges, rather than about we would have 3 fields, about_{s,p,o}.

The first disadvantage is the denormalization itself.

The second is that it assumes every edge (OWL axiom) is uniquely identified by an SPO triple. But in fact that is true neither of OWL nor of the more generalized use cases.

I have quite a few ontologies where I have more than one axiom with the same SPO. This is useful for a number of reasons, such as provenance. Axiom 1 may have evidence E1, contributor C1, publication P1, and Axiom 2 may have evidence E2, contributor C2, publication P2. If these all get conflated into the same edge then we mix E1 with P2 and so on.
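
To make the conflation concrete, here is a minimal sketch in plain Python. The dict layout and the conflate_by_spo helper are assumptions for illustration only, not part of the actual datamodel; the identifiers are the placeholders used above.

```python
from collections import defaultdict

# Two hypothetical axioms with the same SPO but different provenance.
axioms = [
    {"subject": "FOO:1", "predicate": "P:1", "object": "FOO:2",
     "evidence": "E1", "contributor": "C1", "publication": "P1"},
    {"subject": "FOO:1", "predicate": "P:1", "object": "FOO:2",
     "evidence": "E2", "contributor": "C2", "publication": "P2"},
]


def conflate_by_spo(axioms):
    """Merge axioms keyed only by SPO, pooling each annotation field separately."""
    merged = defaultdict(lambda: defaultdict(set))
    for ax in axioms:
        key = (ax["subject"], ax["predicate"], ax["object"])
        for field in ("evidence", "contributor", "publication"):
            merged[key][field].add(ax[field])
    return merged


merged = conflate_by_spo(axioms)
edge = merged[("FOO:1", "P:1", "FOO:2")]
# One edge is left, and nothing records that E1 belonged with P1 rather than P2.
print(len(merged), sorted(edge["evidence"]), sorted(edge["publication"]))
```

Once the two axioms collapse into one key, the pairing between evidence, contributor, and publication cannot be recovered from the merged edge.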

See the discussion here for further context on how we deal with this in robot: ontodev/robot#214

option 2: denormalize/flatten at time of mapping to CSV

Here we would keep the normalized/nested datamodel, but when translating to and from CSV we would flatten things such that the outer field name is concatenated with the inner field name.

e.g.

id: chg123
description: I am changing predicate because blah blah
about:
  subject: FOO:1
  predicate: P:1
  object: FOO:2
...

==>

id: chg123
description: I am changing predicate because blah blah
about_subject: FOO:1
about_predicate: P:1
about_object: FOO:2

(As a generalized algorithm this only works when the containing slot is single-valued.)
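
A rough sketch of what such a flatten/unflatten step could look like, using only the Python standard library. The function names, the underscore separator, and the nested_slots parameter are assumptions made for this example; the real mapping would presumably be driven by the schema.

```python
import csv
import io


def flatten(record, sep="_"):
    """Flatten one level of nesting: {'about': {'subject': x}} -> {'about_subject': x}."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for inner_key, inner_value in value.items():
                flat[f"{key}{sep}{inner_key}"] = inner_value
        elif isinstance(value, list) and any(isinstance(v, dict) for v in value):
            # A multivalued containing slot cannot be spread over one CSV row.
            raise ValueError(f"slot {key!r} is multivalued; cannot flatten to one row")
        else:
            flat[key] = value
    return flat


def unflatten(row, nested_slots=("about",), sep="_"):
    """Inverse of flatten for the slots that are known to be nested."""
    record = {}
    for column, value in row.items():
        for slot in nested_slots:
            prefix = slot + sep
            if column.startswith(prefix):
                record.setdefault(slot, {})[column[len(prefix):]] = value
                break
        else:
            record[column] = value
    return record


change = {
    "id": "chg123",
    "description": "I am changing predicate because blah blah",
    "about": {"subject": "FOO:1", "predicate": "P:1", "object": "FOO:2"},
}

# Round trip: nested record -> CSV text -> nested record.
flat = flatten(change)
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
buffer.seek(0)
rows = [unflatten(row) for row in csv.DictReader(buffer)]
```

Unflattening needs to know which slots are nested, since a column name such as about_subject is otherwise indistinguishable from a flat field that happens to contain an underscore.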

Now, technically the SPO may not be unique here. We can imagine two modes:

  1. unambiguous: if your about selector does not uniquely identify a single edge, fail
  2. global: the change is applied to all edges that match

This has some nice properties. E.g. most graphs have at most one edge between any given pair of nodes, so P doesn't necessarily do much work as a disambiguator, so we could say

  • "change predicate to subClassOf in edge between CNS and nervous system"
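
The two modes could be sketched roughly as follows. This is an illustration only: edges are plain dicts, select_edges and apply_predicate_change are hypothetical helpers, and the starting predicate partOf is made up for the example.

```python
def select_edges(edges, selector):
    """Return the edges whose fields equal every field given in the selector."""
    return [e for e in edges if all(e.get(k) == v for k, v in selector.items())]


def apply_predicate_change(edges, selector, new_predicate, mode="unambiguous"):
    """Change the predicate of matching edges in place; return how many changed."""
    if mode not in ("unambiguous", "global"):
        raise ValueError(f"unknown mode: {mode!r}")
    matches = select_edges(edges, selector)
    if mode == "unambiguous" and len(matches) != 1:
        # Mode 1: fail unless the selector picks out exactly one edge.
        raise ValueError(f"selector matched {len(matches)} edges, expected exactly 1")
    # Mode 2 ("global") applies the change to every match.
    for edge in matches:
        edge["predicate"] = new_predicate
    return len(matches)


edges = [
    {"subject": "CNS", "predicate": "partOf", "object": "nervous system"},
    {"subject": "FOO:1", "predicate": "P:1", "object": "FOO:2"},
]

# The selector leaves out the predicate: S and O alone identify the edge.
changed = apply_predicate_change(
    edges, {"subject": "CNS", "object": "nervous system"}, "subClassOf"
)
```

Because the selector is just a partial match, leaving out the predicate falls out naturally, and the same function covers both modes.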

cmungall added a commit that referenced this issue Jul 17, 2021

Previously this was undeclared; this worked for nodes, which are not inlined.
However, edges must be inlined (see #7), so this was not working.

This change simply changes a line of the YAML schema to declare the range,
and includes other regenerated objects.