Fix schema for edge changes to clarify how to uniquely identify an edge #7

cmungall · 2021-07-17T01:59:21Z

The node change hierarchy makes use of an about field to indicate which node is being modified. Nodes have primary keys so it's easy to do things like:

c = NewSynonym(id='chg12345', about='ANAT:HindLimb', new_value='hindlimb')

It looks like I attempted to do something similar for edges, using an about field that points to an edge. However, edges don't have singular primary keys and we want to be able to refer to edges by SPO triple.

so the current model does allow this:

c = PredicateChange(id='chg12345', about=Edge(subject='FOO:1', predicate=..., object=...), new_value='hindlimb')

This has a natural YAML, JSON, JSON-LD, and RDF serialization. However, due to the nesting it does NOT have a natural CSV serialization, and a core use case is being able to specify a set of changes in a spreadsheet.

option 1: denormalize the core model

for edges, rather than about we would have 3 fields about_{s,p,o}

The first disadvantage is denormalization

The second is that it assumes every edge (OWL axiom) is uniquely identified by an SPO triple. But in fact that is true neither of OWL nor of the more generalized use cases.

I have quite a few ontologies where I have >1 axiom with the same SPO. This is useful for a number of reasons, such as provenance. Axiom 1 may have evidence E1, contributor C1, publication P1, and Axiom 2 may have evidence E2, contributor C2, publication P2. If these all get conflated into the same edge, then we mix E1 with P2 and so on.
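To make the conflation concrete, here is a minimal sketch in plain Python. The dict layout and the field names `evidence`, `contributor` and `publication` are assumptions made for illustration; they are not taken from the actual datamodel.

```python
from collections import defaultdict

# Two distinct axioms that share one SPO triple, each with its own provenance.
# All identifiers here are placeholders for illustration only.
axioms = [
    {"subject": "FOO:1", "predicate": "P:1", "object": "FOO:2",
     "evidence": "E1", "contributor": "C1", "publication": "P1"},
    {"subject": "FOO:1", "predicate": "P:1", "object": "FOO:2",
     "evidence": "E2", "contributor": "C2", "publication": "P2"},
]


def conflate_by_spo(axioms):
    """Merge axioms keyed only by SPO, pooling the annotations per field."""
    merged = defaultdict(lambda: defaultdict(set))
    for ax in axioms:
        key = (ax["subject"], ax["predicate"], ax["object"])
        for field in ("evidence", "contributor", "publication"):
            merged[key][field].add(ax[field])
    return merged


merged = conflate_by_spo(axioms)
edge = merged[("FOO:1", "P:1", "FOO:2")]
# One edge is left, and nothing records that E1 belonged with P1 rather than P2.
print(len(merged), sorted(edge["evidence"]), sorted(edge["publication"]))
```

Once the two axioms collapse to a single key, the pairing between a given piece of evidence and its publication can no longer be recovered.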

See the discussion here for further context on how we deal with this in robot: ontodev/robot#214

option 2: denormalize/flatten at time of mapping to CSV

here we would keep the normalized/nested datamodel, but when translating to and from CSVs we would flatten things such that the outer field is concatenated with the inner field

e.g.

id: chg123
description: I am changing predicate because blah blah
about:
  subject: FOO:1
  predicate: P:1
  object: FOO:2
...

==>

id: chg123
description: I am changing predicate because blah blah
about_subject: FOO:1
about_predicate: P:1
about_object: FOO:2

(as a generalized algorithm this only works when the containing slot is single-valued)
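A rough sketch of what such a flattening step could look like, using plain dicts and the standard `csv` module. The helper names `flatten` and `unflatten`, and the choice of `_` as separator, are hypothetical; a real implementation would presumably take the list of nested slots from the schema rather than from a hand-supplied set.

```python
import csv
import io


def flatten(record, sep="_"):
    """Flatten one level of nesting: {'about': {'subject': x}} -> {'about_subject': x}.

    Raises ValueError for a multivalued (list) slot holding nested objects,
    because there is no single column to map each inner field to.
    """
    flat = {}
    for slot, value in record.items():
        if isinstance(value, dict):
            for inner, inner_value in value.items():
                flat[f"{slot}{sep}{inner}"] = inner_value
        elif isinstance(value, list) and any(isinstance(v, dict) for v in value):
            raise ValueError(f"cannot flatten multivalued slot {slot!r}")
        else:
            flat[slot] = value
    return flat


def unflatten(row, nested_slots, sep="_"):
    """Inverse of flatten; nested_slots names the slots to rebuild, e.g. {'about'}."""
    record = {}
    for column, value in row.items():
        for slot in nested_slots:
            prefix = slot + sep
            if column.startswith(prefix):
                record.setdefault(slot, {})[column[len(prefix):]] = value
                break
        else:
            # No nested slot claimed this column, so it stays at the top level.
            record[column] = value
    return record


change = {
    "id": "chg123",
    "description": "I am changing predicate because blah blah",
    "about": {"subject": "FOO:1", "predicate": "P:1", "object": "FOO:2"},
}
row = flatten(change)

# The flat row can be written as one spreadsheet line.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

round_tripped = unflatten(row, {"about"})
```

The round trip only works because the caller states which prefixes are nested slots; without that, a top-level slot whose name happens to start with `about_` would be indistinguishable from a flattened one.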

now, technically the SPO may not be unique here. We can imagine two modes:

unambiguous: if your about selector does not uniquely identify a single edge, fail
global: the change is applied to all edges that match
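The two modes could be sketched roughly as follows. The `Edge` dataclass and the `select_edges` function are hypothetical stand-ins, not the real model classes, and treating an unset selector field as a wildcard is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str


def select_edges(edges, subject=None, predicate=None, object=None, mode="unambiguous"):
    """Return the edges matching the selector; fields left as None are wildcards.

    mode='unambiguous' fails unless exactly one edge matches;
    mode='global' returns every matching edge.
    (The parameter name `object` shadows the builtin on purpose, to mirror the slot name.)
    """
    if mode not in ("unambiguous", "global"):
        raise ValueError(f"unknown mode {mode!r}")
    matches = [
        e for e in edges
        if (subject is None or e.subject == subject)
        and (predicate is None or e.predicate == predicate)
        and (object is None or e.object == object)
    ]
    if mode == "unambiguous" and len(matches) != 1:
        raise LookupError(f"selector matched {len(matches)} edges, expected exactly 1")
    return matches


edges = [
    Edge("FOO:1", "P:1", "FOO:2"),
    Edge("FOO:1", "P:1", "FOO:2"),  # second axiom with the same SPO
    Edge("FOO:3", "P:1", "FOO:2"),
]

# Global mode: the change would apply to both duplicate edges.
both = select_edges(edges, subject="FOO:1", object="FOO:2", mode="global")

# Unambiguous mode: the same selector is rejected.
try:
    select_edges(edges, subject="FOO:1", object="FOO:2")
    ambiguous = False
except LookupError:
    ambiguous = True

# A selector without a predicate still pins down a single edge here.
only = select_edges(edges, subject="FOO:3", object="FOO:2")
```

In this sketch a selector that matches nothing also fails in unambiguous mode, which seems like the safer default for a change log.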

This has some nice properties. E.g. most graphs have at most one edge between any given pair of nodes, so P doesn't necessarily do much work as a disambiguator, so we could say

"change predicate to subClassOf in edge between CNS and nervous system"


Previously this was undeclared; this worked for nodes, which are not inlined. However, edges must be inlined (see #7), so this was not working. This change simply changes a line of the YAML schema to declare the range, and includes other regenerated objects.

cmungall closed this as completed Jul 20, 2021
