Add series of date predicates that are neither start nor end #771

Open
rjyounes opened this issue Nov 4, 2022 · 35 comments

Comments

@rjyounes
Collaborator

rjyounes commented Nov 4, 2022

Currently atDateTime is bifurcated at the highest level between start and end. There are many circumstances when you want to indicate the date something happened without a sense of starting or ending - e.g., a publication date, the date something was approved or archived, etc. One must either assign the same datetime as both start and end, or use a convention of, say, always using an end date. Both of these solutions are contrived and reveal a semantic hole.

Of course, one must be careful not to overuse a single date but think about the actual meaning of the date. E.g., the founding date of an institution should be modeled as a start date; on the other hand, to model a publication date as the "start date" of a text being a published version pushes the boundary.

@rjyounes rjyounes added impact: minor New, backward-compatible functionality (does not change inferences; e.g., adding a term) topic: dates and times labels Nov 4, 2022
@justin2004
Contributor

to model a publication date as the "start date" of a text being a published version pushes the boundary.

I feel like I would express something like:

:content0 a gist:Content .
:publication0 a gist:Event ;
  gist:hasParticipant :content0 ;
  gist:hasParticipant wd:Q26535 ;   # Journal of High Energy Physics 
  gist:actualStartDate '2021-06-01T00:00:00-06:00'^^xsd:dateTime .

@rjyounes
Collaborator Author

rjyounes commented Nov 4, 2022

This is certainly semantically correct and may be useful in some contexts, but it has to be weighed against the consideration of simplicity of both the model and the resulting data graph. I have no need for the publication event other than to state the date: I don't need to say who published it or where, for example. And in addition to the publication date, there may be a distribution date, archive date, and possibly others, so minting events in each of these cases seems to me unnecessarily verbose. But this does raise an interesting question, and I'd be interested to hear what others have to say about resolving the tension between simplicity and semantic completeness.

There is a precedent for the simpler version in gist: we model a birth date as a date attached to a person, rather than minting a BirthEvent that has a date or dates and two or more participants. Other examples where we likely want this simpler usage: the date on a form I fill out and sign (a SigningEvent), the date a painter puts on the back of a painting (which is not necessarily the end date of the painting process, so requires its own event), creating an entry in a database, a vaccination date on my medical record, the expiration date on a medication, etc. I do agree that if we have other things to say about these events, we will probably want to create them.

Another point: unless you are going to add an end date to the event, you have merely pushed the issue back to a different object - from the published thing to the publication event. That is, are you using start date to mean date, which is the practice I object to?

@uscholdm
Contributor

uscholdm commented Nov 4, 2022

One must either assign the same datetime as both start and end, or use a convention of, say, always using an end date. Both of these solutions are contrived and reveal a semantic hole.

Yes, they are, and it does, and it's confusing as well. I would be in favor of introducing a property called atDate for when you don't care about start or end.

@justin2004
Contributor

justin2004 commented Nov 4, 2022

There is a precedent for the simpler version in gist: we model a birth date as a date attached to a person

I think that is an application-centric flavored projection (a fused edge).

It runs roughshod over the event itself.

I think I would express Alice's birth date something like:

[ a gist:Event ;
  :hasRole [ gist:categorizedBy wd:Q576104 ;  # neonate
             :playedBy :Alice ] ;
  gist:actualStartDate '2021-06-01T00:00:00-06:00'^^xsd:dateTime ]

I think gist might need 2 new primitives for roles so that we can handle this kind of stuff with points of articulation:
#695

@justin2004
Contributor

justin2004 commented Nov 4, 2022

Another point: unless you are going to add an end date to the event, you have merely pushed the issue back to a different object - from the published thing to the publication event. That is, are you using start date to mean date, which is the practice I object to?

That's a fair point, but at least the discussion is about the correct node: the event! :)

Actually I think introducing the Event helps us think about the situation more clearly. Do we mean the publication event (like breaking a bottle over the bow of a ship) or do we mean the extended event where the content is available in the journal. I bet journals have "unpublished" content before.

@mkumba
Contributor

mkumba commented Nov 4, 2022

If you feel the need to have a date that represents the start and end (I tend not to, but I could see the possibility), then instead of having atDateTime at the top of the tree, where it is now and is ambiguous, I would suggest making it a subproperty of both startDateTime and endDateTime, removing the ambiguity.

If you say atDateTime, you mean it started and ended at that time. Querying on endDateTime will still return it (which it won't in the current arrangement).
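A minimal Python sketch of the subproperty proposal, assuming plain tuples stand in for RDF triples, property names are shortened (no gist prefix), and only the RDFS subproperty rule (rdfs7) is applied. The IRI :publication1 and its date are hypothetical. It contrasts the current hierarchy (start and end under atDateTime) with the proposed one (atDateTime under both):

```python
# Sketch only: tuples stand in for RDF triples; names are shortened and
# :publication1 is a hypothetical IRI. Only rdfs7 (subPropertyOf) is applied.

# child property -> set of direct parent properties
CURRENT = {"startDateTime": {"atDateTime"}, "endDateTime": {"atDateTime"}}
PROPOSED = {"atDateTime": {"startDateTime", "endDateTime"}}


def superproperties(prop, hierarchy):
    """Return prop plus every property it is (transitively) a subproperty of."""
    result, stack = {prop}, [prop]
    while stack:
        for parent in hierarchy.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def entail(triples, hierarchy):
    """rdfs7: from (s p o) and p subPropertyOf q, infer (s q o)."""
    return {(s, q, o) for (s, p, o) in triples for q in superproperties(p, hierarchy)}


def values(triples, prop):
    """Sorted objects of every triple that uses prop."""
    return sorted(o for (_, p, o) in triples if p == prop)


data = {(":publication1", "atDateTime", "2022-11-04")}

print(values(entail(data, CURRENT), "endDateTime"))   # []
print(values(entail(data, PROPOSED), "endDateTime"))  # ['2022-11-04']
```

Under the proposed hierarchy, a query on endDateTime (or startDateTime) also picks up the atDateTime value; under the current one it does not.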

@rjyounes
Collaborator Author

rjyounes commented Nov 6, 2022

@justin2004 I think you are too quick to label a modeling approach as application-centric without considering that semantic models are constructed to address certain business needs, and therefore represent certain types of data and not others. That's why the answer to a modeling question is so often "It depends..." with the implied completion "on the business questions and context" and not "on your application." In a typical business context you want to know someone's birthdate, period.

I think it would be overmodeling to materialize a birth node for each of the, say, 100K employees in a client's administrative systems, including HR, finance, and benefits. If I were modeling a hospital obstetrics department, most likely I would need a birth event along with many participants, such as the baby, mother, father, midwife, nurse, obstetrician, anesthesiologist, etc. And the obstetrics department records the birth date as a full datetime, whereas in the hospital HR system they record employees' birthdates as dates only. This has nothing to do with application-centricity but rather with the knowledge domain and business needs being addressed. In fact, to take the argument one step further, you might (I'm not saying I would) even say it was not only unnecessary but wrong to model the HR birthdates using birth events, because it suggests that these events are of concern in this context, which they are not.

Getting back to my original publication date case, you may turn out to be right: it's possible that I will have other things to attribute to the publication and thus would want to instantiate it as an event.

@justin2004
Contributor

justin2004 commented Nov 7, 2022

too quick to label a modeling approach as application-centric without considering that semantic models are constructed to address certain business needs

A specific business need that is known today is often addressed with a specific application.

Business need known today ~= application

I think a drawback of letting your business need today influence your domain modeling too much is that:

-the data harmonization across the enterprise will be diminished
e.g. in this domain you have to do

?s :birthDateTime ?bd

in that domain you have to do

?role :playedBy ?s .
?role :categorizedBy :neonate .
?event :hasRole ?role .
?event :atDateTime ?bd . 

-the first domain you model (for a specific business need) might influence domain modeling choices in other (later) domains with undesirable consequences

-the reproducibility of the domain modeling is diminished
that is, if we give very similar input (ontology(ies) + data + SMEs), but for a different business need, to a different ontologist we likely won't get the same model

It seems to me like the ideal thing to do is have one big subjective decision: selecting an upper ontology.
After that the domain modeling should be mostly mechanical: the resultant RDF should mostly follow from the ontology + the situations that the data represent.
I worry that if that isn't how it works then too many different (and incompatible) things could count as acceptable.
Do you agree with that?

@DanCarey404
Contributor

DanCarey404 commented Nov 7, 2022

gist is our upper ontology. And it is a business ontology. It is not intended to model the entire world. As you can see from this discussion alone, selecting an upper ontology does not make domain modeling mechanical. Nor should it, as that stifles innovation. There are patterns we follow now that we simply didn't have 5 years ago. We likely will tumble onto better patterns in the future.

Also, Semantic Arts is in the business of helping businesses. Our models must serve their ends, not the other way round. The point about introducing Event instances into a KG is interesting. It is a pattern we can point to, to see if the client has the extra data to support it and see the value in having it. But we should not insist on it. If a definite need arises at a later point in time, it is easy enough to then create the Event instances using, say, :publishedAtDateTime. But if it doesn't serve the client's business needs, then it strikes me as a non-starter.

@DanCarey404
Contributor

To return to the subject of the issue....
We have an msCore property, mscore:lastModifiedAtDateTime, that I think would not fall under :start or :end.

@rjyounes
Collaborator Author

rjyounes commented Nov 7, 2022

Very well expressed, @DanCarey404. I agree 100%.

I think it's safe to say that the HR department at company X will never need to know anything about the event of someone's birth other than when it happened. And, as Dan says, it could be added in later if the HR department starts recording the names of the doctors who delivered their employees (to be distinguished from the recruiters who delivered them).

@uscholdm
Contributor

uscholdm commented Nov 7, 2022

If you feel the need to have a date that represents the start and end (I tend not to, but I could see the possibility) then I would suggest instead of having atDateTime at the top of the tree, where it is now, and is ambiguous, I'd say make it a sub property of both startDateTime and endDateTime, removing the ambiguity.

This is a very interesting idea. I wonder if it creates any problems.

@rjyounes rjyounes removed the impact: minor New, backward-compatible functionality (does not change inferences; e.g., adding a term) label Nov 7, 2022
@DanCarey404
Contributor

In my experience, :atDateTime never actually gets used in data. Rather, its more meaningful subproperties are the ones that get used. It serves more as an organizing placeholder. Perhaps we should add an annotation suggesting that it not be used.
Also, it just seems to me that :startAtDateTime and :endAtDateTime are naturally subproperties of :atDateTime. Going the other way around just strikes me as odd.

@mkumba
Contributor

mkumba commented Nov 7, 2022 via email

@justin2004
Contributor

I think it's safe to say that the HR department at company X will never need to know anything about the event of someone's birth other than when it happened.

@rjyounes
If we already know all of company X and/or department Y's needs (current and future) then they don't need graph databases and domain models that have formal grounding.
However, if company X says they mostly know their current needs but they want to have the option to flexibly/nimbly follow and create opportunities then I think they need a graph database and domain models that have formal grounding.

As you can see from this discussion alone, selecting an upper ontology does not make domain modeling mechanical. Nor should it, as that stifles innovation.

@DanCarey404
Given the same inputs if the domain modeling doesn't produce the same or highly consistent RDF then that "innovation" is a problem for people and systems interrogating the data.
Because if System A needs to talk about birth events and System B doesn't, then query writers have to produce two distinct queries to find all the births. What if you have 30 systems, with various current needs, representing similar situations in ways that were sufficient for their particular need? I think what you have is a situation where, when a request comes in, the query writer is going to have to roll up her sleeves for a couple of days.

I think "innovation" above is really "rogueness."
If the ontology + the situations to be expressed don't lead you mechanically to representations then you have the opportunity for rogueness (rogue ontologists).

@rjyounes
Collaborator Author

rjyounes commented Nov 7, 2022

We actually do have such an annotation: "This is the top level property for asserting time, and is not expected to be asserted directly."

I think the solution (while extremely ingenious) creates the following inferencing problem: if some event starts at time x and ends at time y, then I think we want to say it occurs at every point in time from x to y, inclusive. If this is true, then we cannot infer startDateTime or endDateTime from atDateTime. The way around this is to say that atDateTime means "at this time and no other" - but is that what we want it to mean?

I see Dave said essentially this above.
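To make the objection concrete, here is a small Python sketch of the interval reading, assuming times are simplified to integer hours and the events are hypothetical. A meeting from 9 to 17 occurs "at" 12, yet inferring a start of 12 from that would contradict its actual start; the entailment only holds for an instantaneous event, i.e. under the "at this time and no other" reading:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event; times are integer hours for brevity."""
    start: int
    end: int

    def occurs_at(self, t: int) -> bool:
        # Interval reading: the event is "at" every instant from start to end, inclusive.
        return self.start <= t <= self.end


def start_inference_is_sound(e: Event, t: int) -> bool:
    """If atDateTime were a subproperty of startDateTime, 'occurs at t' would
    entail 'starts at t'. Check whether that entailment holds for e at t."""
    return not e.occurs_at(t) or e.start == t


meeting = Event(start=9, end=17)
instant = Event(start=12, end=12)

print(meeting.occurs_at(12), start_inference_is_sound(meeting, 12))  # True False
print(instant.occurs_at(12), start_inference_is_sound(instant, 12))  # True True
```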

@uscholdm
Contributor

uscholdm commented Nov 7, 2022

Because if the System A needs to talk about birth events and System B doesn't need to talk about birth events then query writers have to produce two distinct queries to find all the births.

You can write a single query using UNION.
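A rough Python sketch of what a single UNION returns over the two triple patterns from justin2004's earlier comment. Each branch is hand-coded rather than run through a SPARQL engine, and the data (:Bob, :event1, :role1, the dates) is hypothetical. Note that each branch still had to be worked out for its system:

```python
# Hypothetical data mixing both representations in one graph.
triples = {
    # System A: direct birth date on the person
    (":Bob", ":birthDateTime", "1980-03-02"),
    # System B: birth event with a neonate role played by the person
    (":event1", ":hasRole", ":role1"),
    (":role1", ":playedBy", ":Alice"),
    (":role1", ":categorizedBy", ":neonate"),
    (":event1", ":atDateTime", "2021-06-01"),
}


def objects(s, p):
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def pattern_a():
    # ?s :birthDateTime ?bd
    return {(s, o) for (s, p, o) in triples if p == ":birthDateTime"}


def pattern_b():
    # ?role :playedBy ?s . ?role :categorizedBy :neonate .
    # ?event :hasRole ?role . ?event :atDateTime ?bd .
    results = set()
    for (event, p, role) in triples:
        if p != ":hasRole" or ":neonate" not in objects(role, ":categorizedBy"):
            continue
        for person in objects(role, ":playedBy"):
            for bd in objects(event, ":atDateTime"):
                results.add((person, bd))
    return results


# UNION: the solutions of either branch
births = sorted(pattern_a() | pattern_b())
print(births)  # [(':Alice', '2021-06-01'), (':Bob', '1980-03-02')]
```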

@justin2004
Contributor

You can write a single query using UNION.

Coming up with the triple patterns for each system is the effort though.

@uscholdm
Contributor

uscholdm commented Nov 7, 2022

There are always tradeoffs. For the people who don't care about the event, if it is forced on them, then every query is more complicated and confusing. The query writer may be thinking, I don't care about the event of the birth, I just want the birthdate, thank you very much. That simpler representation can be extended to an event based one where the simple link becomes a property chain from the person through the event of their birth to the date. People get all and only what they need.
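The "simple link as a property chain" idea amounts to relational composition, which is what a SPARQL 1.1 sequence path such as `:hasBirthEvent/:atDateTime` evaluates at query time. A minimal Python sketch, where :hasBirthEvent and all data are hypothetical:

```python
def compose(rel1, rel2):
    """Relational composition: {(a, c) | (a, b) in rel1 and (b, c) in rel2}."""
    return {(a, c) for (a, b) in rel1 for (b2, c) in rel2 if b == b2}


# Hypothetical event-based data: person -> birth event -> date
has_birth_event = {(":Alice", ":aliceBirth"), (":Bob", ":bobBirth")}
at_date_time = {(":aliceBirth", "2021-06-01"), (":bobBirth", "1980-03-02")}

# The shortcut "birth date" is computed on demand, not stored.
birth_date = compose(has_birth_event, at_date_time)
print(sorted(birth_date))  # [(':Alice', '2021-06-01'), (':Bob', '1980-03-02')]
```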

@justin2004
Contributor

justin2004 commented Nov 7, 2022

Using property chains (to give query writers formally defined shortcuts) is ideal, I agree.

But look at this representation:

[ a gist:Event ;
  :hasRole [ gist:categorizedBy wd:Q576104 ;  # neonate
             :playedBy :Alice ] ;
  gist:actualStartDate '2021-06-01T00:00:00-06:00'^^xsd:dateTime ]

It would take more than a property chain to directly connect :Alice to her birthdate -- it is a "qualified" property chain.
@uscholdm Do you know of a way to formally define something like a qualified property chain? Then we could make everyone happy!
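The difference between a plain and a "qualified" chain shows up as soon as a person plays more than one role. In this Python sketch, :birth2, :r2, and :sibling are invented for illustration and timezones are dropped from the dates. Following the bare path from :Alice returns dates from every event she takes part in; only the category filter on the intermediate role node picks out her own birth:

```python
# Hypothetical data; gist:categorizedBy and wd:Q576104 (neonate) follow the example above.
triples = {
    (":birth1", ":hasRole", ":r1"),
    (":r1", "gist:categorizedBy", "wd:Q576104"),  # neonate
    (":r1", ":playedBy", ":Alice"),
    (":birth1", "gist:actualStartDate", "2021-06-01"),
    # A later, hypothetical event in which Alice plays a different role
    (":birth2", ":hasRole", ":r2"),
    (":r2", "gist:categorizedBy", ":sibling"),
    (":r2", ":playedBy", ":Alice"),
    (":birth2", "gist:actualStartDate", "2024-02-10"),
}


def values(s, p):
    return {o for (s2, p2, o) in triples if s2 == s and p2 == p}


def subjects(p, o):
    return {s for (s, p2, o2) in triples if p2 == p and o2 == o}


def chain(person, category=None):
    """Follow ^:playedBy / ^:hasRole / gist:actualStartDate from person,
    optionally keeping only roles with the given category (the qualification)."""
    dates = set()
    for role in subjects(":playedBy", person):
        if category is not None and category not in values(role, "gist:categorizedBy"):
            continue
        for event in subjects(":hasRole", role):
            dates |= values(event, "gist:actualStartDate")
    return dates


print(sorted(chain(":Alice")))                # ['2021-06-01', '2024-02-10']
print(sorted(chain(":Alice", "wd:Q576104")))  # ['2021-06-01']
```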

@uscholdm
Contributor

uscholdm commented Nov 7, 2022

That's because you introduced another complexity that in my experience is rarely needed: explicit instances of roles. 99% of the time, representing a role as a property works just fine. Take a loan agreement. There are two main roles: borrower and lender. Two properties suffice to represent these roles: hasBorrower and hasLender (respectively).

So now we have two additional layers of complexity, which are only needed some relatively small percentage of the time: explicit events and explicit roles.

I never stumbled across the idea of a qualified property chain, but to be clear, in this exchange, I'm talking about property chains in SPARQL, not in OWL. The latter has so many shortcomings that I long ago abandoned using them. I explain this in detail in a short section in my book.

@rjyounes
Collaborator Author

rjyounes commented Nov 7, 2022

Even if such a thing existed, it would not make everyone happy. Using OWL property chains, you still need to materialize the underlying structure to be able to infer the shortcut property. This is what we're objecting to to begin with. Using property paths in SPARQL, as Michael suggests, does not require this.

If we already know all of company X and/or department Y's needs (current and future) then they don't need graph databases and domain models that have formal grounding.
However, if company X says they mostly know their current needs but they want to have the option to flexibly/nimbly follow and create opportunities then I think they need a graph database and domain models that have formal grounding.

I agree in principle, within common sense limits. We never leave all possibilities open for every domain, or we would, as Dan says, have to model the entire universe. Does an HR model need to model molecules, in case the need arises to model the employees' individual molecules? Or should that be pushed up to the upper ontology, so that any domain can access the concept? Neither. It belongs in a domain model for molecular biology.

@justin2004
Contributor

Using OWL property chains, you still need to materialize the underlying structure to be able to infer the shortcut property.

I'd be happy even if you didn't materialize the underlying structure. Just like I am happy when we use sub properties thoughtfully even though the reasoner isn't turned on. Having the formal definition is a great start.

Does an HR model need to model molecules

I don't think the slope is that slippery. A molecules class doesn't exist in gist but an event class does. I think the rule of thumb here is simple and natural: if you want to slap a date on something then that something is an event (":atDateTime :domainIncludes :Event" style).

I think that rule of thumb yields more consistent modeling patterns.
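One way to apply the rule of thumb is as a data check rather than an inference rule (reading it that way is an assumption made here). A minimal Python sketch, with hypothetical property and class names, that lists subjects carrying a date property without being typed as an event:

```python
# Hypothetical property names; only the shape of the check matters here.
DATE_PROPERTIES = {":atDateTime", ":startDateTime", ":endDateTime"}


def dated_non_events(triples):
    """Subjects that carry a date property but are not typed :Event."""
    dated = {s for (s, p, _) in triples if p in DATE_PROPERTIES}
    events = {s for (s, p, o) in triples if p == "rdf:type" and o == ":Event"}
    return dated - events


triples = {
    (":publication0", "rdf:type", ":Event"),
    (":publication0", ":atDateTime", "2021-06-01"),
    (":content0", "rdf:type", ":Content"),
    (":content0", ":atDateTime", "2021-06-01"),  # date placed directly on content
}
print(dated_non_events(triples))  # {':content0'}
```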

@rjyounes
Collaborator Author

rjyounes commented Nov 8, 2022

The question of whether dates are a sufficient condition for being an event is another complicated issue; I don't want to add to an already long comment thread, and it's not fully on topic here, but let's just say I wouldn't take it for granted. Happy to talk about it sometime!

@rjyounes
Collaborator Author

rjyounes commented Nov 8, 2022

However, there's been a disconnect, because I had no idea that you were proposing to assert the shortcut property without building the underlying structure. That obviates a few of my objections.

There's still Michael's objections to property chain axioms, the foremost being, according to his book, that you cannot use cardinality restrictions with them (pp. 96, 180). The trade-off is if you use SPARQL property paths instead, the semantics is not expressed in the ontology per se.

@justin2004
Contributor

justin2004 commented Nov 8, 2022

In @uscholdm 's book (page 179) I see "a transitive property may not be used with a cardinality restriction, it takes you out of OWL DL." And he explains the errors you get with a reasoner.

And then on page 180 "when creating a cardinality restriction ... you should avoid using a transitive property or a property defined using a property chain." Then on 181 "you cannot use a property chain with a cardinality restriction." But he doesn't say why, though he does refer the reader to this.

I'll see if I can find the reason. In any case, I'm not sure that is a decisive reason to avoid owl property chains altogether.

The formal definitions (using owl property chains) could give users a shortcut without sacrificing the thoughtful atomicity.

@rjyounes
Collaborator Author

rjyounes commented Nov 9, 2022

There is a distinction between the hasUncle property chain discussed in Michael's book (or hasGrandparent in the OWL spec) and the birthdate case: in the former, you are defining a shortcut property over a structure that you already wanted; in the latter, you are defining a structure in order to get the shortcut property. Not sure how that affects the discussion, but it's worth noting.

@rjyounes
Collaborator Author

rjyounes commented Nov 9, 2022

Going back to the option of making atDateTime a subproperty of both startDateTime and endDateTime, you lose the ability to concisely say that something has no dates via a maxCardinality restriction on atDateTime. You'd need two, one for each superproperty.
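A small Python sketch of why two restrictions would be needed, relying on the fact that a subproperty assertion entails the superproperty assertion, so a cardinality restriction on a property also counts the values of all its subproperties. Property names are shortened:

```python
PROPS = {"atDateTime", "startDateTime", "endDateTime"}
# child property -> set of direct parent properties
CURRENT = {"startDateTime": {"atDateTime"}, "endDateTime": {"atDateTime"}}
PROPOSED = {"atDateTime": {"startDateTime", "endDateTime"}}


def superproperties(prop, hierarchy):
    """Return prop plus every property it is (transitively) a subproperty of."""
    result, stack = {prop}, [prop]
    while stack:
        for parent in hierarchy.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def covered_by(restricted, hierarchy):
    """Properties whose assertions also count as `restricted` assertions:
    `restricted` itself and all of its subproperties."""
    return {p for p in PROPS if restricted in superproperties(p, hierarchy)}


print(covered_by("atDateTime", CURRENT) == PROPS)  # True: one restriction suffices
print(sorted(covered_by("atDateTime", PROPOSED)))  # ['atDateTime']
both = covered_by("startDateTime", PROPOSED) | covered_by("endDateTime", PROPOSED)
print(both == PROPS)  # True: but it takes two restrictions
```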

@mkumba
Contributor

mkumba commented Nov 9, 2022 via email

@justin2004
Contributor

Here I also see:
"NOTE: OWL DL requires that for a transitive property no local or global cardinality constraints should be declared on the property itself or its superproperties, nor on the inverse of the property or its superproperties."

I'm wondering if property chain axioms got pulled into this unnecessarily. Maybe this limitation only applies to them if they use a transitive property.

@rjyounes
Collaborator Author

rjyounes commented Nov 10, 2022

I believe there is a simple, logical explanation for this limitation. Because both transitive properties and property chain axioms generate certain types of inferred triples, their cardinality cannot be restricted.

Let's say I have the following axioms and instance data:

:p1 a owl:TransitiveProperty .

:C a owl:Class ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty :p1 ;
        owl:maxCardinality 2 ;
    ] ;
    .

[] a owl:AllDifferent ;
    owl:distinctMembers (:t1 :t2 :t3 :t4 ) ;
    .

:t1 a :C ; :p1 :t2 .

:t2 :p1 :t3 .

A reasoner infers:

:t1 :p1 :t3 .

Now I add the triples:

:t3 :p1 :t4 .

The reasoner infers:

:t1 :p1 :t4 .
:t2 :p1 :t4 .

and there is a logical contradiction because there are 3 triples with subject :t1 and property :p1. There is no way to restrict the cardinality on a transitive property, because even if the number of asserted triples falls within the stated cardinality, new triples can be inferred that may exceed it.
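The counting argument can be checked with a short Python sketch. It assumes the owl:AllDifferent axiom (so each individual counts once) and computes the transitive closure directly instead of running a reasoner; OWL DL itself disallows this combination, as the note quoted above says:

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing pairs (naive fixpoint)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def count_values(pairs, subject):
    # All individuals are owl:AllDifferent, so each distinct object counts once.
    return len({o for (s, o) in pairs if s == subject})


MAX_CARDINALITY = 2
p1 = {(":t1", ":t2"), (":t2", ":t3")}
print(count_values(p1, ":t1"), count_values(transitive_closure(p1), ":t1"))  # 1 2

p1.add((":t3", ":t4"))
print(count_values(p1, ":t1"), count_values(transitive_closure(p1), ":t1"))  # 1 3
print(count_values(transitive_closure(p1), ":t1") > MAX_CARDINALITY)  # True
```

The asserted count for :t1 never changes; only the inferred count crosses the limit.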

Property chains work the same way. Given:

:p1 owl:propertyChainAxiom  ( :p2  :p3 ) .

:C a owl:Class ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty :p1 ;
        owl:maxCardinality 1 ;
    ] ;
    .

[] a owl:AllDifferent ;
    owl:distinctMembers (:t1 :t2 :t3 :t4 )
    .

:t1 a :C ; :p2 :t2 .

:t2 :p3 :t3 .

the reasoner infers:

:t1 :p1 :t3 .

Now I assert:

:t2 :p3 :t4 .

The reasoner infers:

:t1 :p1 :t4 .

and there is a logical contradiction.

The common thread is that a logically consistent set of assertions may lead to a logical contradiction due to the cardinality constraints on either of these types of properties.
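The same kind of Python sketch for the property chain case: no :p1 triple is ever asserted, yet the inferred :p1 values for :t1 go from one to two once :t2 :p3 :t4 is added, exceeding maxCardinality 1. The inference is computed as a direct composition, not by a reasoner:

```python
def chain(p2, p3):
    """p1 owl:propertyChainAxiom (p2 p3): infer (a p1 c) from (a p2 b) and (b p3 c)."""
    return {(a, c) for (a, b) in p2 for (b2, c) in p3 if b == b2}


def p1_values(subject, p2, p3):
    # All individuals are owl:AllDifferent, so each value counts once.
    return {c for (a, c) in chain(p2, p3) if a == subject}


MAX_CARDINALITY = 1
p2 = {(":t1", ":t2")}
p3 = {(":t2", ":t3")}
print(sorted(p1_values(":t1", p2, p3)))  # [':t3']

p3.add((":t2", ":t4"))
found = p1_values(":t1", p2, p3)
print(sorted(found), len(found) > MAX_CARDINALITY)  # [':t3', ':t4'] True
```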

@uscholdm
Contributor

The common thread is that a logically consistent set of assertions may lead to a logical contradiction due to the cardinality constraints on either of these types of properties.

This is interesting, and could be correct, but I'm not convinced. Detecting a logical inconsistency just means that the assertions along with the rules create a logical contradiction. Adding that new assertion makes an inconsistency. The reasoning leading to the contradiction could involve a chain of dozens of inferences. A first-order logic theorem prover has no problem with this. I still think it more likely that it is a DL tradeoff. To be sure, I'd have to ask a DL inference engine expert.

@rjyounes
Collaborator Author

rjyounes commented Dec 9, 2022

In one of my client ontologies as well as in one of our sub-gists, I am seeing definitions for the following:

  • atDate - includes date precision but no start/end or planned/actual
  • startDate/endDate - includes date precision and start/end but no planned/actual

These are defined in the domain ontologies merely because they are not defined in gist, but have no special relevance to those particular domains. This is an indication that they should be moved up to gist.

@rjyounes
Copy link
Collaborator Author

rjyounes commented Nov 9, 2023

Another solution: set start and end dates to the same datetime.
But things such as publication dates are not intervals, they are points in time, so using start + end is not accurate.
Many examples of this.
We do have gist:isRecordedAt which functions as proposed.

Boris: OWL Time distinguishes points in time from intervals.
Rebecca: We eliminated TimeInstant in gist 11 based on a proposal made by Dave.

To reiterate the proposal in this issue: add a series of datetime predicates that express precision and planned/actual but not start/end. E.g.,
actualDate
actualMinute
etc.
plannedDate
plannedMinute
etc.
Requires 10 new predicates for all precisions with actual/planned distinctions.

Rebecca: Many that we care about are just dates: publication date, approval date, signature date.
Jamie: some approvals need more precision.

Do we need actual vs planned?
If not, we would have 4 new predicates.

We could start with these 4 and see how it goes.
Steven: Rebecca's use cases probably need planned vs actual. E.g., planned vs actual publication dates.
Jamie: Only makes a difference if you want to keep the history of the planned dates.
Rebecca: In some cases we do.

Dave defined atDate in one of the sub-gists (now moved to gistBusiness).
Jamie: Dave sometimes throws things in to come back to later.

Postpone decision till we get input from Dave and/or one or more senior ontologists.

@mkumba
Contributor

mkumba commented Nov 10, 2023 via email

Projects
Status: In Triage
Development

No branches or pull requests

5 participants