Proposal for simplified 'when' properties #42

Open
docuracy opened this issue Feb 24, 2022 · 9 comments

@docuracy
Collaborator

Specification requires one of in|earliest|latest, but the given examples omit these properties.

@docuracy added the "invalid" label (This doesn't seem right) on Feb 24, 2022
@kgeographer
Contributor

?? The large example does use {"in":___} within start and end expressions for all timespans. And the example within the "when" section of the spec uses all three. Maybe I misunderstand.

"when": {
  "timespans": [
    { "start": { "in": "yyyy-mm" },
      "end": {
          "earliest": "-yyyy",
          "latest": "yyyy-mm-dd" }
    }
  ],
  "periods": [
    { "name": "Anglo-Saxon Period, 449-1066",
      "uri": "http://n2t.net/ark:/99152/p06c6g3whtg" }
  ],
  "label": "for a century during the Anglo-Saxon period",
  "duration": "P100Y"
}

@docuracy
Collaborator Author

Ah, so they do - but these smaller examples do not:

https://github.com/LinkedPasts/linked-places-format#geometry-required
https://github.com/LinkedPasts/linked-places-format#relations-optional

@kgeographer
Contributor

Yes indeed! I will repair that oversight right away. However, I have wondered whether alternate forms for the when object, for example a flattened one, might be considered. That may deserve a separate issue.

@docuracy
Collaborator Author

docuracy commented Feb 25, 2022

[UPDATED]

I wondered about that too, mainly to improve the human-readability/comprehension. What we're looking for (in addition to the existing specification so as to preserve backward-compatibility) is a single-string expression that could serve as a value either for a timespan or for the whole of a when.

ISO 8601 already allows the following:

/ timespan delimiter (which we might also use as an end-date indicator when no start-date is given)
- BC/BCE
+ AD/CE (compulsory only for years with more than 4 digits)

ISO 8601-2:2019, as outlined here, adds the following:

~ approximately (must precede any '-' or '+' token)
? unknown (discouraged if an approximate date can be given)

All we'd need to add to that are the following:

> after
>= earliest
< before
<= latest

Valid values would include:

Between (and during) 1939 and 1945: {"when": "1939/1945"}
Started after 2010: {"when":">2010"} or {"when":">=2011"}
Not before 1 April 1999: {"when":">=1999-04-01"}
Over by the end of June 1660: {"when":"/<=1660-06"} (note use of '/' delimiter to indicate an end date)
Ended, but not known when: {"when":"/?"}
Start unknown, continuing in 2022: {"when":"?/>=2022"}
Stonehenge: {"when":"~-2500/>=2022"}
Earth: {"when":"~-4543000000/>=2022"}
Lord Lucan: {"when":"1934-12-18/?"}
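
A minimal Python sketch of how the example strings above might be split into their parts, assuming at most one '/' delimiter and the operator and '~' prefixes described in this comment. The names SIDE, Bound, parse_side and parse_when are hypothetical and not part of LPF or any library, and the pattern is a simplification for illustration rather than the validating regex from the proposal.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical pattern for one side of the proposed string: an optional
# operator, an optional '~', then a signed year with optional month and day.
SIDE = re.compile(
    r"(?P<op>>=|<=|>|<)?"
    r"(?P<approx>~)?"
    r"(?P<date>[+-]?\d{1,10}"
    r"(?:-(?:0[1-9]|1[0-2])(?:-(?:0[1-9]|[12]\d|3[01]))?)?)"
)


class Bound(NamedTuple):
    op: Optional[str]     # '>', '>=', '<', '<=' or None when unqualified
    approximate: bool     # True when the date carried a '~' prefix
    date: Optional[str]   # None stands for the unknown marker '?'


def parse_side(text: str) -> Bound:
    """Parse one side of a 'when' string, e.g. '>=2022' or '~-2500'."""
    if text == "?":
        return Bound(None, False, None)
    m = SIDE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a recognised date expression: {text!r}")
    return Bound(m.group("op"), m.group("approx") is not None, m.group("date"))


def parse_when(value: str) -> Tuple[Optional[Bound], Optional[Bound]]:
    """Split on '/' and parse each side; a missing side becomes None."""
    start, _, end = value.partition("/")
    return (
        parse_side(start) if start else None,
        parse_side(end) if end else None,
    )


if __name__ == "__main__":
    for example in ("1939/1945", "/<=1660-06", "~-2500/>=2022", "1934-12-18/?"):
        print(example, "->", parse_when(example))
```

Keeping "no side given" (None) distinct from "side unknown" (a Bound whose date is None) preserves the difference between {"when":"1934-12-18"} and {"when":"1934-12-18/?"}.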

If no end-date is given, a timespan would, if possible, be deduced from the start-date. For example:

Some date in 1331: {"when":"1331"} is equivalent to {"when":">=1331-01/<1332"}
Some time on 18 December 1934: {"when":"1934-12-18"} is equivalent to {"when":"1934-12-18/<1934-12-19"}

However, no timespan can be deduced from qualified start-dates such as {"when":">2010"}.
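
The deduction rule above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the calendar is taken to be proleptic Gregorian with a year 0, and the output for a month-precision date such as 1660-06 is an extrapolation, since the comment gives examples only for year and day precision.

```python
import re

# Unqualified date only: signed year, optional month, optional day.
PLAIN_DATE = re.compile(
    r"(?P<year>[+-]?\d{1,10})"
    r"(?:-(?P<month>0[1-9]|1[0-2])(?:-(?P<day>\d{2}))?)?"
)


def fmt_year(year: int) -> str:
    """Pad to four digits; add '+' when more than four digits are needed."""
    digits = f"{abs(year):04d}"
    if year < 0:
        return "-" + digits
    return "+" + digits if len(digits) > 4 else digits


def days_in_month(year: int, month: int) -> int:
    # Assumption: proleptic Gregorian leap-year rule applied to every year.
    if month == 2:
        leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
        return 29 if leap else 28
    return 30 if month in (4, 6, 9, 11) else 31


def implied_timespan(value: str):
    """Return the equivalent explicit span, or None if none can be deduced."""
    m = PLAIN_DATE.fullmatch(value)
    if m is None:
        return None  # qualified ('>2010'), unknown ('?'), or already a span
    year = int(m.group("year"))
    if m.group("month") is None:
        return f">={fmt_year(year)}-01/<{fmt_year(year + 1)}"
    month = int(m.group("month"))
    if m.group("day") is None:
        # Assumed form for month precision; not given in the proposal.
        ny, nm = (year + 1, 1) if month == 12 else (year, month + 1)
        return f">={fmt_year(year)}-{month:02d}-01/<{fmt_year(ny)}-{nm:02d}"
    day = int(m.group("day"))
    last = days_in_month(year, month)
    if not 1 <= day <= last:
        raise ValueError(f"invalid day in {value!r}")
    if day < last:
        ny, nm, nd = year, month, day + 1
    elif month < 12:
        ny, nm, nd = year, month + 1, 1
    else:
        ny, nm, nd = year + 1, 1, 1
    return f"{value}/<{fmt_year(ny)}-{nm:02d}-{nd:02d}"


if __name__ == "__main__":
    for example in ("1331", "1934-12-18", ">2010"):
        print(example, "->", implied_timespan(example))
```

The date arithmetic is done by hand because Python's datetime.date only covers years 1 to 9999, which would exclude the BCE and very large years used in the examples.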

According to Wikipedia, the specification ought to describe handling BC/BCE dates, and set a maximum limit on the number of digits used to represent a year. If more than 4 digits are used, ISO seems to dictate that a '+' symbol is compulsory to indicate AD/CE.

Other Proposals:

  • The ISO 8601 standard allows for other ways to represent dates, but for our purposes we could limit the specification to the YYYY or YYYY-MM or YYYY-MM-DD (plus extra 'Y's) formats in the above examples.
  • We should probably disallow time elements because they are seldom needed in cultural heritage and would defeat the readability goal. They are permitted in the existing when specification, which currently also allows any ISO 8601 expression, but should perhaps in that situation be limited to date+time expressions.
  • Allow up to 10 digits for the year (nothing older than the Earth by which a 'year' is defined?).
  • Allow this date-string format as a valid value for either when or timespans properties.

All of these proposals are now implemented in the draft JSON Schema.

Validating Regex: /(?:^(?<start>(?:[<>]?=?~?(?:(?:0|-?[1-9][\d]{1,3}|[+-]{1}[1-9][\d]{4,9}))(?:-(?:0[1-9]|1[0-2]))?(?:-(?:0[1-9]|[12]\d|3[0-1]))?)|\?)?(?:\/(?<end>(?:[<>]?=?~?(?:0|-?[1-9][\d]{1,3}|[+-]{1}[1-9][\d]{4,9})(?:-(?:0[1-9]|1[0-2]))?(?:-(?:0[1-9]|[12]\d|3[0-1]))?)|\?))?){1,2}$/g

@docuracy changed the title from "when.timespans.start|end.in|earliest|latest" to "Proposal for simplified 'when' properties" on Feb 25, 2022
@docuracy removed the "invalid" label (This doesn't seem right) on Feb 25, 2022
docuracy added a commit to docuracy/Locolligo that referenced this issue Feb 26, 2022
docuracy added a commit to docuracy/Locolligo that referenced this issue Feb 26, 2022
@rybesh
Contributor

rybesh commented Feb 26, 2022

I'm wary of inventing new temporal expression syntaxes for LPF. I would be inclined to stick with standard xsd:date and xsd:dateTime values, and integers for years, otherwise we are sacrificing the ability to sort and query on time in most databases.

We could potentially allow extended ISO 8601-2 date expressions, with the understanding that not all implementations are required to be able to parse these.

More complicated needs could be met by layering additional temporal modeling constructs on top of what LPF expects. IMO LPF should be a "lowest common denominator" exchange format rather than a general purpose temporal modeling language.

Quoting @docuracy: "We should probably disallow time elements because they are seldom needed in cultural heritage"

I'm not sure this is a good assumption; cultural heritage can be of recent vintage too and may have times associated with artifacts or events.

@docuracy
Collaborator Author

Yes, I like your wariness, Ryan. However, under this proposal 'complicated needs' would still be met by the existing when specification: the idea is to offer an alternative to that 3-tier JSON object which can easily be grasped by anyone typing data into a spreadsheet. I know that I'm far from alone in finding the existing when specification the most difficult aspect of LPF when trying to wrestle data into conformity, to the point where I'm often inclined not to bother.

I suggested dropping time elements from the proposed 'uncomplicated needs' format partly because I've never encountered them: I'd like to know if others in this group have seen them more than very occasionally. Typing xsd:dateTime values into a spreadsheet is hardly uncomplicated and they're not easy on the eye, but I suppose if somebody wants to do it there's no harm in allowing it.

I realise I may be missing a bigger picture, but I don't think we'd be sacrificing any ability to sort and query on dates because in order to get .lp.json into a database it has to be parsed, and any parser might easily be configured to remove and convert <>= characters into their latest|earliest|in equivalents, and to handle the remaining ISO 8601-2 expressions too. Do we maintain a list somewhere of implementations (other than Peripleo) that support LPF?
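
The conversion described above might look something like the following Python sketch, which turns a simplified string into a timespan object of the existing start/end form with in|earliest|latest keys. The function names are hypothetical. Mapping the strict '>' and '<' operators onto earliest and latest is an assumption that loses their strictness, and a '~' prefix is passed through unchanged rather than interpreted.

```python
# Assumed mapping from the proposed operators to existing LPF keys.
# '>' and '<' have no exact equivalent, so they are collapsed here.
OPERATORS = {">=": "earliest", ">": "earliest", "<=": "latest", "<": "latest"}


def to_bound(side: str) -> dict:
    """Convert one side, e.g. '>=2011', into {'earliest': '2011'}."""
    for symbol in (">=", "<=", ">", "<"):  # two-character operators first
        if side.startswith(symbol):
            return {OPERATORS[symbol]: side[len(symbol):]}
    return {"in": side}  # unqualified dates become 'in'


def to_timespan(value: str) -> dict:
    """Build a timespan dict; a missing or unknown ('?') side is omitted."""
    start, _, end = value.partition("/")
    timespan = {}
    if start and start != "?":
        timespan["start"] = to_bound(start)
    if end and end != "?":
        timespan["end"] = to_bound(end)
    return timespan


if __name__ == "__main__":
    for example in ("1939/1945", ">=2011", "/<=1660-06", "1934-12-18/?"):
        print(example, "->", to_timespan(example))
```

A parser along these lines would run before the data is loaded into a database; it does not address querying the string form directly.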

@rybesh
Contributor

rybesh commented Mar 7, 2022

Quoting @docuracy: "I don't think we'd be sacrificing any ability to sort and query on dates because in order to get .lp.json into a database it has to be parsed, and any parser might easily be configured to remove and convert <>= characters into their latest|earliest|in equivalents, and to handle the remaining ISO 8601-2 expressions too."

LPF is also RDF, and ideally it could be loaded into a triplestore and queried using SPARQL without having to install specialized SPARQL extensions for parsing non-standard date expressions. Even handling ISO 8601-2 is a tall order, especially at Level 2.

@VincentDucatteeuw

VincentDucatteeuw commented Mar 7, 2022

Quoting @docuracy: "All we'd need to add to that are the following:"

> after
>= earliest
< before
<= latest

Perhaps of interest to this proposal are the properties within CIDOC CRM that are used to formalize time as implemented in RDF. (cfr. recording dates p.4) I myself use these properties to date places within a gazetteer.

Quoting @docuracy: "We should probably disallow time elements because they are seldom needed in cultural heritage and would defeat the readability goal. They are permitted in the existing when specification, which currently also allows any ISO 8601 expression, but should perhaps in that situation be limited to date+time expressions."

I am somewhat hesitant about removing the time component. The CRM model I use for a gazetteer also doesn't theoretically require a time component, but practically I have a different view on this. Especially when we talk about the concept of place and the need for place disambiguation it can be very crucial to have a time component as it facilitates place identification. So I wonder to what extent it is not possible to date a place, even if the dating is very limited. E.g. In my project, the smallest possible dating of a place (i.e. after/before) is based on the dating of the mentioning of that place within a source. But it would be very interesting to discuss the possible removal with the broader group.

@docuracy
Collaborator Author

docuracy commented Mar 7, 2022

Thanks, Vincent. If I've understood the CIDOC CRM correctly, this is more or less what Karl has implemented in the existing LPF: what I'm proposing is an optional alternative format for dates that encapsulates the wordy English expressions with internationally-understood symbols, so for example P82a_begin_of_the_begin becomes >=. My suggestion to disallow time elements applies only to this optional alternative string format for the sake of simplicity: any representation that needs greater granularity of time than just a date could still use the original standard.
