TG2-AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY #93
Comments
Comment by Lee Belbin (@Tasilee) migrated from spreadsheet: |
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: |
Noted in call 2022 Feb 13: Need to be explicit about cases where year is provided but month/day contain values not interpretable as a day or a month. |
Example cases to be explicit about in the test data: dwc:year="2021", dwc:month="", dwc:day="29"; and dwc:year="2021", dwc:month="X", dwc:day="29". The specification should be such that both of these result in dwc:eventDate="2021" (or 2021-01-29/2021-12-29). |
Per discussion on TG2 call 2022 Mar 6, added word unambiguous to expected response, such that X is not interpreted as 10, as it could be [missing data], and per suggestion by @tucotuco added guidance to use only year if just day and year are present. |
… specifications. DESCRIPTION: Updating implementation of AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY to fit current (2022-03-08) specification, making method name consistent, and deprecating old method. Adding a utility method romanMonthToInteger to interpret roman numeral month values as numbers. Adding/updating relevant unit tests.
… specifications. DESCRIPTION: Updating implementation of AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY to fit current (2022-03-08) specification, fixing unit tests to conform with current specifications.
To handle the issues we've been having fitting an implementation from the specification to the test data, I suggest adding some clauses about interpretability to the notes. From: An attempt to populate dwc:eventDate from dwc:verbatimEventDate and from dwc:startDayOfYear and dwc:endDayOfYear should be made before this test is run. If dwc:year and dwc:day are present, but dwc:month is not supplied, then just the year should be given as the proposed amendment. To: An attempt to populate dwc:eventDate from dwc:verbatimEventDate and from dwc:startDayOfYear and dwc:endDayOfYear should be made before this test is run. If dwc:year and dwc:day are present and interpretable, but dwc:month is not supplied or is not interpretable, then just the year should be given as the proposed amendment. |
… specifications. DESCRIPTION: Conforming the implementation and unit tests for AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY to the expectations expressed in the validation data for handling uncertainty. This code passes validation dataID 994 (and all the other TIME test cases in v19 of the validation data except for dataID 987 which probably has a typo in the validation data).
Notes amended accordingly. |
Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16. |
…ENDED when proposing changes to empty terms. Updated method, tests, and comments.
Based on @chicoreus email August 31, then INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is uninterpretable as a valid year; FILLED_IN the value of dwc:eventDate if an unambiguous ISO 8601-1:2019 date can be interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED | becomes INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is uninterpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1:2019 date can be interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED | ? |
Expected Response updated as per above with minor edit: INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1:2019 date can be interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED |
More discussion today on this suggests we can interpret dwc:month from Roman numerals and @ArthurChapman said that using Roman numerals for month is not unusual, so we have amended the examples to illustrate this principle. The test data records have been changed accordingly and we will add a Vocabulary item for "Roman numerals". |
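A minimal sketch of how a Roman numeral month might be interpreted, offered only as an illustration of the principle in the comment above. The function name `roman_month_to_integer` and the lookup table are hypothetical (they are not the actual `romanMonthToInteger` utility mentioned in the commits). Note that the sketch treats a lone "X" as 10, whereas earlier in this thread "X" was also considered a possible missing-data marker; which reading applies is an assumption here.

```python
from typing import Optional

# Hypothetical lookup table: Roman numerals I..XII mapped to month numbers.
ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def roman_month_to_integer(value: str) -> Optional[int]:
    """Return 1..12 for a Roman numeral month, or None if not recognized."""
    # Normalize whitespace and case before the lookup.
    return ROMAN_MONTHS.get(value.strip().upper())
```

Values outside I..XII (for example "XIII" or an empty string) return `None`, which a caller could treat as "month not interpretable".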
This test currently (from the Notes) has dependencies, i.e. one should use dwc:verbatimEventDate, as a priority, or dwc:startDayOfYear and dwc:endDayOfYear, before attempting to run this test. We have wanted all the tests to be stand-alone and thus, I think the Specification should be rewritten as follows: INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate, dwc:verbatimEventDate, dwc:startDayOfYear and dwc:endDayOfYear, are bdq:NotEMPTY or dwc:year is bdq:EMPTY or is not interpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED. #132 should also be rewritten similarly (q.v.). My other question is what is meant by a "valid year": would this exclude "2035", for example? And what about "1034", which may be an error for "2034"? That would mean that we should then run #84 after this test. |
@ArthurChapman, doesn't feel like a good idea. That creates an
explicit dependency between the tests integral to the test, rather than
keeping the tests independent and making any potential
interdependencies a concern for the composition of sets of tests in
their execution framework. We discuss this in the implementation
guide.
|
@chicoreus - this reduces the dependency by making it explicit. Currently, in the Notes we say "An attempt to populate dwc:eventDate from dwc:verbatimEventDate and from dwc:startDayOfYear and dwc:endDayOfYear should be made before this test is run. " By putting it in the INTERNAL_PREREQUISITES_NOT_MET - you don't run it if there is something in those fields. |
@ArthurChapman, exactly, by putting it in the notes we allow users to
compose the tests in other ways if they wish. By putting it into the
specification we prevent it from being used in other ways.
If we wish to provide an enforceable explicit ordering of the tests,
then we provide another test which runs these three in sequence. Again
see the discussion about independence and ordering in the implementation
guide. The statement in the notes is exactly why we don't want to
bring these limitations into the specification of the tests itself.
|
Strange as it seems, I fully agree with @chicoreus that the tests should be presented as independent, other than the Validate-Amend-(Re)Validate process. This is what I said to @ArthurChapman on this issue. As soon as we prescribe dependencies, we open a Pandora's box, and as @chicoreus pointed out, possible workflows are many (exemplified by a domain using a subset of tests). Thanks also @chicoreus for clarifying what an 'interpretable year' is. This is what I was looking for in relation to edge cases for the Test Data. Note that the previous Expected Response said INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED. Do we mean 'interpretable as a valid year' or 'interpretable as a valid ISO 8601 year'? They are different. I presume the former? Note that 'valid' would supersede 'interpretable' in that if it is a 'valid year' sensu ISO 8601, it will be interpretable? Scenario 1: dwc:year="1505" or "2035" are interpretable (as integers), and both are valid ISO 8601 years (0-9999 are ok), so the response.status given the Expected Response above would be NOT_AMENDED. Right? Scenario 2: dwc:year="-100" or "10001" may be a valid year (as it is an integer) but it is not a valid ISO 8601 year (0-9999 are, at least without "prior agreement"). Scenario 3: dwc:year="CX" is not an integer, so the response.status should be INTERNAL_PREREQUISITES_NOT_MET?
As @chicoreus wants to 'short circuit' an evaluation of a Date (dwc:eventDate) if dwc:year is in some way invalid (based upon the priority of dwc:year, one presumes), should we use: INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED. Or do we intend to limit dwc:year to a greater extent, as in: INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not a valid ISO 8601 year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED? |
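The difference between the two candidate wordings above can be sketched as two predicates, checked against the three scenarios. This is an illustrative sketch only: the function names are hypothetical, and the 0-9999 range is taken from the scenario text in this thread rather than from the ISO document itself. Python's `int()` is also somewhat lenient (it accepts a leading sign and digit-group underscores), so a real implementation in another language may accept a slightly different set of strings.

```python
def is_interpretable_as_integer(year: str) -> bool:
    """True if the string can be converted to an integer by a native conversion."""
    try:
        int(year.strip())
    except ValueError:
        return False
    return True


def is_valid_iso8601_year(year: str) -> bool:
    """True if the string is an integer in the range 0..9999 (range assumed from the thread)."""
    return is_interpretable_as_integer(year) and 0 <= int(year.strip()) <= 9999


# The three scenarios discussed above (values taken from the comment).
for value in ["1505", "2035", "-100", "10001", "CX"]:
    print(value, is_interpretable_as_integer(value), is_valid_iso8601_year(value))
```

Under the first predicate only "CX" fails the prerequisite; under the second, "-100" and "10001" fail as well.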
@Tasilee wrote Am I missing something - wouldn't they both be FILLED_IN? |
@ArthurChapman - Yes, my bad: FILLED_IN would be correct if 'valid' refers to an integer. |
@Tasilee I'd be happy with either (taking your two above, and adding "interpretable as", since we can't assert that something that is likely to be serialized as a string is a different data type, only that it can be interpreted as such): INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED or INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as a valid ISO 8601 year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED. Note that specifying 8601-1 forecloses the use of the Library of Congress extension for uncertainty in dates (that is one of the things that needs to get pushed as a change in Darwin Core, but it is currently specified in a comment about best practice: "Recommended best practice is to use a date that conforms to ISO 8601-1:2019."); we'd be more change resistant if we specify an ISO 8601 date in a normative assertion. I'd prefer the first option: this corresponds to the current event_date_qc implementation, and is simpler for implementors (just use a native type conversion function to see if the presented dwc:year, in its probable string representation, can be converted to a native integer data type, rather than doing that and then evaluating if the integer conforms to the ISO range of integers for year). Thus: INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED |
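The Expected Response proposed above could be sketched roughly as follows. This is not the event_date_qc implementation; the function name, the tuple return shape, and several edge-case choices are assumptions made for illustration. In particular, the sketch assumes that a year outside 0..9999 yields NOT_AMENDED, that month is parsed only as a plain integer (no Roman numerals), and that a valid month with a supplied but unusable day falls back to year-month; none of those three choices is settled in this thread.

```python
from typing import Optional, Tuple

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Native string-to-integer conversion; None if empty or not convertible."""
    text = (value or "").strip()
    if not text:
        return None
    try:
        return int(text)
    except ValueError:
        return None


def _days_in(year: int, month: int) -> int:
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return 29 if (month == 2 and leap) else DAYS_IN_MONTH[month - 1]


def amend_eventdate_from_yearmonthday(
    event_date: Optional[str], year: Optional[str],
    month: Optional[str], day: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Return (response status, proposed dwc:eventDate or None)."""
    y = _to_int(year)
    # Prerequisites: eventDate must be empty, year must be an integer.
    if (event_date or "").strip() or y is None:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    # Assumption: an integer year outside 0..9999 cannot form an ISO 8601 date.
    if not 0 <= y <= 9999:
        return ("NOT_AMENDED", None)
    m, d = _to_int(month), _to_int(day)
    # Per the Notes: month missing or not interpretable -> propose just the year.
    if m is None or not 1 <= m <= 12:
        return ("FILLED_IN", f"{y:04d}")
    # Assumption: day missing or unusable -> propose year-month.
    if d is None or not 1 <= d <= _days_in(y, m):
        return ("FILLED_IN", f"{y:04d}-{m:02d}")
    return ("FILLED_IN", f"{y:04d}-{m:02d}-{d:02d}")
```

With the example cases from earlier in the thread, `amend_eventdate_from_yearmonthday("", "2021", "", "29")` and `amend_eventdate_from_yearmonthday("", "2021", "X", "29")` both propose "2021" in this sketch.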
Some time back, I argued that we should just use ISO 8601 in a lot of our tests in the light of ISO 8601-1:2019 and the possibility of other extensions. |
This has been a useful discussion in raising the need to be specific and concise in our test Specifications/Expected Responses. Changing the Expected Response from INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is bdq:NotEmpty or dwc:year is bdq:Empty; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date is interpretable from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED to INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED. @ArthurChapman: ISO 8601 is the bdq:sourceAuthority for only two tests, #26 and #76, and it is odd that it is not listed as such here or in the tests #36. Is there any logic to this? Did we agree that we would universally use "bdq:sourceAuthority" in the Expected Responses/Specifications (as in #26 and #76)? |
Looking back on my notes, especially under #26, I think we recommended ISO 8601-1, which covers the Basic Rules. The latest version is 2019, and this was amended in 2022 for some minor technical corrections. I think that if we use ISO 8601-1 we cover the latest versions without restricting ourselves to the 2019 version if there are later updates. ISO 8601-2 covers extensions that don't apply to our data. If we just use ISO 8601 we bring in all earlier versions, which don't cover everything we need. So my conclusion is that we use ISO 8601-1 for all tests. |
Given my last post, the most accurate representation is ISO 8601-1:2019(en). I would add that into the References, but in the Expected Response leave as ISO 8601-1. That allows updates of the test in the future without making changes to the Expected Response/Specification, i.e. as above. I would follow #26 and include 8601-1 in the Expected Response as earlier versions of 8601 didn't include some of the criteria we use. |
On Sat, 14 Sep 2024 16:37:07 -0700 Arthur Chapman ***@***.***> wrote:
ISO 8601-2 cover extensions that don't apply to our data.
Unfortunately not true. ISO 8601-2:2019 includes the Extended Date/Time Format (EDTF) extension, which allows for explicit statements about uncertainty. With ISO 8601-1, we can assert periods of time, such as 2020, or 2022-12, but these, under the meaning of the core, are actually durations that cover the entire interval. Almost always when biodiversity data asserts a collecting event date of say 1880, the intended meaning is on some unknown date within the year 1880. Using ISO 8601-1, we can only express that the event was all of 1880. The EDTF extension, which is sorely needed in Darwin Core, allows explicit specification of uncertainty, with 1880-??-?? explicitly meaning at some unknown point within 1880, not a period of time covering the entire duration of 1880. We find ourselves, in 2024, unable to share data that MUSE could accurately represent in 1990 (where MUSE used explicit * markers for unknown values in yyyy-mm-dd).
Additionally, the statement in dwc:eventDate "Recommended best practice is to use a date that conforms to ISO 8601-1:2019." is in a non-normative element, the Notes.
We can put this into a normative element in the test specifications (expressing the explicit intent that people need to be following the non-normative recommendation for the data to have quality), and we've deliberately done that in some cases.
Here is a case where we should be hesitant. We think the non-normative guidance in Darwin Core is too narrow (by not recommending the EDTF extension), so we shouldn't put ourselves in a place of needing to make a normative change if Darwin Core adopts a broader recommendation that we think it should adopt...
|
....so
|
Do we thus need to put both ISO 8601-1:2019 and ISO 8601-2:2019 in some of the Specifications? |
On Sat, 14 Sep 2024 17:28:52 -0700 Arthur Chapman ***@***.***> wrote:
Do we thus need to put both ISO 8601-1:2019 and ISO 8601-2:2019 in
some of the Specifications?
Or can we use just ISO 8601?
|
I'm still awaiting a decision about my two issues above. On my first point, I'd be happy with "ISO 8601". Second point, we either need to 'hard wire' "ISO 8601" into the Specifications or use "bdq:sourceAuthority". I can see either being ok, so I just seek consistency. This is also something (with Parameter) that I'd be happy to add to the Supplement document. |
I don't think we can test for quality without simply supporting ISO8601-1. I don't think it should be a source authority, as it does not provide anyone with a lookup. Any implementation that followed any other standard (is there one?) would be completely different to the point where those tests would have to be distinct tests. Alas we will be forever saddled with data that say 1820 and actually meant sometime during 1820, but at least it will support those who can explicitly say it correctly with 1820-??-??. |
On Sun, 15 Sep 2024 18:32:12 -0700 John Wieczorek ***@***.***> wrote:
I don't think we can test for quality without simply supporting
ISO8601-1.
....
Alas we will be forever saddled with data that say 1820 and actually
meant sometime during 1820, but at least it will support those who
can explicitly say it correctly with 1820-??-??.
Except we would have to include ISO 8601-2 to support 1820-??-??, that
extension isn't supported within just ISO 8601-1
|
Sorry, yes, I meant ISO8601 with no dashes. Just so used to the old one it comes out without thinking. |
OK, thanks @tucotuco and @chicoreus. So, I will edit #26 and #76 to replace "bdq:sourceAuthority" with "ISO 8601" and ensure all the other tests noted above follow that template. |
My question remains whether there should be a bdq:sourceAuthority at all. |
I can't remember our original reasoning for bdq:sourceAuthority (seemed a good idea at the time?). Removing lookups makes Specifications easier to understand but it would cascade the work. I presume it would be good to replace the current Reference to the Source Authority with the two-part template we currently use under Source Authority, and remove all references to bdq:sourceAuthority...? |
That makes sense to me, but I have enough hesitation because of not remembering why there is a bdq:sourceAuthority for these to get consensus. |
From my memories of discussions, there are several cases where I don't think we need SourceAuthorities: they are where we don't have a Parameterized Test and where we link to an ISO Standard (or equivalent). This would apply to Dates where we reference ISO 8601 and tests like #272, #273, #274 (but that is parameterized with another Source Authority). Others reference the Country Code ISO 3166-1-alpha-2, but Source Authority may need to be kept: see Comments under #20 where there is mention of alternative references, thus there may be benefit in retaining the Source Authority for these tests. Note #62 and #48, which refer to ISO 3166-1-alpha-2 in the Source Authority, but not in the Expected Response. Other possible ones that could be considered are #38 (but that is complicated and probably needs retaining), and #133 (why is this one not as complicated as #38?). My view would be to do what you have in the TIME tests, but I don't see a strong case to alter those others. |
On Sun, 15 Sep 2024 19:23:21 -0700 Lee Belbin ***@***.***> wrote:
I can't remember our original reasoning for bdq:sourceAuthority.
(seemed a good idea at the time?).
I believe the reasoning was to try to generalize by removing all references to authorities from the expected response, leaving only bdq:sourceAuthority there.
This does make a lot of sense when the sourceAuthority might be parameterized. Here is a case where the expectation is that everyone will be following the same authority, the ISO date specification (as opposed to some older obsolete specification, say that of RFC 0822). If we expect some people to wish to specify that eventDate data with a different format has quality for their use, then using sourceAuthority in the expected response, and specifying it with a default as a parameter would make sense.
Similarly, when we want to talk about an authority which has a name, and has a place where the authority can be looked up, and has an API endpoint, adding a sourceAuthority with the structure that gives this information makes sense.
Here is a case where simply stating the authority directly in the expected response makes a lot of sense. There aren't alternatives that would apply for dwc:eventDate, we aren't suggesting it be parameterized, and there isn't an API endpoint (or a publicly accessible standard document).
Removing lookups makes Specifications easier to understand but it
would cascade the work. I presume it would be good to replace the
current Reference to the Source Authority with the two-part template
we currently use under Source Authority, and remove all references to
bdq:sourceAuthority...?
I don't think there is a good case for doing this for the ISO 8601 references. They fit naturally within the expected response without a clear need for adding the generalization and redirection of bdq:sourceAuthority.
Only argument for generalizing I can see is to declare the bdq:sourceAuthority to have a default of ISO 8601-1, add parameters, and allow users to specify an alternative of ISO 8601-2. Not a lot to be gained over simply specifying ISO 8601 in the expected response, and allowing implementors to support EDTF within 8601-2.
|
On Mon, 16 Sep 2024 15:08:28 -0700 Arthur Chapman ***@***.***> wrote:
Other possible ones that could be considered are #38 (but that is
complicated and probably needs retaining), and #133 (why is this one
not as complicated as #38?)
#133 only needs to point to the list of Creative Commons Licences as the standardization target.
#38 can specify a regular expression that encompasses all of the standard forms; it is one case where we can specify what values are in standard form with an explicit regular expression.
These are both fine as they are.
|
As suggested in yesterday's email, the use of bdq:sourceAuthority is now rational and consistent. |