TG2-VALIDATION_DATEIDENTIFIED_INRANGE #76
Comments
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: |
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: |
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: |
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: |
Same issue as #36 with ranges - specification should be consistent with that issue. |
I agree with @chicoreus and as I have commented on #66, "I would say #36 SHOULD cover a 'not possible' date within a valid time range." |
@chicoreus. Unless I am misreading you - this is not the same as #36. It doesn't include the complexity of eventDate - and can't be identified before it is collected or into the future - it has a more definite range than you have been indicating with eventDate surely? |
Agreed at TDWG 2018 DQIG meeting that the test should parallel TG2-VALIDATION_EVENTDATE_OUTOFRANGE in terms of an optional earlier limit. |
@ArthurChapman Since both dwc:eventDate and dwc:dateIdentified are expected to contain ISO dates, the complexities of both are the same (they aren't the same as the complexities of the multiple temporal terms in Event). Identifications existing now can't have been made in the future (like collecting/observing events), but under some conditions identifications can be made before occurrence events. For example, long term monitoring of a particular individual organism may begin with the identification of the organism to species, and then a sequence of observations of that organism at different times (and for mobile organisms at different places) may be made after that identification. |
I flagged this as Needs Work because I am having problems implementing it from the given specification:
(1) "No default designated date" does not refer to a defined concept, and I don't know what to do with it. It sounds like a reference to the default values for bdq:earliestDate and bdq:latestDate (to put the parameters into a bdq namespace), but these are defined values, so their absence would be a defect in the implementation, not a test failure condition. I suggest we change the specification by removing "there is no default designated date or" to:
|
Again @chicoreus your reasoning seems sound. The Parameters including default values were added after the Expected Response(s) was written. At the time of writing the Expected Response - we didn't have a defined default. Now that we do (I think in all cases which I believe is important to stop lots of failures because someone forgot to set a default), I think your new wording is good. I am happy with the new wording. Your argument about identification prior to an event to me is a rather pedantic one. I am not sure that you can call it an identification if you don't (at the time) have something to identify. In the cases you mention - I would regard the identification as being simultaneous to the observation (event). If you are looking for a particular organism, then when you find it and pick it up, or "identify" it through observation, then that is when the identification took place. |
Thanks @chicoreus and @ArthurChapman: Well picked up. I will amend accordingly and would value a check. |
Checked @Tasilee. There was another error in the Example, 1573 rather than 1753, which I fixed. |
Thanks @ArthurChapman |
…/bdq#26 and VALIDATION_DATEIDENTIFIED_INRANGE tdwg/bdq#76 as having issues needing work on their specifications.
I presume "bdq:sourceAuthority is "ISO 8601-1:2019" [https://www.iso.org/obp/ui/]" is ok, or we use just "8601-1"? |
For consistency, I think we just use ISO 8601-1 |
Sorry if I missed a conclusion somewhere, but I worry that this is not
right. By not committing to a specific document, all differences between
versions within the ISO 8601-1 standard are open to interpretation. That's
not a good state to be in for a test. I think Paul mentioned something
along these same lines in one of the flood of recent conversations.
Darwin Core is specific, ISO 8601-1:2019.
|
@tucotuco may have been in an email thread, good to memorialize here. Darwin Core eventDate and dateIdentified are both specific in the non-normative comments/notes. A concern was that we would link a normative assertion here to a non-normative assertion in Darwin Core, and be forced to make a normative change if a non-normative change in Darwin Core referenced a different specific ISO 8601-1 version, thus our use of ISO 8601-1 rather than ISO 8601-1:2019 in the normative test specifications may be more resilient to change than using the specific ISO 8601-1:2019 would be. A counter argument, noted by @Tasilee is that there was a change between ISO 8601-1 versions in the acceptability of 24:00 as a representation of midnight, with, if I recall correctly, ISO 8601-1:2019 specifying only 00:00 as midnight. I think I noted in an email thread that even though this relates to (out of scope for CORE) time, different Java libraries handle DateMidnight differently, and accepting or not accepting 24:00 as midnight could cause complications for implementors (though for CORE, only on edge cases). We also do have normative elements where we, by design, expect data values to conform to best practice assertions that are made in non-normative elements of Darwin Core, as those are places where we assert that data have quality for CORE purposes if they conform to those best practices, not just to the normative assertions in Darwin Core. I lean a bit towards using the more general ISO 8601-1. The likely important difference for us would be the adoption of the EDTF extension in ISO 8601-2 in Darwin Core, rather than a change in Darwin Core from ISO 8601-1:2019 to another more recent ISO 8601-1 version. I'm not sure that differences in ISO 8601-1 versions are likely to affect CORE issues of data quality. |
Ok, I guess I was attributing @Tasilee's observation to you. So, is 24:00
acceptable or no? How would anyone know if there is no specific document to
point to?
|
@Tasilee @ArthurChapman I haven't found the original discussion on why to shift from the particular ISO 8601-1:2019 to the general ISO 8601-1 yet - can you point to it? @tucotuco an interpretation might be that 24:00 is consistent with ISO 8601-1, as it is consistent with some version of ISO 8601-1. But for CORE data quality purposes, does that matter? |
I understand that it makes things easier to maintain, and I won't belittle that! It seems "fuzzy". I understand all of the arguments and accept. |
Not sure where I made a note on this. So many discussions. By far the majority of changes to ISO 8601-1 over the years have had very little (or no) effect on the issues we are interested in. The only one that I have found recently is the case of midnight. Basically wrt the midnight issue, ISO 8601-1 prior to the 2019 version allowed midnight to be represented as either 00:00 or 24:00. ISO 8601-1:2019 changed this such that midnight was only represented by 00:00. An amendment was made in 2022 (ISO 8601-1:2019/Amd 1:2022) which reverted to allowing midnight to be represented by either 00:00 or 24:00. Note that in the past there have also been Corrigenda that change the citation (e.g. ISO 8601-1:1988/COR 1:1991). So, if we cite the year, then we should use "ISO 8601-1:2019/Amd 1:2022". As @chicoreus mentions, every time there is a new version or a new amendment, even if it did not affect us, we would have to change the normative part of the test, and thus the implementations of the tests. I suggest we leave it at ISO 8601-1 and only change if there is a substantive change that affects our interpretation of date. We do, in this test at least, reference the full citation under the Source Authority and the References (both non-Normative) and I believe that this is the most efficient way of covering this issue. @Tasilee NOTE: we should change all the "ISO 8601-1:2019" references to "ISO 8601-1:2019/Amd 1:2022" in the Source Authorities and References. Reference: see Wikipedia ISO 8601 (https://en.wikipedia.org/wiki/ISO_8601) |
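To make the midnight point concrete, here is a minimal Python sketch of how an implementor might choose to accept or reject 24:00. It is only an illustration under stated assumptions: the function name `parse_timestamp`, the `allow_24` flag, and the restriction to the `YYYY-MM-DDThh:mm` form are all hypothetical and do not come from any bdq implementation. When accepted, 24:00 is treated as 00:00 of the following day.

```python
import re
from datetime import datetime, timedelta

# Hypothetical helper: only the YYYY-MM-DDThh:mm form is handled here.
_STAMP = re.compile(r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})$")


def parse_timestamp(text, allow_24=True):
    """Parse a timestamp, optionally accepting 24:00 as end-of-day midnight."""
    m = _STAMP.match(text)
    if not m:
        raise ValueError(f"not in YYYY-MM-DDThh:mm form: {text!r}")
    year, month, day, hour, minute = map(int, m.groups())
    if hour == 24:
        # Only exactly 24:00 can mean midnight; 24:01 and later are never valid.
        if not (allow_24 and minute == 0):
            raise ValueError(f"hour 24 not accepted here: {text!r}")
        # 24:00 on one day is the same instant as 00:00 on the next day.
        return datetime(year, month, day) + timedelta(days=1)
    # datetime() raises ValueError for impossible dates or times.
    return datetime(year, month, day, hour, minute)


# Both spellings name the same instant when 24:00 is accepted.
print(parse_timestamp("2023-06-13T24:00") == parse_timestamp("2023-06-14T00:00"))
```

With `allow_24=False` the same input raises `ValueError`, which is the kind of divergence between libraries that the discussion above is concerned about.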
Thumbs up if you agree to this change: Change Notes to: There may be valid identifications prior to Linnaeus but feel these are ok to flag anyway. If a parameter is not set, then the default is 1753-01-01. This test will, by design, flag as problematic cases (such as LTER plots and marine mammal sightings) where a known individual organism is identified by a specialist and then subsequently observed without new taxonomic identifications being made. Be aware that prior to 01-01-1919, if you are not certain of the use of the Gregorian calendar for the date, there can be variations as great as 1 year and 10 days between the Julian calendar and the Gregorian calendar. See the comparison on https://en.wikipedia.org/wiki/Proleptic_Gregorian_calendar. You may need to take this into account when setting parameters for the earliestValidDate and the latestValidDate. |
The more I think about and investigate this, the more I think we have jumped off the boat for a swim. I don't think we are in any position to posit a date after which the Gregorian calendar assumption is safe. It is still not safe today, it's just that its use for civil purposes has ever fewer exceptions as time goes on (so far). Making a statement about a particular date (other than the date of its origin) for a date of special mention necessarily has discriminatory implications. We do NOT want that. In this particular issue, and perhaps in all others where this has come up, I do not see that the uncertainty associated with the date actually has anything to do with what we are testing. This test can't assess if a date is actually within a Gregorian date interval, except in special cases where the Julian and Gregorian calendars coincide, and even that is ignoring all other possible calendars. Instead, it is able to test that a date following the ISO 8601-1 date specification is within a range specified in that context. We can't effectively do anything else because Darwin Core doesn't even provide for stating the original calendar used - it's forcing people to use the Gregorian calendar without describing the responsibility for doing so and the consequences of not doing so. I think the place for awareness of the implications of dates with unknown calendars is in the Darwin Core date terms. |
I fully agree John. With calendars, we opened a can of worms that doesn't seem justified. Given propensity to err, would it be useful to include a (non-normative) statement like yours in the standards document? |
It couldn't hurt to say that the problem is recognized and not covered by
any of the tests.
|
I've added a placeholder in Section 2.2 for now, so we don't forget it. |
OK - As a summary - I think we are agreed that the Note on this issue remains as is and we add discussion in the Standards document. |
@ArthurChapman I concur. Main place for discussion of calendar issues (and how we are or aren't tackling them, and potential implications for consumers of biodiversity data) is in a main standards document. Some tests do need mention, this one does not. I do suggest we alter the first sentence of the notes to align better with the second (rather than reading like a note to us) from: There may be valid identifications prior to Linnaeus but feel these are ok to flag anyway. If a parameter is not set, then the default is 1753-01-01. This test will, by design, flag as problematic cases (such as LTER plots and marine mammal sightings) where a known individual organism is identified by a specialist and then subsequently observed without new taxonomic identifications being made. To: There may be valid identifications prior to Linnaeus, but this test will flag these under the default value of bdq:earliestValidDate, as for most biodiversity data, pre-Linnaean identification dates are likely to be errors. If a parameter is not set, then the default is 1753-01-01. This test will, by design, flag as problematic cases (such as LTER plots and marine mammal sightings) where a known individual organism is identified by a specialist and then subsequently observed without new taxonomic identifications being made. |
We've sidetracked in this issue (memorializing some more general discussions) off the proposal in #76 (comment) to change the specification to: INTERNAL_PREREQUISITES_NOT_MET if (1) dwc:dateIdentified is EMPTY or (2) dwc:dateIdentified contains an invalid value according to ISO 8601-1, or (3) bdq:includeEventDate=true and dwc:eventDate is not EMPTY and dwc:eventDate is not a valid ISO 8601-1 date; COMPLIANT if the value of dwc:dateIdentified is between bdq:earliestValidDate and bdq:latestValidDate inclusive and either (1) dwc:eventDate is EMPTY or bdq:includeEventDate=false or (2) if dwc:eventDate is a valid ISO 8601-1 date and dwc:dateIdentified overlaps or is later than the dwc:eventDate; otherwise NOT_COMPLIANT |
@chicoreus - I agree with your suggestion for the Note change. I can't see any problems with your logic on the Specification. It is a very complicated way to test what is a simple concept, but I can't see a way of simplifying it other than the following. In (3), "bdq:includeEventDate=true and dwc:eventDate is not EMPTY and dwc:eventDate is not a valid ISO 8601-1 date" could, I think, be changed to "bdq:includeEventDate=true and dwc:eventDate is not a valid ISO 8601-1 date". If it is not empty, it either does or does not include a valid ISO date. If it has a valid ISO 8601-1 date it goes to (2) under COMPLIANT, so the "dwc:eventDate is not EMPTY" is redundant in (3). |
So, do we have a consensus on INTERNAL_PREREQUISITES_NOT_MET if (1) dwc:dateIdentified is EMPTY or (2) dwc:dateIdentified contains an invalid value according to ISO 8601-1, or (3) bdq:includeEventDate=true and dwc:eventDate is not a valid ISO 8601-1 date; COMPLIANT if the value of dwc:dateIdentified is between bdq:earliestValidDate and bdq:latestValidDate inclusive and either (1) dwc:eventDate is EMPTY or bdq:includeEventDate=false or (2) if dwc:eventDate is a valid ISO 8601-1 date and dwc:dateIdentified overlaps or is later than the dwc:eventDate; otherwise NOT_COMPLIANT |
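The wording in the consensus question above can be sketched in Python as follows. This is a minimal illustration, not the project's implementation, and it rests on several assumptions: the names `parse_interval` and `dateidentified_inrange` are hypothetical; only the YYYY, YYYY-MM, YYYY-MM-DD and start/end interval forms are recognised; bdq:latestValidDate is assumed to default to the current date; bdq:includeEventDate is assumed to default to true; and an EMPTY dwc:eventDate is assumed not to trigger clause (3), so that it can still reach clause (1) under COMPLIANT.

```python
import calendar
import re
from datetime import date

_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def _parse_one(text):
    """Return (first_day, last_day) for YYYY, YYYY-MM or YYYY-MM-DD, else None."""
    m = _DATE.match(text)
    if not m:
        return None
    year, month, day = (int(g) if g else None for g in m.groups())
    try:
        if month is None:
            return date(year, 1, 1), date(year, 12, 31)
        if day is None:
            first = date(year, month, 1)
            last_day = calendar.monthrange(year, month)[1]
            return first, date(year, month, last_day)
        single = date(year, month, day)
        return single, single
    except ValueError:  # impossible date such as 2001-13-01
        return None


def parse_interval(text):
    """Parse a single date or 'start/end' into (start, end), else None."""
    parts = (text or "").strip().split("/")
    if len(parts) > 2:
        return None
    spans = [_parse_one(p) for p in parts]
    if any(s is None for s in spans):
        return None
    start, end = spans[0][0], spans[-1][1]
    return (start, end) if start <= end else None


def dateidentified_inrange(date_identified, event_date=None,
                           earliest=date(1753, 1, 1), latest=None,
                           include_event_date=True):
    """Hypothetical sketch of the Expected Response quoted above."""
    latest = latest or date.today()  # assumed default for latestValidDate

    def is_empty(value):
        return value is None or not value.strip()

    # Prerequisites (1) and (2): dateIdentified must be present and valid.
    identified = None if is_empty(date_identified) else parse_interval(date_identified)
    if identified is None:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # Prerequisite (3): assumed to apply only to a non-EMPTY eventDate.
    use_event = include_event_date and not is_empty(event_date)
    event = parse_interval(event_date) if use_event else None
    if use_event and event is None:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    in_range = earliest <= identified[0] and identified[1] <= latest
    # "Overlaps or is later than": identification must not end before the event starts.
    not_before_event = (not use_event) or identified[1] >= event[0]
    return "COMPLIANT" if in_range and not_before_event else "NOT_COMPLIANT"


print(dateidentified_inrange("1573-05-08"))                           # NOT_COMPLIANT
print(dateidentified_inrange("2001", event_date="2001-03-04"))        # COMPLIANT
print(dateidentified_inrange("2000-01-01", event_date="2001-03-04"))  # NOT_COMPLIANT
```

Note that if an EMPTY dwc:eventDate were instead counted as "not a valid ISO 8601-1 date", clause (3) would fire before clause (1) under COMPLIANT could be reached, so an implementation has to pick one reading and document it.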
I have updated the Expected Response in line with @Tasilee comment above, the Notes in line with @chicoreus comment above. Also updated the Specification Last Updated and removed NEEDS WORK |
…23-06-27) specifications, updates to unit test and to implementation. Updating metadata in other DwCOtherDateDQ tests.
Due to recent discussions, changed bdq:sourceAuthority is "ISO 8601-1:2019" [https://www.iso.org/obp/ui/] to bdq:sourceAuthority = "ISO 8601-1:2019" {[https://www.iso.org/iso-8601-date-and-time-format.html]}. I don't see an overwhelming need to change the references in the Expected response to bdq:sourceAuthority.
Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted". Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated" |
Changed reference in Expected Response from ISO 8601-1 to ISO 8601 and removed bdq:sourceAuthority entry (which is covered in References). |