TG2-AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY #93

Open
Tracked by #24
iDigBioBot opened this issue Jan 5, 2018 · 53 comments
Labels
Amendment, Completeness, CORE, Test, TG2, TIME

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

| TestField | Value |
|---|---|
| GUID | 3892f432-ddd0-4a0a-b713-f2e2ecbd879d |
| Label | AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY |
| Description | Proposes an amendment to the value of dwc:eventDate from values in dwc:year, dwc:month and dwc:day. |
| TestType | Amendment |
| Darwin Core Class | dwc:Event |
| Information Elements ActedUpon | dwc:eventDate |
| Information Elements Consulted | dwc:year, dwc:month, dwc:day |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED |
| Data Quality Dimension | Completeness |
| Term-Actions | EVENTDATE_FROM_YEARMONTHDAY |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2024-09-15 |
| Examples | [dwc:eventDate="", dwc:year="1420", dwc:month="10", dwc:day="29": Response.status=FILLED_IN, Response.result=dwc:eventDate="1420-10-29", Response.comment="dwc:year, dwc:month and dwc:day are interpretable, even if pre-Linnaeus"] [dwc:eventDate="", dwc:year="2024", dwc:month="2", dwc:day="30": Response.status=NOT_AMENDED, Response.result=, Response.comment="Not a valid date"] |
| Source | TG2-Gainesville |
| References | |
| Example Implementations (Mechanisms) | Kurator:event_date_qc |
| Link to Specification Source Code | https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/main/java/org/filteredpush/qc/date/DwCEventDQ.java#L1003 unit tests at https://github.com/FilteredPush/event_date_qc/blob/5f2e7b30f8a8076977b2a609e0318068db80599a/src/test/java/org/filteredpush/qc/date/DwcEventDQTest.java#L493 |
| Notes | An attempt to populate dwc:eventDate from dwc:verbatimEventDate and from dwc:startDayOfYear and dwc:endDayOfYear should be made before this test is run. If dwc:year and dwc:day are present and interpretable, but dwc:month is not supplied or is not interpretable, then just the year should be given as the proposed amendment. This test assumes that dwc:year, dwc:month and dwc:day are in a Gregorian calendar, and that only those three pieces of information are needed to produce a dwc:eventDate (explicitly in ISO 8601-1 format, and thus using the Gregorian calendar). When running the test, the original precision should be retained, e.g. for dwc:year=1980, dwc:month=1, dwc:eventDate should become 1980-01, not 1980-01-01/1980-01-31. |
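
The Expected Response and Notes above can be read as a short decision procedure. The following is a minimal Python sketch of that reading, not the event_date_qc implementation. The names `to_int` and `amend_event_date` are hypothetical, and several choices are assumptions made only for illustration: only plain decimal digits count as interpretable, a month outside 1-12 is treated as not interpretable, a non-empty but non-numeric day yields NOT_AMENDED, and years outside 1-9999 yield NOT_AMENDED because Python's `datetime.date` cannot represent them.

```python
import re
from datetime import date


def to_int(value):
    """Return value as an int if it is a plain run of ASCII digits, else None."""
    text = (value or "").strip()
    return int(text) if re.fullmatch(r"-?[0-9]+", text) else None


def amend_event_date(event_date, year, month, day):
    """Return (status, proposed dwc:eventDate or None) for one record."""
    y = to_int(year)
    # Prerequisites: eventDate must be empty and year must parse as an integer.
    if (event_date or "").strip() or y is None:
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    if not 1 <= y <= 9999:
        # Assumption: datetime.date only covers years 1-9999.
        return ("NOT_AMENDED", None)
    m = to_int(month)
    if m is None or not 1 <= m <= 12:
        # Month missing or not interpretable: propose the year only.
        return ("FILLED_IN", f"{y:04d}")
    if not (day or "").strip():
        # Retain the original precision: year and month only.
        return ("FILLED_IN", f"{y:04d}-{m:02d}")
    d = to_int(day)
    if d is None:
        # Assumption: an uninterpretable day blocks the amendment.
        return ("NOT_AMENDED", None)
    try:
        return ("FILLED_IN", date(y, m, d).isoformat())
    except ValueError:
        # For example 2024-02-30: not a valid calendar date.
        return ("NOT_AMENDED", None)
```

With the two Examples from the table, `amend_event_date("", "1420", "10", "29")` proposes `1420-10-29`, while `amend_event_date("", "2024", "2", "30")` returns NOT_AMENDED.
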
@iDigBioBot
Collaborator Author

Comment by Lee Belbin (@Tasilee) migrated from spreadsheet:
Added post scoring for completeness

@iDigBioBot
Collaborator Author

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet:
Cf #89B . Added following discussion with @jw

@chicoreus
Collaborator

Noted in call 2022 Feb 13: Need to be explicit about cases where year is provided but month/day contain values not interpretable as a month or a day.

@chicoreus
Collaborator

Example cases to be explicit about in the test data:

dwc:year="2021", dwc:month="", dwc:day="29"

dwc:year="2021", dwc:month="X", dwc:day="29"

Write the specification so that both of these result in dwc:eventDate="2021" (or 2021-01-29/2021-12-29).

@chicoreus
Collaborator

Per discussion on TG2 call 2022 Mar 6, added word unambiguous to expected response, such that X is not interpreted as 10, as it could be [missing data], and per suggestion by @tucotuco added guidance to use only year if just day and year are present.

chicoreus added a commit to FilteredPush/event_date_qc that referenced this issue Mar 9, 2022
… specifications. DESCRIPTION: Updating implementation of AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY to fit current (2022-03-08) specification, making method name consistent, and deprecating old method. Adding a utility method romanMonthToInteger to interpret roman numeral month values as numbers. Adding/updating relevant unit tests.
chicoreus added a commit to FilteredPush/event_date_qc that referenced this issue Mar 9, 2022
… specifications. DESCRIPTION: Updating implementation of AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY to fit current (2022-03-08) specification, fixing unit tests to conform with current specifications.
@chicoreus
Collaborator

To handle the issues we've been having fitting an implementation from the specification to the test data, I suggest adding some clauses about interpretability to the notes.

From:

An attempt to populate dwc:eventDate from dwc:verbatimEventDate and from dwc:startDayOfYear and dwc:endDayOfYear should be made before this test is run. If dwc:year and dwc:day are present, but dwc:month is not supplied, then just the year should be given as the proposed amendment.

To:

An attempt to populate dwc:eventDate from dwc:verbatimEventDate and from dwc:startDayOfYear and dwc:endDayOfYear should be made before this test is run. If dwc:year and dwc:day are present and interpretable, but dwc:month is not supplied or is not interpretable, then just the year should be given as the proposed amendment.

chicoreus added a commit to FilteredPush/event_date_qc that referenced this issue Mar 21, 2022
… specifications. DESCRIPTION: Conforming the implementation and unit tests for AMENDMENT_EVENTDATE_FROM_YEARMONTHDAY to the expectations expressed in the validation data for handling uncertainty. This code passes validation dataID 994 (and all the other TIME test cases in v19 of the validation data except for dataID 987 which probably has a typo in the validation data).
@Tasilee
Collaborator

Tasilee commented Mar 21, 2022

Notes amended accordingly.

@Tasilee Tasilee removed the NEEDS WORK label Apr 3, 2022
@Tasilee
Collaborator

Tasilee commented Apr 18, 2022

Changed "AMENDED" to "FILLED_IN" in accordance with discussions April 16.

chicoreus added a commit to FilteredPush/event_date_qc that referenced this issue Aug 30, 2022
…ENDED when proposing changes to empty terms. Updated method, tests, and comments.
@Tasilee
Collaborator

Tasilee commented Aug 31, 2022

Based on @chicoreus email August 31, then

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is uninterpretable as a valid year; FILLED_IN the value of dwc:eventDate if an unambiguous ISO 8601-1:2019 date can be interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

becomes

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is uninterpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1:2019 date can be interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

?

@Tasilee
Collaborator

Tasilee commented Sep 4, 2022

Expected Response updated as per above with minor edit-

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1:2019 date can be interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

@Tasilee
Collaborator

Tasilee commented Dec 11, 2022

More discussion today on this suggests we can interpret dwc:month from Roman numerals and @ArthurChapman said that using Roman numerals for month is not unusual, so we have amended the examples to illustrate this principle. The test data records have been changed accordingly and we will add a Vocabulary item for "Roman numerals".
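
Interpreting dwc:month from Roman numerals can be sketched as a simple lookup. This is an illustration only: the function name `roman_month_to_int` is hypothetical (a loose Python analogue of the `romanMonthToInteger` utility mentioned in the commit message above, whose actual behaviour is not shown in this thread), and accepting lower-case input is an assumption.

```python
# Roman numerals for the twelve months, as sometimes written on specimen labels.
ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}


def roman_month_to_int(value):
    """Return the month number for a Roman numeral I-XII, else None."""
    # Assumption: surrounding whitespace and letter case are ignored.
    return ROMAN_MONTHS.get((value or "").strip().upper())
```

Anything outside I to XII, including an empty value, comes back as `None`, so a caller can still fall back to proposing only the year.
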

@ArthurChapman
Collaborator

ArthurChapman commented Sep 12, 2024

This test currently (from the Notes) has dependencies - i.e. one should use dwc:verbatimEventDate, as a priority, or dwc:startDayOfYear and dwc:endDayOfYear, before attempting to run this test. We have wanted all the tests to be stand-alone and thus, I think the Specification should be rewritten as follows

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate, dwc:verbatimEventDate, dwc:startDayOfYear and dwc:endDayOfYear, are bdq:NotEMPTY or dwc:year is bdq:EMPTY or is not interpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED.

#132 should also be rewritten similarly (q.v.)

My other question is what is meant by a "valid year": would this exclude "2035", for example? And what about "1034", which may be an error for "2034"? That would mean that we should then run #84 after this test.

@chicoreus
Collaborator

chicoreus commented Sep 12, 2024 via email

@ArthurChapman
Collaborator

@chicoreus - this reduces the dependency by making it explicit. Currently, in the Notes we say "An attempt to populate dwc:eventDate from dwc:verbatimEventDate and from dwc:startDayOfYear and dwc:endDayOfYear should be made before this test is run." By putting it in the INTERNAL_PREREQUISITES_NOT_MET, you don't run it if there is something in those fields.

@chicoreus
Collaborator

chicoreus commented Sep 12, 2024 via email

@Tasilee
Collaborator

Tasilee commented Sep 13, 2024

Strange as it seems, I fully agree with @chicoreus in relation to the tests to be presented as independent, other than the Validate-Amend-(Re)Validate process. This is what I said to @ArthurChapman on this issue. As soon as we prescribe dependencies, we open a Pandora's box, and as @chicoreus pointed out, possible workflows are many (exemplified by a domain using a subset of tests).

Thanks also @chicoreus for clarifying what an 'interpretable year' is. This is what I was looking for in relation to edge cases for the Test Data. Note that the previous Expected Response said

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as a valid year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED.

Do we mean 'interpretable as a valid year' or 'interpretable as a valid ISO 8601 year'? They are different. I presume the former? Note that 'valid' would supersede 'interpretable', in that if it is a 'valid year' sensu ISO 8601, it will be interpretable?

Scenario 1: dwc:year="1505" or "2035" are interpretable (as integers), and both are valid ISO 8601 years (0-9999 are ok), so the response.status given the Expected Response above would be NOT_AMENDED. Right?

Scenario 2: dwc:year="-100" or "10001" may be a valid year (as it is an integer) but it is not a valid ISO 8601 year (0-9999 are, at least without "prior agreement").

Scenario 3: dwc:year="CX" is not an integer, so the response.status should be INTERNAL_PREREQUISITES_NOT_MET?

As @chicoreus wants to 'short circuit' an evaluation of a Date (dwc:eventDate) if dwc:year is in some way invalid (based upon the priority of dwc:year one presumes), should we use

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

or do we intend to limit dwc:year to a greater extent as in

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not a valid ISO 8601 year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

?

@ArthurChapman
Collaborator

@Tasilee wrote
Scenario 1: dwc:year="1505" or "2035" are interpretable (as integers), and both are valid ISO 8601 years (0-9999 are ok), so the response.status given the Expected Response above would be NOT_AMENDED. Right?

Am I missing something - wouldn't they both be FILLED_IN?

@Tasilee
Collaborator

Tasilee commented Sep 13, 2024

@ArthurChapman - Yes, my bad: FILLED_IN would be correct if 'valid' refers to an integer.

@chicoreus
Collaborator

@Tasilee I'd be happy with either, taking your two above and adding "interpretable as" (we can't assert that something that is likely to be serialized as a string is a different data type, only that it can be interpreted as such):

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

or

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as a valid ISO 8601 year; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

Note that specifying 8601-1 forecloses the use of the Library of Congress extension for uncertainty in dates. That is one of the things that needs to get pushed as a change in Darwin Core, but it is currently specified in a comment about best practice: "Recommended best practice is to use a date that conforms to ISO 8601-1:2019." We'd be more change resistant if we specify an ISO 8601 date in a normative assertion.

I'd prefer the first option, this corresponds to the current event_date_qc implementation, and is simpler for implementors (just use a native type conversion function to see if the presented dwc:year in a probably string representation can be converted to a native integer data type, rather than doing that and then evaluating if the integer conforms to the ISO range of integers for year).

Thus:

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED
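
The difference between the two candidate prerequisite checks discussed here can be sketched in Python. Both function names are hypothetical, and the 0-9999 range is taken from the scenarios earlier in this thread rather than checked against the ISO 8601 text. The first check relies only on the native `int()` conversion, which is the simpler option preferred above; the second adds the range test on top.

```python
def interpretable_as_integer(year):
    """True if the native int() conversion accepts the dwc:year string."""
    try:
        # Note: int() tolerates surrounding whitespace and a leading sign.
        int(year)
        return True
    except (TypeError, ValueError):
        return False


def interpretable_as_iso8601_year(year):
    """True if dwc:year is an integer in the range 0-9999 (assumed ISO range)."""
    return interpretable_as_integer(year) and 0 <= int(year) <= 9999
```

Under this sketch "1505" and "2035" pass both checks, "-100" and "10001" pass only the integer check, and "CX" fails both.
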

@ArthurChapman
Collaborator

Some time back, I argued that we should just use ISO 8601 in a lot of our tests in the light of ISO 8601-1:2019 and the possibility of other extensions.

@Tasilee
Collaborator

Tasilee commented Sep 14, 2024

This has been a useful discussion in raising the need to be specific and concise in our test Specifications/Expected Responses. Changing the Expected Response from

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is bdq:NotEmpty or dwc:year is bdq:Empty; FILLED_IN the value of dwc:eventDate if an ISO 8601-1 date is interpretable from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

to

INTERNAL_PREREQUISITES_NOT_MET if dwc:eventDate is not EMPTY or dwc:year is EMPTY or is not interpretable as an integer; FILLED_IN the value of dwc:eventDate if an ISO 8601 date was interpreted from the values in dwc:year, dwc:month and dwc:day; otherwise NOT_AMENDED

@ArthurChapman: ISO 8601 is the bdq:sourceAuthority for only two tests: #26 and #76 and it is odd that it is not listed as such here or in the tests

#36
#52
#61
#66
#69
#86
#92 (Supplementary)
#125
#140
#272 (Supplementary)
#273 (Supplementary)
#274 (DO NOT IMPLEMENT)

Is there any logic to this? Did we agree that we would universally use "bdq:sourceAuthority" in the Expected Responses/Specifications (as in #26 and #76)?

@ArthurChapman
Collaborator

Looking back on my notes, especially under #26, I think we recommended ISO 8601-1, which covers the Basic Rules. The latest version is 2019, and this was amended in 2022 for some minor technical corrections. I think that if we use ISO 8601-1 we cover the latest versions without restricting ourselves to the 2019 version if there are later updates. ISO 8601-2 covers extensions that don't apply to our data. If we just use ISO 8601 we bring in all earlier versions, which don't cover everything we need.

So my conclusion is that we use ISO 8601-1 for all tests.

@ArthurChapman
Collaborator

Given my last post - the most accurate representation is ISO 8601-1:2019(en). I would add that into the References, but in the Expected Response leave as ISO 8601-1. That allows updates of the test in the future without making changes to the Expected Response/Specification, i.e. as above. I would follow #26 and include 8601-1 in the Expected Response, as earlier versions of 8601 didn't include some of the criteria we use.

@chicoreus
Collaborator

chicoreus commented Sep 15, 2024 via email

@Tasilee
Collaborator

Tasilee commented Sep 15, 2024

....so

  1. What do we use then for where we reference ISO 8601, and
  2. Do we explicitly use the phrase "ISO 8601 ..." in the Specifications/Expected Response or "bdq:sourceAuthority" as in TG2-AMENDMENT_DATEIDENTIFIED_STANDARDIZED #26 and TG2-VALIDATION_DATEIDENTIFIED_INRANGE #76?

@ArthurChapman
Collaborator

Do we thus need to put both ISO 8601-1:2019 and ISO 8601-2:2019 in some of the Specifications?

@chicoreus
Collaborator

chicoreus commented Sep 15, 2024 via email

@Tasilee
Collaborator

Tasilee commented Sep 15, 2024

I'm still awaiting a decision about my two issues above. On my first point, I'd be happy with "ISO 8601". Second point, we either need to 'hard wire' "ISO 8601" into the Specifications or use "bdq:sourceAuthority". I can see either being ok, so I just seek consistency. This is also something (with Parameter) that I'd be happy to add to the Supplement document.

@tucotuco
Member

I don't think we can test for quality without simply supporting ISO8601-1. I don't think it should be a source authority, as it does not provide anyone with a lookup. Any implementation that followed any other standard (is there one?) would be completely different to the point where those tests would have to be distinct tests.

Alas we will be forever saddled with data that say 1820 and actually meant sometime during 1820, but at least it will support those who can explicitly say it correctly with 1820-??-??.

@chicoreus
Collaborator

chicoreus commented Sep 16, 2024 via email

@tucotuco
Member

Sorry, yes, I meant ISO8601 with no dashes. Just so used to the old one it comes out without thinking.

@Tasilee
Collaborator

Tasilee commented Sep 16, 2024

OK, thanks @tucotuco and @chicoreus. So, I will edit #26 and #76 to replace "bdq:sourceAuthority" with "ISO 8601" and ensure all the other tests noted above follow that template.

@tucotuco
Member

tucotuco commented Sep 16, 2024

My question remains whether there should be a bdq:sourceAuthority at all.

@Tasilee
Collaborator

Tasilee commented Sep 16, 2024

I can't remember our original reasoning for bdq:sourceAuthority (seemed a good idea at the time?).

Removing lookups makes Specifications easier to understand but it would cascade the work. I presume it would be good to replace the current Reference to the Source Authority with the two-part template we currently use under Source Authority, and remove all references to bdq:sourceAuthority...?

@tucotuco
Member

That makes sense to me, but because I don't remember why there is a bdq:sourceAuthority for these, I have enough hesitation that I'd want consensus first.

@ArthurChapman
Collaborator

From my memories of discussions, there are several cases where I don't think we need Source Authorities: they are where we don't have a Parameterized Test and where we link to an ISO Standard (or equivalent). This would apply to Dates where we reference ISO 8601 and tests like #272, #273, #274 (but that is parameterized with another Source Authority). Others reference the Country Code ISO 3166-1-alpha-2, but Source Authority may need to be kept; see Comments under #20 where there is mention of alternative references, thus there may be benefit in retaining the Source Authority for these tests. Note #62 and #48, which refer to ISO 3166-1-alpha-2 in the Source Authority, but not in the Expected Response.

Other possible ones that could be considered are #38 (but that is complicated and probably needs retaining), and #133 (why is this one not as complicated as #38?)

My view would be to do what you have in the TIME tests, but I don't see a strong case to alter those others.

@chicoreus
Collaborator

chicoreus commented Sep 16, 2024 via email

@chicoreus
Collaborator

chicoreus commented Sep 16, 2024 via email

@Tasilee
Collaborator

Tasilee commented Sep 18, 2024

As suggested in yesterday's email, the use of bdq:sourceAuthority is now rational and consistent.

5 participants