TG2-VALIDATION_COUNTRYCODE_NOTEMPTY #98

Open
iDigBioBot opened this issue Jan 5, 2018 · 29 comments
Labels: CODED, Completeness, CORE, SPACE, Test, TG2, Validation, VOCABULARY

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

| TestField | Value |
|---|---|
| GUID | 853b79a2-b314-44a2-ae46-34a1e7ed85e4 |
| Label | VALIDATION_COUNTRYCODE_NOTEMPTY |
| Description | Is there a value in dwc:countryCode? |
| TestType | Validation |
| Darwin Core Class | dcterms:Location |
| Information Elements ActedUpon | dwc:countryCode |
| Information Elements Consulted | |
| Expected Response | COMPLIANT if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Completeness |
| Term-Actions | COUNTRYCODE_NOTEMPTY |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2024-11-10 |
| Examples | [dwc:countryCode="Australia": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:countryCode is bdq:NotEmpty"] [dwc:countryCode="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:countryCode is bdq:Empty"] |
| Source | |
| References | |
| Example Implementations (Mechanisms) | FilteredPush:geo_ref_qc |
| Link to Specification Source Code | geo_ref_qc DwCGeoRefDQ,validationCountrycodeNotempty() |
| Notes | This test will return 'NOT_COMPLIANT' for records on the "High seas" where dwc:countryCode is bdq:Empty. We recommend that data from the high seas (outside national jurisdictions) use dwc:countryCode = "XZ" and dwc:country = "High seas" until an agreement has been made. |
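
The Expected Response above can be illustrated with a minimal Python sketch. This is not the FilteredPush geo_ref_qc code: the function names and the dictionary shape of the response are hypothetical, and treating `None` and whitespace-only strings as bdq:Empty is an assumption made here for illustration.

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty is read as "missing or whitespace-only".
    return value is None or value.strip() == ""


def validation_countrycode_notempty(country_code: Optional[str]) -> dict:
    """Return a response for the NOTEMPTY check on dwc:countryCode."""
    if is_empty(country_code):
        return {
            "status": "RUN_HAS_RESULT",
            "result": "NOT_COMPLIANT",
            "comment": "dwc:countryCode is bdq:Empty",
        }
    # Any non-empty value passes; the value is not checked against ISO 3166.
    return {
        "status": "RUN_HAS_RESULT",
        "result": "COMPLIANT",
        "comment": "dwc:countryCode is bdq:NotEmpty",
    }
```

Under these assumptions, the two examples in the table ("Australia" and the empty string) come back as COMPLIANT and NOT_COMPLIANT respectively, and a high seas record coded "XZ" is COMPLIANT.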
@iDigBioBot
Collaborator Author

Comment by Lee Belbin (@Tasilee) migrated from spreadsheet:
Added post scoring for consistency

@cgendreau
Contributor

This test should probably use the same word "EMPTY" as #20 instead of NULL.

@ArthurChapman ArthurChapman added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Jan 17, 2018
@ArthurChapman ArthurChapman changed the title TG2-VALIDATION_COUNTRYCODE_NULL TG2-VALIDATION_COUNTRYCODE_EMPTY Jan 29, 2018
@Tasilee Tasilee added Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. and removed Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT labels Mar 21, 2018
@Tasilee Tasilee removed NEEDS WORK Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. labels Aug 26, 2018
@ArthurChapman ArthurChapman added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Aug 27, 2018
@ArthurChapman
Collaborator

ArthurChapman commented Apr 1, 2020

Need to add somewhere (Expected Response) a reference to ISO 3166. I have added a reference in the References.

@Tasilee
Collaborator

Tasilee commented Apr 1, 2020

Edited your comment (odd that you can) to 3166.

@ArthurChapman
Collaborator

Thanks @Tasilee - was just about to make that correction.

@Tasilee
Collaborator

Tasilee commented Apr 8, 2020

Looking at this one again, we aren't checking for a valid dwc:countryCode, only that it is not EMPTY. A reference to ISO 3166 is fine, but isn't needed in Expected response.

@tucotuco
Member

tucotuco commented Apr 8, 2020

Agreed.

ArthurChapman added a commit that referenced this issue Oct 8, 2020
In accord with #189 added test data file for #98
@Tasilee Tasilee changed the title TG2-VALIDATION_COUNTRYCODE_EMPTY TG2-VALIDATION_COUNTRYCODE_NOTEMPTY Mar 22, 2022
@chicoreus chicoreus added the CORE TG2 CORE tests label Sep 18, 2023
@chicoreus
Collaborator

By including this test in CORE we are asserting that any data from the high seas is not fit for any of the use cases that include this test.

We need some specific recommendation for handling data from the High Seas. Country code, using the ISO list, should be empty for data from the high seas. This test needs some way to accommodate that, so that data from the high seas can be fit for use.

@Tasilee
Collaborator

Tasilee commented Aug 13, 2024

I agree @chicoreus: CORE suggests a universal use case. Does this raise the need for two other use cases - terrestrial and marine ecology?

We could set this test as Supplementary for terrestrial domain, and optionally generate an equivalent for the marine domain (using dwc:waterBody?).

As you suggest, we may be able to accommodate by trying to detect marine domains (dwc:waterBody, dwc:decimalLatitude and dwc:decimalLongitude, dwc:minimumDepthInMeters and dwc:maximumDepthInMeters....or ?). The simplest ER would be something like

"COMPLIANT if dwc:countryCode is NOT_EMPTY, or if any of dwc:waterBody, dwc:minimumDepthInMeters or dwc:maximumDepthInMeters are NOT_EMPTY..."?
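
A rough Python sketch of that candidate wording, for discussion only. The function name and the flat `dict` record are hypothetical, and "empty" is assumed to mean missing or whitespace-only.

```python
def is_empty(value) -> bool:
    # Assumption: a missing key, None, or whitespace-only text counts as empty.
    return value is None or str(value).strip() == ""


# Terms named in the candidate Expected Response.
CANDIDATE_TERMS = (
    "dwc:countryCode",
    "dwc:waterBody",
    "dwc:minimumDepthInMeters",
    "dwc:maximumDepthInMeters",
)


def candidate_marine_aware_result(record: dict) -> str:
    """COMPLIANT if any one of the candidate terms has a value."""
    if any(not is_empty(record.get(term)) for term in CANDIDATE_TERMS):
        return "COMPLIANT"
    return "NOT_COMPLIANT"
```

Note that under this wording a record with an empty dwc:countryCode passes as soon as dwc:waterBody holds any value at all, whatever kind of water body it names.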

@ArthurChapman
Collaborator

dwc:waterBody includes rivers and lakes etc. which are inside countries. I think we also decided sometime earlier, that waterbody in TGN was unworkable.

@chicoreus
Collaborator

chicoreus commented Aug 13, 2024 via email

@tucotuco
Member

The UN/LOCODE system uses "XZ" to represent international waters or high seas. This is not an official ISO country code but is commonly used in logistics and transportation systems. I think this could be a good solution for covering fitness for use of data from the high seas.

@tucotuco
Member

Also, ZZ is an often-used user-defined ISO code taken to mean "unknown". This would apply to situations where the location is unknown (i.e., not found or explicitly stated as unknown) as well as situations where the location is known but cannot be assigned to a single country code (e.g., "Argentina/Uruguay").

@Tasilee
Collaborator

Tasilee commented Sep 25, 2024

If we had dwc:decimalLatitude and dwc:decimalLongitude, we might be able to use the shapefile download of country+EEZ at https://www.marineregions.org/downloads.php. We could set INTERNAL_PREREQUISITES_NOT_MET if we didn't have latitude and longitude. Just a long shot. If we can't do something like this, then I guess it is Immature/Incomplete until a dwc:countryCode value of "XZ" becomes widely used?
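
One way the coordinate idea could look, as a hedged Python sketch. Everything here is hypothetical: the function names, the response dictionary, and especially `is_high_seas`, which stands in for a point-in-polygon lookup against a country+EEZ shapefile and is passed in by the caller rather than implemented. The sketch also assumes coordinates are consulted only when dwc:countryCode is empty.

```python
from typing import Callable, Optional


def parse_coordinate(value, limit: float) -> Optional[float]:
    """Parse a coordinate, returning None if it is missing or out of range."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if -limit <= number <= limit else None


def countrycode_notempty_with_high_seas(
    record: dict,
    is_high_seas: Callable[[float, float], bool],  # hypothetical shapefile lookup
) -> dict:
    code = record.get("dwc:countryCode")
    if code is not None and str(code).strip():
        return {"status": "RUN_HAS_RESULT", "result": "COMPLIANT"}
    # countryCode is empty: coordinates are needed to decide anything further.
    lat = parse_coordinate(record.get("dwc:decimalLatitude"), 90.0)
    lon = parse_coordinate(record.get("dwc:decimalLongitude"), 180.0)
    if lat is None or lon is None:
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": None}
    result = "COMPLIANT" if is_high_seas(lat, lon) else "NOT_COMPLIANT"
    return {"status": "RUN_HAS_RESULT", "result": result}
```

Injecting the lookup keeps the sketch independent of any particular GIS library; the cost of building and maintaining that lookup is the part that bears on the 'easy to implement' criterion.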

@ArthurChapman
Collaborator

I don't see a problem - we are not checking against Country Codes with this test - just checking if it has something in the field or not. Where we look at Standard etc. we could check against "country codes + XZ" and add a note about XZ

@Tasilee
Collaborator

Tasilee commented Sep 25, 2024

As this test now stands, I agree with @chicoreus in that we will be wrongfully returning NOT_COMPLIANT for any 'high seas' records. As this area is more than half of the planet, we need to take it seriously.

We therefore have three options

  1. Be aspirational in the use of dwc:countryCode="XZ", knowing that we will have many NOT_COMPLIANTs
  2. Use coordinates if available to test for "high seas". I am unaware of an API for this, but there are shapefiles for 'high seas' as mentioned. With this option, we must remain true to our 'easy to implement' criterion, to which I defer to @chicoreus and @tucotuco.
  3. Set the test to Immature/Incomplete, and promote "XZ".

I am slightly inclined to (3).

@ArthurChapman
Collaborator

I disagree - this test - like all other tests for NOTEMPTY - is only checking if there is a value in that field - it makes no assumption on why it is empty. It is a simple YES/NO test.

@ArthurChapman
Collaborator

@Tasilee - I think what you are saying applies to Tests #73 and #62, not this test. Those tests could be worded (especially #73) to include "or XZ ...". I'd have to look more closely at those two tests and possibly comment there rather than here.

@chicoreus
Collaborator

chicoreus commented Sep 25, 2024 via email

@ArthurChapman
Collaborator

I still don't see a problem, as it is still a valuable test. Many datasets would not hold both terrestrial and marine data, and we have a separate test for terrestrial/marine (that would include high seas). There are many tests that one could argue won't add quality in every case - I think we discussed that with at least one other test. But in many datasets it would add quality knowing this. In the NOTEMPTY tests we are testing one simple thing. We then have other tests that test for other things, and we could, as @tucotuco suggested under #73 (comment), develop further tests for High Seas - my view is yes, we could do that, but let's leave that for after the Standard is published. Let's not continue adding and deleting tests at this stage.

@tucotuco
Member

@ArthurChapman wrote: "I still don't see a problem, as it is still a valuable test. Many datasets would not hold both terrestrial and marine data, and we have a separate test for terrestrial/marine (that would include high seas). There are many tests that one could argue won't add quality in every case - I think we discussed that with at least one other test. But in many datasets it would add quality knowing this. In the NOTEMPTY tests we are testing one simple thing. We then have other tests that test for other things, and we could, as @tucotuco suggested under #73 (comment), develop further tests for High Seas - my view is yes, we could do that, but let's leave that for after the Standard is published. Let's not continue adding and deleting tests at this stage."

That is an easy posture to get behind at this point!

@chicoreus
Collaborator

chicoreus commented Sep 26, 2024 via email

@ArthurChapman
Collaborator

Saying that the COUNTRYCODE is EMPTY does not say that the data is not fit for use. It depends on the use, and the user has to make that decision. Anyone working in the marine area knows that marine data would not have a Country Code. So many other tests check for NOTEMPTY, and reporting a field as EMPTY does not make the data unfit for use: KINGDOM_NOTEMPTY, GEODETICDATUM_NOTEMPTY, EVENTDATE_NOTEMPTY. There are many other tests that return NOT_COMPLIANT without making the data NOT FIT FOR USE for many uses.

Don't read too much into what each of the tests is doing and not doing. The EMPTY/NOTEMPTY tests are just that! There is, or there is not, something in the field. Other tests then do the next stages. Because we don't have a workflow and the tests are stand-alone, in many cases a single test won't tell you whether the data is fit for your use. If we had a workflow order, you might run the MARINETERRESTRIAL test first and then run this test only on terrestrial data, but we don't do that.

I don't see that there is anything to resolve. If we make a change here, then we have to revisit nearly every other test, because similar arguments could be made for many of the tests.

@ArthurChapman
Collaborator

@Tasilee wrote: "There is a fundamental problem that we have to solve here. Otherwise the test suite is not itself usable. There are multiple possible solutions. The simplest is to assert that high seas data should use XZ for the country code."

Put in the notes that "This test will return 'NOT_COMPLIANT' for records in the "High Seas". We recommend that high seas data use the dwc:countryCode = XZ". I would strongly oppose moving this and similar tests out of CORE.

@chicoreus
Collaborator

chicoreus commented Sep 26, 2024 via email

@chicoreus
Collaborator

chicoreus commented Sep 26, 2024 via email

@Tasilee
Collaborator

Tasilee commented Sep 27, 2024

I made a change to the Expected Response from

COMPLIANT if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT

to

COMPLIANT if dwc:countryCode is bdq:NotEmpty or has a value of "XZ"; otherwise NOT_COMPLIANT

and updated the Notes to

This test will return 'NOT_COMPLIANT' for records on the "High seas" where dwc:countryCode is bdq:Empty. We recommend that data from the high seas (outside national jurisdictions) use dwc:countryCode = "XZ" and dwc:country = "High seas" until an agreement has been made.

@chicoreus
Collaborator

Changing the expected response back to:

COMPLIANT if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT

dwc:countryCode = "XZ" is bdq:NotEmpty, so there is no reason for the specification to assert "COMPLIANT if dwc:countryCode is bdq:NotEmpty or if dwc:countryCode is bdq:NotEmpty; otherwise NOT_COMPLIANT"

Callout of XZ in the notes is good, but the statement in the expected response is redundant and confusing.
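
The redundancy can be checked with a small Python sketch that compares the two wordings of the Expected Response. The function names are hypothetical, and bdq:Empty is assumed here to mean missing or whitespace-only.

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    # Assumption: bdq:Empty is read as "missing or whitespace-only".
    return value is None or value.strip() == ""


def er_original(code: Optional[str]) -> str:
    """COMPLIANT if dwc:countryCode is bdq:NotEmpty."""
    return "NOT_COMPLIANT" if is_empty(code) else "COMPLIANT"


def er_with_xz(code: Optional[str]) -> str:
    """COMPLIANT if dwc:countryCode is bdq:NotEmpty or has a value of "XZ"."""
    if not is_empty(code) or code == "XZ":
        return "COMPLIANT"
    return "NOT_COMPLIANT"


# "XZ" already satisfies the first clause, so the second never changes a result.
SAMPLES = [None, "", "   ", "XZ", "ZZ", "AU", "Australia"]
assert all(er_original(code) == er_with_xz(code) for code in SAMPLES)
```

Because the extra clause can only match a value that is already non-empty, the two functions agree on every input, which is why the XZ recommendation sits better in the Notes than in the Expected Response.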

6 participants