TG2-VALIDATION_TAXONID_COMPLETE #121

Closed
ArthurChapman opened this issue Jan 17, 2018 · 36 comments
Labels: Conformance; NAME; Supplementary (tests supplementary to the core test suite, which the team regarded as not CORE); Test (tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT); TG2; Validation

Comments

@ArthurChapman
Collaborator

ArthurChapman commented Jan 17, 2018

| TestField | Value |
|---|---|
| GUID | a82c7e3a-3a50-4438-906c-6d0fefa9e984 |
| Label | VALIDATION_TAXONID_COMPLETE |
| Description | Does the value of dwc:taxonID contain a complete identifier? |
| TestType | Validation |
| Darwin Core Class | Taxon |
| Information Elements ActedUpon | dwc:taxonID |
| Information Elements Consulted | |
| Expected Response | INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is bdq:Empty; COMPLIANT if (1) taxonID is a validly formed LSID, or (2) taxonID is a validly formed URN with at least NID and NSS present, or (3) taxonID is in the form scope:value, or (4) taxonID is a validly formed URI with host and path where path consists of more than just "/"; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Completeness |
| Term-Actions | TAXONID_COMPLETE |
| Parameter(s) | |
| Source Authority | |
| Specification Last Updated | 2023-09-18 |
| Examples | [dwc:taxonID="urn:lsid:zoobank.org:act:17ADF24F-027F-44F6-9543-D3D0260CE79E": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:taxonID contains a URI and a namespace indicator"] [dwc:taxonID="Hakea decurrens ssp. physocarpa": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:taxonID does not contain a URI"] |
| Source | TG2-Gainesville |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
| Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L1882 |
| Notes | We have made this test SUPPLEMENTARY, which means that while we do not believe this test should be CORE, there may be circumstances where some communities may want it as CORE. This change derived from a discussion at TDWG 2023 on the use of dwc:taxonID and dwc:scientificNameID. It was concluded that a test VALIDATION_SCIENTIFICNAMEID was justified as CORE while this test should be SUPPLEMENTARY. The original test "VALIDATION_TAXONID_AMBIGUOUS" was seen by the TG2 team as too complex to implement. If we use any single bdq:sourceAuthority such as GBIF, a valid and complete dwc:taxonID based on an alternative source authority is unlikely to provide a valid match. A text or number string as a namespace indicator without a URI will be ambiguous. As an example, GBIF's backbone taxonomy dataset can be found at https://doi.org/10.15468/39omei. When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:taxonID. |
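A minimal Python sketch of the Expected Response above, assuming simplified regular expressions for LSIDs, RFC 8141 URNs, and alphanumeric scope:value pairs. The function name and patterns are hypothetical; this is not the FilteredPush sci_name_qc implementation linked in the table.

```python
import re
from urllib.parse import urlparse

# Hypothetical, simplified patterns (assumptions, not the sci_name_qc code).
# LSID: urn:lsid:authority:namespace:objectID[:revision]
LSID_RE = re.compile(r"urn:lsid:[^:\s]+:[^:\s]+:[^:\s]+(?::[^:\s]+)?", re.IGNORECASE)
# RFC 8141-style URN: a 2-32 character NID followed by a non-empty NSS.
URN_RE = re.compile(r"urn:[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]:\S+", re.IGNORECASE)
# scope:value, assumed here to be two alphanumeric strings, e.g. gbif:2435099.
SCOPE_VALUE_RE = re.compile(r"[A-Za-z0-9]+:[A-Za-z0-9]+")


def validation_taxonid_complete(taxon_id):
    """Return (status, result) for a dwc:taxonID value."""
    # bdq:Empty, approximated here as None or whitespace only.
    if taxon_id is None or taxon_id.strip() == "":
        return ("INTERNAL_PREREQUISITES_NOT_MET", None)
    # Cases (1), (2) and (3): LSID, URN with NID and NSS, scope:value.
    if (LSID_RE.fullmatch(taxon_id)
            or URN_RE.fullmatch(taxon_id)
            or SCOPE_VALUE_RE.fullmatch(taxon_id)):
        return ("RUN_HAS_RESULT", "COMPLIANT")
    # Case (4): URI with a host and a path that is more than just "/".
    parsed = urlparse(taxon_id)
    if parsed.netloc and parsed.path not in ("", "/"):
        return ("RUN_HAS_RESULT", "COMPLIANT")
    return ("RUN_HAS_RESULT", "NOT_COMPLIANT")
```

Because the four cases are joined by "or", under this sketch an LSID-like string missing its object ID (e.g. urn:lsid:zoobank.org:act) still passes as a generic URN with NID "lsid".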
ArthurChapman added the TG2, Validation, NAME, and Test labels Jan 17, 2018
@tucotuco
Member

tucotuco commented Aug 26, 2018

Agreed at TDWG 2018 DQIG meeting that any mention of uniqueness is redundant with the resolvability requirement, hence references to uniqueness were dropped.

@ArthurChapman
Collaborator Author

We currently have in the notes: "Note that the cause of failure may be due to a service failure. Implementations of this test should account for this type of failure and not necessarily report a failure."

Should this then be covered by adding an EXTERNAL_PREREQUISITES_NOT_MET?

@Tasilee
Collaborator

Tasilee commented Jan 22, 2019

This would apply to any external lookup. One presumes any system failure would generate a specific response like "FAILED_LOOKUP"?

@chicoreus
Collaborator

@ArthurChapman Yes, EXTERNAL_PREREQUISITES_NOT_MET would cover reporting some sort of transient system failure where asking the same question later might get an answer. @Tasilee FAILED_LOOKUP is ambiguous: it carries the implication that a lookup was run (and failed) and that something was looked up. EXTERNAL_PREREQUISITES_NOT_MET covers the more general case in which some external resource (lookup, calculation, or otherwise) was not available; try again later.
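To make the distinction concrete, a minimal Python sketch of reporting a transient service failure as EXTERNAL_PREREQUISITES_NOT_MET rather than as a result about the value. The resolver callable, the ServiceUnavailable exception, and run_with_lookup are all hypothetical names, not part of any bdq specification.

```python
from typing import Callable, Optional, Tuple


class ServiceUnavailable(Exception):
    """Hypothetical error a resolver raises for a transient failure."""


def run_with_lookup(taxon_id: str,
                    resolve: Callable[[str], bool]) -> Tuple[str, Optional[str]]:
    """Report a service outage as EXTERNAL_PREREQUISITES_NOT_MET, not as a failed value."""
    try:
        found = resolve(taxon_id)
    except ServiceUnavailable:
        # Nothing is known about the value itself; the same question may succeed later.
        return ("EXTERNAL_PREREQUISITES_NOT_MET", None)
    return ("RUN_HAS_RESULT", "COMPLIANT" if found else "NOT_COMPLIANT")


def offline_resolver(taxon_id: str) -> bool:
    """Hypothetical stand-in for a lookup service that is currently down."""
    raise ServiceUnavailable(f"could not reach resolver for {taxon_id}")
```

Catching only the outage exception keeps genuine "not found" answers reported as NOT_COMPLIANT.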

@ArthurChapman
Collaborator Author

ArthurChapman commented Jan 23, 2019

@Tasilee and I have a problem with this one. How do we resolve the taxonID? The examples given in Darwin Core include a GUID and just a number ("32567"), which is similar to our example of a failure. How is it possible for us to validate it unless it references an authority, which according to Darwin Core it need not do? I don't see how this can work. @tucotuco, @chicoreus is this possible to do? Is it a valuable test?

@tucotuco
Member

The best practices for identifiers says they should be globally unique for the instance of the Class they represent, persistent, and resolvable. That is an applicability statement apart from Darwin Core. In Darwin Core, or in a Darwin Core Archive, there are no such restrictions. This shouldn't be too disturbing, as Darwin Core does not implement restrictions in and of itself, it merely provides definitions and other guiding information. So, the problem, if it were one, would not be unique to the dwc:taxonID term. What does seem to be a problem is that, if the taxonID does not contain the information to resolve it (the authority), that is an internal prerequisite that isn't met - there is a problem with the data rather than a problem with a service. That is not captured in the Expected Response.

@Tasilee
Collaborator

Tasilee commented Jan 24, 2019

Are we saying that the Expected Response should be "EXTERNAL_PREREQUISITES_NOT_MET if resolving service was unavailable; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or is not resolvable; COMPLIANT if the value of the field dwc:taxonID is resolvable; otherwise NOT_COMPLIANT", given @chicoreus's comment on EXTERNAL and @tucotuco's on INTERNAL?

@ArthurChapman
Collaborator Author

I wouldn't think so - as if it is non-resolvable it is NOT_COMPLIANT. What John is saying is that it requires somewhere in the record a reference to what the resolving authority is. I think we are saying
"EXTERNAL_PREREQUISITES_NOT_MET if resolving service was unavailable; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or the resolving authority is not determined; COMPLIANT if the value of the field dwc:taxonID is resolvable; otherwise NOT_COMPLIANT"
or some wording similar to "not determined" (e.g. not identifiable, not known, not referenced within the record).

@tucotuco
Member

I agree with @ArthurChapman that the response should be NOT_COMPLIANT if the taxonID is not resolvable, but I would not expect the authority information to be anywhere in the record other than in taxonID. It would be resolvable if it were possible to resolve the taxonID directly (full URI) or indirectly (an unambiguous namespace from which a full URI could be constructed).
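A minimal Python sketch of direct versus indirect resolvability, assuming a hypothetical namespace-to-URI table (NAMESPACE_PREFIXES). Its single gbif entry is built from the GBIF species page URLs quoted elsewhere in this thread, not from any agreed registry, and resolvable_uri is an illustrative name.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical namespace-to-URI-prefix table (an assumption for illustration).
NAMESPACE_PREFIXES = {"gbif": "https://www.gbif.org/species/"}


def resolvable_uri(taxon_id: str) -> Optional[str]:
    """Return a URI the taxonID could be resolved through, or None."""
    parsed = urlparse(taxon_id)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return taxon_id  # directly resolvable: already a full URI
    scope, sep, value = taxon_id.partition(":")
    if sep and value and scope.lower() in NAMESPACE_PREFIXES:
        # Indirectly resolvable: a known namespace lets us build the full URI.
        return NAMESPACE_PREFIXES[scope.lower()] + value
    return None  # no authority we know how to expand
```

A bare number such as 54367 yields None, since nothing in the value names the authority.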

@Tasilee
Collaborator

Tasilee commented Jan 29, 2019

OK, so is the Expected Response now ok?

@ArthurChapman
Collaborator Author

I think it is OK - but may be better (given what @tucotuco said above) if we said "INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or the resolving authority is not referenced within the record" What do you think @tucotuco ?

@tucotuco
Member

I would be specific: INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is either not present or is EMPTY or does not include the resolving authority.

@Tasilee
Collaborator

Tasilee commented Jan 30, 2019

Thanks @ArthurChapman and @tucotuco - done.

@chicoreus
Collaborator

@tucotuco What about a taxonID in the form urn:uuid:e34fda24-f53e-4627-b591-b6c6ca349293, which should be an unambiguous, unique taxonID with a known URN scheme, just not resolvable? Or e34fda24-f53e-4627-b591-b6c6ca349293? I'd tend to think that this test is about uniqueness, not necessarily resolvability. Would the requirement be any urn:uuid, urn:catalog, lsid:, http:, or https: identifier?

@tucotuco
Member

tucotuco commented Feb 4, 2019

@chicoreus That may be a GUID. It is in the form of a GUID. But no one can resolve it to know for sure. If it resolves in addition, you can be sure it is a GUID. But these are just my perspective. Darwin Core doesn't require anything in particular, so it comes down to what we want the test to do everywhere.

ArthurChapman added a commit that referenced this issue Oct 6, 2020
In accord with #189 added test data file for TAXON_AMBIGUOUS #121
@ArthurChapman
Collaborator Author

Not sure of the wording here.

"... INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY or does not include the resolving authority ..."

The example has just "dwc:taxonID=54367", i.e. just a number that does NOT include a resolving authority, so as written it would (at least to me) be INTERNAL_PREREQUISITES_NOT_MET.

Also, none of the examples in the test dataset includes "the resolving authority".

With all these, we need to either 1) delete the words "or does not include the resolving authority" or 2) modify all our examples.

@chicoreus
Collaborator

@ArthurChapman I agree, and concur with deleting the phrase "or does not include the resolving authority" from the specification. But there is likely more work required.

urn:lsid:marinespecies.org:taxname:406150 is a likely, unique, valid, non ambiguous value for taxonID.

Given the specification of "GBIF backbone taxonomy service", there isn't actually a way of querying that service for a taxonID, e.g. https://api.gbif.org/v1/species/search?taxonID=urn:lsid:marinespecies.org:taxname:406150&datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c ignores the invalid term taxonID= and just returns everything in the backbone taxonomy.

Taxon records in the backbone taxonomy do include taxonID, so an implementation that works off a download from GBIF would work, but I'd be hard pressed to implement this as defined unless we assert that the only non-ambiguous taxonID values are identifiers of records in GBIF's backbone taxonomy. Then https://api.gbif.org/v1/species/2435099 and 2435099 would both be compliant, but the quite unambiguous urn:lsid:marinespecies.org:taxname:406150, which we can't find through the GBIF service, would be ambiguous.

Note that https://api.gbif.org/v1/species/54367 currently returns no results, suggesting either that GBIF deleted the record or that 54367 is ambiguous, as we don't know which dataset it belongs to.

urn:lsid:marinespecies.org:taxname:406150
- Is unambiguous.
- Is NOT resolvable (thus fails on that part of the specification).
- Is NOT findable through the GBIF backbone taxonomy service (thus fails on that part of the specification).
- Includes an authority, but not a resolving authority (thus fails on that part of the specification).
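The gap between an authority and a resolving authority shows up when the LSID is split into its parts. A minimal Python sketch, assuming the urn:lsid:authority:namespace:objectID[:revision] layout; parse_lsid and the LSID tuple are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


# Simplified LSID pattern (an assumption): four colon-free parts, the last optional.
LSID_RE = re.compile(
    r"urn:lsid:([^:\s]+):([^:\s]+):([^:\s]+)(?::([^:\s]+))?", re.IGNORECASE)


def parse_lsid(value: str) -> Optional[LSID]:
    """Split an LSID into its components, or return None if it is not one."""
    match = LSID_RE.fullmatch(value)
    return LSID(*match.groups()) if match else None
```

For the WoRMS value the authority is marinespecies.org, which says who issued the identifier; nothing in the string itself says which service will resolve it.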

@Tasilee
Collaborator

Tasilee commented Mar 7, 2022

Discussion of the TG2 team on 7 March 2022 suggested that this test was too complex to implement relative to its utility. Consequently, it was suggested that we rename it as an 'INCOMPLETE'-type test of dwc:taxonID, with compliance only if both a URI and a suffix (? a better term?) were present.

@Tasilee
Collaborator

Tasilee commented Mar 7, 2022

@tucotuco's "namespace indicator" to replace "suffix" seems good to me.

Tasilee changed the title from TG2-VALIDATION_TAXONID_INCOMPLETE to TG2-VALIDATION_TAXONID_COMPLETE Mar 22, 2022
@Tasilee
Collaborator

Tasilee commented Apr 3, 2022

Are we all happy with this test as it stands now?

Tasilee removed the NEEDS WORK label Apr 3, 2022
@chicoreus
Collaborator

On trying to implement this, I find the specification wanting.

Currently: "Description: Does the value of dwc:taxonID contain both a URI and namespace indicator?"
Currently: "Expected Response: INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY; COMPLIANT if dwc:taxonID contains both a URI and a namespace indicator; otherwise NOT_COMPLIANT"

Propose the following specification:
COMPLIANT if (1) taxonID is a validly formed LSID, or (2) taxonID is a validly formed URN with at least NID and NSS present, or (3) taxonID is a validly formed URI with host and path where path consists of more than just "/", and if host is www.gbif.org and the path begins with "/species/", the path contains additional trailing characters; otherwise NOT_COMPLIANT

Here the semantics of LSID are valuable: to be validly formed, an LSID must specify the authority, namespace, and objectID, which is really what we want to know in this test (can we tell what the taxonID reference is and what it is referring to?). For http/https URIs, the path can contain the equivalent of the LSID namespace and objectID, as in https://www.gbif.org/species/2529789, whereas https://www.gbif.org/species/ is a validly formed URI that needs special-case handling to tell that it doesn't actually contain a reference to a particular taxon. The specification could include additional common special cases (e.g. URIs with a path containing aphia.php and a query containing id=), or not.
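A minimal Python sketch of the proposed URI case with its GBIF special case (uri_names_a_taxon is a hypothetical name). Note that this special case does not appear in the Expected Response adopted later, which only requires a path of more than "/".

```python
from urllib.parse import urlparse

SPECIES_PREFIX = "/species/"


def uri_names_a_taxon(value: str) -> bool:
    """Sketch of the proposed URI case, including its GBIF special case."""
    parsed = urlparse(value)
    # A host and a path that is more than just "/" are required.
    if not parsed.netloc or parsed.path in ("", "/"):
        return False
    # Special case: a www.gbif.org/species/ URI must go on to name a particular taxon.
    if parsed.hostname == "www.gbif.org" and parsed.path.startswith(SPECIES_PREFIX):
        return len(parsed.path) > len(SPECIES_PREFIX)
    return True
```

Further special cases, such as the aphia.php one, would be extra host or path checks in the same place.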

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jun 8, 2022
…cifications. DESCRIPTION: Updating tdwg/bdq#120 VALIDATION_TAXONID_NOTEMPTY to current specification.  Adding an implementation of tdwg/bdq#121 VALIDATION_TAXONID_COMPLETE with notes about needing to update the specification.  Adding supporting RFC8141URN and LSID classes to help in identifying and parsing URNs and LSIDs to support tdwg/bdq#121.  Initial work in progress on implementation of AMENDMENT_SCIENTIFICNAME_FROM_TAXONID.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Aug 23, 2022
…ntation of tdwg/bdq#101 VALIDATION_POLYNOMIAL_CONSISTENT, also adding test cases from current validation data csv file that were failing, along with commented out case that may be in error in the validation data. Conforming implementation of tdwg/bdq#121 VALIDATION_TAXONID_COMPLETE to current specification for handling empty taxonID, also adding test cases from current validation data csv file that were failing.  Fixing methods that should be static but aren't.
@Tasilee
Collaborator

Tasilee commented Nov 6, 2022

An informative comment from @timrobertson100 19th September 2022:

"When it comes to occurrence record processing, the GBIF occurrence systems currently pass this value on, only making use on the literal values (e.g. scientificName) so it’s not something we’d have a very strong an opinion on, in e.g. a spreadsheet. My gut feeling is a “scope:value” format (e.g. gbif:1234) is better than a URL, for the reason that URLs are generally less stable over time. As an example. “species” in that URL is already questionable and a future GBIF API would be better using e.g. “../taxon/..” and concept based identification of organisms".

@chicoreus
Collaborator

How about (taking in Tim and Markus's comments on scope:value): COMPLIANT if (1) taxonID is a validly formed LSID, or (2) taxonID is a validly formed URN with at least NID and NSS present, or (3) taxonID is in the form scope:value, or (4) taxonID is a validly formed URI with host and path where path consists of more than just "/"; otherwise NOT_COMPLIANT

Tasilee removed the NEEDS WORK label Nov 6, 2022
@ArthurChapman
Collaborator Author

@chicoreus - see comment and question under #71

@Tasilee
Collaborator

Tasilee commented Mar 2, 2023

@ArthurChapman and I have re-read the Expected Response and we realize that we will need to better handle the terms LSID, URN, NID and NSS and possibly URI.

Do we expand it in the test?
Do we simply add a reference?
Do we add the terms to the Vocabulary?

@ArthurChapman
Collaborator Author

I wouldn't add them to the Vocabulary as they are standard terms and this is the only test that uses them. I would add a reference(s) in the References and perhaps add a note if that can define them simply.

@Tasilee
Collaborator

Tasilee commented Mar 26, 2023

OK, I've added a few Wikipedia references that seem easy to understand. In doing this, I note that we use a different format for References (dot points) compared with Information Elements and Examples (new table lines). I've used the latter here for illustration compared with, for example #102.

Does it matter? I think consistency is warranted.

@ArthurChapman
Collaborator Author

Personally, I prefer the dot points

@ArthurChapman
Collaborator Author

I have added to the Notes to be consistent with #71:

"When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:taxonID."

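A minimal Python sketch of recognizing the recommended gbif:{integer} form, assuming the pseudo-namespace is written exactly as lower-case "gbif:" (gbif_taxon_key is a hypothetical helper).

```python
import re
from typing import Optional

# Assumes the pseudo-namespace is written exactly as lower-case "gbif:".
GBIF_TAXON_RE = re.compile(r"gbif:(\d+)")


def gbif_taxon_key(taxon_id: str) -> Optional[int]:
    """Return the integer from a gbif:{integer} taxonID, or None for any other form."""
    match = GBIF_TAXON_RE.fullmatch(taxon_id)
    return int(match.group(1)) if match else None
```

A GBIF species page URL is deliberately not matched here; it would instead fall under the URI case of the Expected Response.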
@Tasilee
Collaborator

Tasilee commented Sep 18, 2023

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted".

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

chicoreus added the CORE label Sep 18, 2023
Tasilee added the Supplementary label and removed the CORE label Dec 13, 2023
@Tasilee
Collaborator

Tasilee commented Dec 13, 2023

We have made this test SUPPLEMENTARY and closed the issue. Making it SUPPLEMENTARY means that while we do not believe this test should be CORE, there may be circumstances where some communities may want it as CORE. This change derived from a discussion at TDWG 2023 on the use of dwc:taxonID and dwc:scientificNameID. It was concluded that a test VALIDATION_SCIENTIFICNAMEID was justified as CORE while this test should be SUPPLEMENTARY.

Tasilee closed this as completed Dec 13, 2023
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 18, 2024
…date with current specification, the scope:value case is vague, adding support for alphanumeric string scope:value pairs.