TG2-VALIDATION_TAXON_UNAMBIGUOUS #70

Open
iDigBioBot opened this issue Jan 5, 2018 · 56 comments
Labels
Conformance CORE TG2 CORE tests NAME Parameterized Test requires a parameter Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 Validation VOCABULARY

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

TestField Value
GUID 4c09f127-737b-4686-82a0-7c8e30841590
Label VALIDATION_TAXON_UNAMBIGUOUS
Description Can the taxon be unambiguously resolved from bdq:sourceAuthority using the available taxon terms?
TestType Validation
Darwin Core Class dwc:Taxon
Information Elements ActedUpon dwc:taxonID
dwc:scientificName
dwc:scientificNameID
dwc:acceptedNameUsageID
dwc:originalNameUsageID
dwc:taxonConceptID
dwc:higherClassification
dwc:kingdom
dwc:phylum
dwc:class
dwc:order
dwc:superfamily
dwc:family
dwc:subfamily
dwc:tribe
dwc:subtribe
dwc:genus
dwc:genericName
dwc:subgenus
dwc:infragenericEpithet
dwc:specificEpithet
dwc:infraspecificEpithet
dwc:cultivarEpithet
dwc:vernacularName
dwc:scientificNameAuthorship
dwc:taxonRank
Information Elements Consulted
Expected Response EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if all of dwc:scientificNameID, dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:scientificNameAuthorship, dwc:cultivarEpithet are bdq:Empty; COMPLIANT if (1) dwc:scientificNameID references a single taxon record in the bdq:sourceAuthority, or (2) dwc:scientificNameID is bdq:Empty and dwc:scientificName references a single taxon record in the bdq:sourceAuthority, or (3) if dwc:scientificName and dwc:scientificNameID are bdq:Empty and if a combination of the values of the terms dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:cultivarEpithet, dwc:taxonRank, and dwc:scientificNameAuthorship can be unambiguously resolved to a unique taxon in the bdq:sourceAuthority, or (4) if ambiguity produced by multiple matches in (2) or (3) can be disambiguated to a unique Taxon using the values of dwc:tribe, dwc:subtribe, dwc:subgenus, dwc:genus, dwc:subfamily, dwc:family, dwc:superfamily, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:higherClassification, dwc:taxonID, dwc:acceptedNameUsageID, dwc:originalNameUsageID, dwc:taxonConceptID and dwc:vernacularName; otherwise NOT_COMPLIANT
Data Quality Dimension Conformance
Term-Actions TAXON_UNAMBIGUOUS
Parameter(s) bdq:sourceAuthority
Source Authority bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}
Specification Last Updated 2023-09-18
Examples [dwc:taxonID="", dwc:scientificNameID="", dwc:acceptedNameUsageID="", dwc:originalNameUsageID="", dwc:taxonConceptID="", dwc:scientificName="Triplex rosaria Perry, 1811", dwc:higherClassification="", dwc:kingdom="Animalia", dwc:phylum="mollusca", dwc:class="Gastropoda", dwc:order="", dwc:family="Muricidae", dwc:subfamily="", dwc:genus="Chicoreus", dwc:genericName="Triplex", dwc:subgenus="", dwc:infragenericEpithet="", dwc:specificEpithet="rosarium", dwc:infraspecificEpithet="", dwc:cultivarEpithet="", dwc:vernacularName="", dwc:scientificNameAuthorship="Perry, 1811", dwc:taxonRank="", bdq:sourceAuthority="marinespecies.org": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:scientificName matched to unique taxon record in WoRMS, unique fuzzy match on name and exact match on authorship."]
[dwc:taxonID="", dwc:scientificNameID="", dwc:acceptedNameUsageID="", dwc:originalNameUsageID="", dwc:taxonConceptID="", dwc:scientificName="Graphis", dwc:higherClassification="", dwc:kingdom="", dwc:phylum="", dwc:class="", dwc:order="", dwc:family="", dwc:subfamily="", dwc:genus="", dwc:genericName="", dwc:subgenus="", dwc:infragenericEpithet="", dwc:specificEpithet="", dwc:infraspecificEpithet="", dwc:cultivarEpithet="", dwc:vernacularName="", dwc:scientificNameAuthorship="", dwc:taxonRank="": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:scientificName="Graphis" is ambiguous as could be either a lichen or a gastropod."]
Source ALA, GBIF, CRIA
References
Example Implementations (Mechanisms) Kurator/FilteredPush sci_name_qc Library
Link to Specification Source Code https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L796 https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L843
Notes There are any number of potential controlled vocabularies that might be used for this test, including local vocabularies and taxon specific vocabularies. If dwc:scientificNameID is empty, use dwc:scientificName and dwc:cultivarEpithet to search for a unique taxon. If dwc:scientificName is bdq:Empty, check with the terms that form atomic parts of it (dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank, dwc:scientificNameAuthorship), and if more than one match is found, use the remaining terms to try to disambiguate to a single Taxon record. The terms dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:higherClassification, dwc:scientificNameID, dwc:acceptedNameUsageID, dwc:originalNameUsageID, dwc:taxonConceptID should not be used to make a match if dwc:scientificNameID and dwc:scientificName or dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:taxonRank, dwc:scientificNameAuthorship are bdq:Empty. Note that test VALIDATION_SCIENTIFICNAME_FOUND (4c09f127-737b-4686-82a0-7c8e30841590) is a more specific test for a subset of Information Elements from this test.
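The Expected Response above can be read as a short decision procedure. The following is a minimal Python sketch of that reading, not the sci_name_qc implementation. The `lookup` callable, the `authority_available` flag, the `disambiguate` helper, the case-insensitive comparison, and the treatment of whitespace-only values as bdq:Empty are all assumptions made for illustration, and the two-record authority at the end is toy data based on the Graphis example.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Terms whose emptiness triggers INTERNAL_PREREQUISITES_NOT_MET.
NAME_TERMS = [
    "dwc:scientificNameID", "dwc:scientificName", "dwc:genericName",
    "dwc:specificEpithet", "dwc:infraspecificEpithet",
    "dwc:scientificNameAuthorship", "dwc:cultivarEpithet",
]
# Atomic name parts used in clause (3).
ATOMIC_TERMS = [
    "dwc:genericName", "dwc:specificEpithet", "dwc:infraspecificEpithet",
    "dwc:cultivarEpithet", "dwc:taxonRank", "dwc:scientificNameAuthorship",
]
# Terms used only to resolve multiple matches in clause (4).
DISAMBIGUATION_TERMS = [
    "dwc:tribe", "dwc:subtribe", "dwc:subgenus", "dwc:genus", "dwc:subfamily",
    "dwc:family", "dwc:superfamily", "dwc:order", "dwc:class", "dwc:phylum",
    "dwc:kingdom", "dwc:higherClassification", "dwc:taxonID",
    "dwc:acceptedNameUsageID", "dwc:originalNameUsageID",
    "dwc:taxonConceptID", "dwc:vernacularName",
]


def is_empty(value) -> bool:
    # Assumption: None and whitespace-only strings count as bdq:Empty.
    return value is None or value.strip() == ""


def disambiguate(matches: List[Record], record: Record) -> List[Record]:
    # Hypothetical helper: drop candidates that contradict any non-empty
    # disambiguation term supplied in the record (case-insensitive).
    kept = []
    for candidate in matches:
        consistent = True
        for term in DISAMBIGUATION_TERMS:
            supplied, known = record.get(term), candidate.get(term)
            if is_empty(supplied) or is_empty(known):
                continue
            if supplied.strip().lower() != known.strip().lower():
                consistent = False
                break
        if consistent:
            kept.append(candidate)
    return kept


def taxon_unambiguous(record: Record,
                      lookup: Callable[[Record], List[Record]],
                      authority_available: bool = True) -> str:
    if not authority_available:
        return "EXTERNAL_PREREQUISITES_NOT_MET"
    if all(is_empty(record.get(t)) for t in NAME_TERMS):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # Clause (1): scientificNameID takes precedence and is not disambiguated.
    if not is_empty(record.get("dwc:scientificNameID")):
        matches = lookup({"dwc:scientificNameID": record["dwc:scientificNameID"]})
        return "COMPLIANT" if len(matches) == 1 else "NOT_COMPLIANT"
    # Clause (2): scientificName; clause (3): atomic parts of the name.
    if not is_empty(record.get("dwc:scientificName")):
        query = {"dwc:scientificName": record["dwc:scientificName"]}
    else:
        query = {t: record[t] for t in ATOMIC_TERMS if not is_empty(record.get(t))}
    matches = lookup(query)
    # Clause (4): try the remaining terms when (2) or (3) is ambiguous.
    if len(matches) > 1:
        matches = disambiguate(matches, record)
    return "COMPLIANT" if len(matches) == 1 else "NOT_COMPLIANT"


# Toy in-memory authority standing in for bdq:sourceAuthority.
AUTHORITY = [
    {"dwc:scientificName": "Graphis", "dwc:kingdom": "Fungi"},
    {"dwc:scientificName": "Graphis", "dwc:kingdom": "Animalia"},
]


def toy_lookup(query: Record) -> List[Record]:
    return [r for r in AUTHORITY if all(r.get(k) == v for k, v in query.items())]
```

With this toy authority, a record holding only dwc:scientificName="Graphis" stays ambiguous, while adding dwc:kingdom="Animalia" lets clause (4) narrow the result to one record. A real implementation would also need the fuzzy name matching and authorship comparison mentioned in the first example, which this sketch leaves out.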
@iDigBioBot
Collaborator Author

Comment by Paul Morris (@chicoreus) migrated from spreadsheet:
There are homonyms created by the same author in the same year in the same higher taxon, indeed in a few cases in the same work, and in at least one case on the same page in the same work. Given Flat Darwin Core, there are cases where Darwin Core data could be filled out sufficiently to disambiguate the homonyms, and other cases where Darwin Core terms for the text of a scientific name are unable to provide sufficient information for disambiguation. Including the taxon ids and referencing an id in a nomenclator or in a taxonomic authority can resolve these. The key to this test is whether or not the taxon terms can be uniquely related to the id of a nomenclatural act; however, outside vascular plants and fungi, we are lacking in nomenclators. Perhaps the test is a pair: MEASURE_TAXONID_COMPLETENESS and an amendment LOOKUP_TAXONID, which could return a result state of ambiguous for homonyms.

@godfoder godfoder changed the title TG2-VALIDATION_SCIENTIFICNAME_AMBIGUOUS TG2-VALIDATION_TAXON_AMBIGUOUS Jan 17, 2018
@godfoder
Contributor

Related to #57

@ianengelbrecht
Collaborator

Regarding "INTERNAL_PREREQUISITES_NOT_MET if all of the fields dwc:scientificName, dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom are either not present or are EMPTY", would it make sense to say that only the ranks up to dwc:taxonRank are required? Also, would dwc:specificEpithet and dwc:infraspecificEpithet also be required where dwc:taxonRank is species or below? Also, can this be restricted to the primary taxon ranks, kingdom, phylum, class, order, family, genus, and have dwc:subgenus removed as an internal prerequisite. My apologies if these have already been discussed and resolved in the group.

@ArthurChapman
Collaborator

Thanks @ianengelbrecht. What we are saying is that if none of those are present, or all that are present are empty, then the test can't be run. If some are present (even dwc:subgenus), then the test can be run (it may fail and be NOT_COMPLIANT if it can't be resolved). It is not saying that all have to be present, but that the INTERNAL_PREREQUISITES can't be satisfied if there are no relevant fields or all are empty (all ... are not present). Perhaps we could rewrite to make it clearer (i.e. none of these fields are present), but we have tried to be consistent in the way we have worded all these. I've got this correct, haven't I, @chicoreus?

@ianengelbrecht
Collaborator

Thanks @ArthurChapman, how about "INTERNAL_PREREQUISITES_NOT_MET if none of the fields dwc:scientificName, dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom are present, or all of those present are EMPTY;"

@Tasilee
Collaborator

Tasilee commented Mar 14, 2019

Thanks @ianengelbrecht and @ArthurChapman. Unless I am mistaken (always possible), #105 covers that scenario.

@tucotuco
Member

@Tasilee Yes, issue #105 (TG2_TAXON_VALIDATION_EMPTY ) is a test to cover that situation, but this test (TG2_VALIDATION_TAXON_AMBIGUOUS) still needs a response for cases where the response from TG2_TAXON_VALIDATION_EMPTY is NON_COMPLIANT. How about the following wording?

INTERNAL_PREREQUISITES_NOT_MET if a) none of the fields dwc:scientificName, dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom are present or b) all of those same fields that are present are EMPTY.
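The distinction between a field that is not present and one that is present but EMPTY can be sketched as below. This is only an illustration of the 2019 wording proposed in this comment, not of the current Expected Response. Representing a record as a Python dict, treating a missing key as "not present", and treating None or whitespace-only values as EMPTY are assumptions.

```python
from typing import Dict, Optional

# Fields named in the proposed wording.
PREREQUISITE_FIELDS = [
    "dwc:scientificName", "dwc:subgenus", "dwc:genus", "dwc:family",
    "dwc:order", "dwc:class", "dwc:phylum", "dwc:kingdom",
]


def internal_prerequisites_not_met(record: Dict[str, Optional[str]]) -> bool:
    # (a) none of the fields are present in the record at all
    present = [f for f in PREREQUISITE_FIELDS if f in record]
    if not present:
        return True
    # (b) every field that is present is EMPTY
    # Assumption: None and whitespace-only strings count as EMPTY.
    return all(record[f] is None or record[f].strip() == "" for f in present)
```

Because `all()` over an empty list is true in Python, case (a) would also fall out of case (b) on its own; the explicit check is kept so the code mirrors the two-part wording.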

@Tasilee
Collaborator

Tasilee commented Mar 26, 2019

Thanks @tucotuco - I can live with that :). @ArthurChapman?

@ArthurChapman
Collaborator

I think that is OK

@Tasilee
Collaborator

Tasilee commented Mar 27, 2019

Thanks @ArthurChapman. Done

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Sep 12, 2022
…t continuing to run test if source authority is unknown.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Sep 14, 2022
…addressing a record with just a taxonID and no other terms with new SciNameUtils method validateTaxonID. Adding missing setValue(amendment) statements when proposing amendments.
@Tasilee
Collaborator

Tasilee commented Jun 13, 2023

Restructured Parameter(s) and Source authority

@ArthurChapman
Collaborator

Replaces "#46" in notes with "VALIDATION_SCIENTIFICNAME_FOUND (4c09f127-737b-4686-82a0-7c8e30841590)"

@Tasilee
Collaborator

Tasilee commented Jun 14, 2023

One would hope that (all) the github issue references would be translated for the csv

@ArthurChapman
Collaborator

It won't automatically translate and we need to add the GUID

@Tasilee
Collaborator

Tasilee commented Jun 14, 2023

A program like my Python code that I use for test spec dumps could easily do the translation (add Label and GUID) - and that creates a csv file.
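As a rough idea of what such a translation step could look like (this is not the actual test spec dump code, which is not shown in this thread), the sketch below replaces `#NN` issue references with `LABEL (GUID)`. The function name, the mapping structure, and the choice to leave unknown issue numbers untouched are assumptions; the single mapping entry is the one quoted in the comment above about replacing "#46".

```python
import re
from typing import Dict, Tuple

# Matches GitHub-style issue references such as "#46".
ISSUE_REF = re.compile(r"#(\d+)\b")


def translate_issue_refs(text: str, tests: Dict[int, Tuple[str, str]]) -> str:
    """Replace each known '#NN' with 'LABEL (GUID)'; leave unknown ones as-is."""
    def replace(match: "re.Match[str]") -> str:
        entry = tests.get(int(match.group(1)))
        if entry is None:
            return match.group(0)
        label, guid = entry
        return f"{label} ({guid})"
    return ISSUE_REF.sub(replace, text)


# Hypothetical mapping from issue number to (Label, GUID), using the
# values exactly as quoted earlier in this thread.
TESTS = {
    46: ("VALIDATION_SCIENTIFICNAME_FOUND", "4c09f127-737b-4686-82a0-7c8e30841590"),
}
```

In practice the mapping would be built from the issue table itself, so every test's Label and GUID is available before the Notes column is rewritten for the csv.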

@ArthurChapman
Collaborator

Well - wherever it occurs in the GitHub table - it will need to be translated. I have been working through them whenever I find them.

@ArthurChapman
Collaborator

Updated notes - changed "dwc:TaxonID" to "dwc:taxonID"

@chicoreus
Collaborator

Will need to include the new terms dwc:superfamily, dwc:tribe, dwc:subtribe tdwg/dwc#65 tdwg/dwc#45 tdwg/dwc#46

@Tasilee
Collaborator

Tasilee commented Jul 4, 2023

Added the terms dwc:superfamily, dwc:tribe, dwc:subtribe to the Information elements and Expected response, and updated Specification Last Updated.

On this one, please check my Expected response.

@Tasilee Tasilee removed the NEEDS WORK label Jul 4, 2023
@Tasilee
Collaborator

Tasilee commented Jul 4, 2023

Amended Source Authority values to align with @chicoreus syntax

From

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" [https://doi.org/10.15468/39omei] API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]

to

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}
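For anyone consuming the new syntax programmatically, here is a minimal sketch of pulling the three parts out of the value and appending a name to the API endpoint. The regular expression, the function names, and the group names are assumptions based only on the string shown above, not on any published grammar for the Source Authority field. The sketch only builds the URL and makes no network request.

```python
import re
from typing import Dict
from urllib.parse import quote

SOURCE_AUTHORITY = (
    'bdq:sourceAuthority default = "GBIF Backbone Taxonomy" '
    '{[https://doi.org/10.15468/39omei]} '
    '{API endpoint [https://api.gbif.org/v1/species'
    '?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}'
)

# Assumed shape: default = "NAME" {[REFERENCE]} {API endpoint [ENDPOINT]}
PATTERN = re.compile(
    r'default = "(?P<name>[^"]+)" '
    r'\{\[(?P<reference>[^\]]+)\]\} '
    r'\{API endpoint \[(?P<endpoint>[^\]]+)\]\}'
)


def parse_source_authority(value: str) -> Dict[str, str]:
    match = PATTERN.search(value)
    if match is None:
        raise ValueError("value does not follow the assumed Source Authority syntax")
    return match.groupdict()


def build_query_url(endpoint: str, scientific_name: str) -> str:
    # The endpoint ends in "name=", so the percent-encoded name is appended.
    return endpoint + quote(scientific_name)


parsed = parse_source_authority(SOURCE_AUTHORITY)
url = build_query_url(parsed["endpoint"], "Triplex rosaria")
```

Keeping the reference and the endpoint in separate braces means each can be extracted without guessing where one URL stops and the next begins.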

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 7, 2023
…pecifications. Addressed tdwg/bdq#70 VALIDATION_TAXON_UNAMBIGUOUS Updated metadata, ProvidesVersion and Specification annotations.   Added support for gbif:{integer} pseudo-namespace.  Improved support for specification, still needs work noted with TODO comments.   Removed reviewed stub method.   Added test cases.  Added comparator to Taxon class to compare non-empty higher ranks between a Taxon and a NameMatch.  Handling parsing of json with class or with clazz in GBIF api response, silent change from clazz?
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 13, 2023
…pecifications. Addressed tdwg/bdq#70 VALIDATION_TAXON_UNAMBIGUOUS improved support for plausible matching on scientific name authorship for variations in which terms contain the authorship.  Clarified intent of ScientificNameComparator.compare(), adding to documentation and renaming as compareWithoutAuthor().  Added test cases to cover the multiple paths in the specification and more cases of ambiregnal homonyms.  Unit tests and integration tests all passing.
@Tasilee
Collaborator

Tasilee commented Jul 14, 2023

Changed positive example (and test data that it was derived from) from

dwc:scientificName="Triplex rosarium Perry, 1811"

to

dwc:scientificName="Triplex rosaria Perry, 1811"

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 15, 2023
…aining tests tdwg/bdq#70 VALIDATION_TAXON_UNAMBIGUOUS and tdwg/bdq#123 VALIDATION_CLASSIFICATION_CONSISTENT.  Metadata, including source authority values, updated.  Some cleanup of other comments, and consistency of comments in defaults class.
@Tasilee
Collaborator

Tasilee commented Sep 18, 2023

Splitting bdqffdq:Information Elements into "Information Elements ActedUpon" and "Information Elements Consulted": This one needs checking. My logic was from the Expected Response.

Also changed "Field" to "TestField", "Output Type" to "TestType" and updated "Specification Last Updated"

@Tasilee
Collaborator

Tasilee commented Sep 18, 2023

Changed all Information Elements to "ActedUpon" as per Paul's Java Code

@chicoreus chicoreus added the CORE TG2 CORE tests label Sep 18, 2023
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 19, 2024
…through change to scientificNameID, at point where unit tests are passing, though unit tests may not all be up to date with specification. Added block of code to handle COMPLIANT case 1, adding comments on checked portions of specification.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 19, 2024
… up code to reflect primacy of presence of scientificNameID and use of other terms to disambiguate instead of checking consistency, improving comments, adding some test cases, still needs work on clause (4) disambiguation.
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 24, 2024
chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 25, 2024
…aID 523 lookup including cultivar name from APNI for tdwg/bdq#70.

7 participants