TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID #71

Open
iDigBioBot opened this issue Jan 5, 2018 · 56 comments
Labels
Amendment Completeness CORE TG2 CORE tests NAME Parameterized Test requires a parameter Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 VOCABULARY

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

| TestField | Value |
|---|---|
| GUID | f01fb3f9-2f7e-418b-9f51-adf50f202aea |
| Label | AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID |
| Description | Proposes an amendment to the value of dwc:scientificName using the dwc:scientificNameID value from the bdq:sourceAuthority. |
| TestType | Amendment |
| Darwin Core Class | dwc:Taxon |
| Information Elements ActedUpon | dwc:scientificName |
| Information Elements Consulted | dwc:scientificNameID |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is bdq:Empty, or dwc:scientificName is bdq:NotEmpty; FILLED_IN the value of dwc:scientificName if the value of dwc:scientificNameID could be unambiguously interpreted as a value in the bdq:sourceAuthority; otherwise NOT_AMENDED |
| Data Quality Dimension | Completeness |
| Term-Actions | SCIENTIFICNAME_FROM_SCIENTIFICNAMEID |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]} |
| Specification Last Updated | 2024-08-18 |
| Examples | [dwc:scientificNameID="gbif:8102122", dwc:scientificName="": Response.status=FILLED_IN, Response.result=dwc:scientificName="Harpullia pendula F.Muell.", Response.comment="dwc:scientificNameID contains an interpretable value"] |
| | [dwc:scientificNameID="gbif:8a", dwc:scientificName="": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:scientificNameID does not contain an interpretable value"] |
| Source | iDigBio |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush sci_name_qc Library |
| Link to Specification Source Code | https://github.com/FilteredPush/sci_name_qc/blob/v1.1.2/src/main/java/org/filteredpush/qc/sciname/DwCSciNameDQ.java#L1156 |
| Notes | The value of dwc:scientificNameID is unambiguous if dwc:scientificNameID references a single taxon record in the bdq:sourceAuthority. When referencing a GBIF taxon by GBIF's identifier for that taxon, use the pseudo-namespace "gbif:" and the form "gbif:{integer}" as the value for dwc:scientificNameID. Implementors can be aware of the current GBIF API endpoint that can replace the pseudo-namespace gbif: when looking up the dwc:scientificNameID (taxonID in the GBIF document), e.g. `s/gbif:/https:\/\/api.gbif.org\/v1\/species\//` will transform the value taxonID=gbif:8102122 to the resolvable endpoint https://api.gbif.org/v1/species/8102122. The pseudo-namespace "gbif:" is recommended by GBIF to reference GBIF taxon records. Where resolvable persistent identifiers exist for dwc:scientificNameID values, they should be used in full, but implementors will need to support at least the "gbif:" pseudo-namespace. |
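
As a rough illustration of the Expected Response and Notes above, the following Python sketch shows one way the decision logic and the "gbif:" expansion might fit together. It is not the sci_name_qc implementation. The names `gbif_id_to_url`, `amend_scientific_name`, `fake_lookup` and the `lookup` callable are hypothetical, the lookup is assumed to return every name matching the identifier and to raise `ConnectionError` when the authority is unavailable, and treating whitespace-only values as empty is also an assumption.

```python
import re
from typing import Callable, Dict, Optional, Sequence

GBIF_SPECIES_ENDPOINT = "https://api.gbif.org/v1/species/"


def gbif_id_to_url(scientific_name_id: str) -> Optional[str]:
    """Expand 'gbif:{integer}' into a resolvable URL; None for any other form."""
    match = re.fullmatch(r"gbif:(\d+)", scientific_name_id.strip())
    return GBIF_SPECIES_ENDPOINT + match.group(1) if match else None


def _is_blank(value: Optional[str]) -> bool:
    # Assumption: absent or whitespace-only values count as bdq:Empty.
    return value is None or value.strip() == ""


def amend_scientific_name(
    scientific_name_id: Optional[str],
    scientific_name: Optional[str],
    lookup: Callable[[str], Sequence[str]],
) -> Dict[str, object]:
    """Return a response dict with 'status' and 'result' keys (hypothetical shape)."""
    # Internal prerequisites are checked first so no lookup is made needlessly.
    if _is_blank(scientific_name_id) or not _is_blank(scientific_name):
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": {}}
    try:
        names = lookup(scientific_name_id)
    except ConnectionError:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": {}}
    # Unambiguous means the identifier references exactly one record.
    if len(names) == 1:
        return {"status": "FILLED_IN", "result": {"dwc:scientificName": names[0]}}
    return {"status": "NOT_AMENDED", "result": {}}


def fake_lookup(scientific_name_id: str) -> Sequence[str]:
    """Stand-in for a real authority lookup, holding only the spec's example."""
    known = {"gbif:8102122": ["Harpullia pendula F.Muell."]}
    return known.get(scientific_name_id, [])


filled = amend_scientific_name("gbif:8102122", "", fake_lookup)
not_amended = amend_scientific_name("gbif:8a", "", fake_lookup)
```

Checking the internal prerequisites before calling the authority is a design choice of this sketch; the specification does not state which response takes precedence when both prerequisite conditions fail.
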
@iDigBioBot
Collaborator Author

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet:
It would seem that a scientificName consistency test is needed: scientificName is consistent with what's provided in genus, specificEpithet, etc. Added a test at the bottom. Also, I believe the converse tests should be included: genus, specificEpithet, infraspecificEp, sciNameAut completed from sciName. "GENUS_FROM_SCI_NAME" and the like

@iDigBioBot
Collaborator Author

Comment by Paul Morris (@chicoreus) migrated from spreadsheet:
This can't be implemented until dwc:genericEpithet is approved. dwc:genus is NOT the atomic parse of genus from scientific name, it is genus into which the occurrence is classified, for types the two of these can differ.

@iDigBioBot
Collaborator Author

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet:
I don't understand @pjm - a Genus CAN be parsed from a binomial by definition - at least in the Botanical Code. The Zoological Code doesn't include the concept of a 'Specific Epithet' whereas the Botanical Code does (I am not up to date on the Zoological Code, but there was some discussion on adopting the concept from the Botanical Code), but as I understand both codes, "GENUS" can be standalone and does not need a separate GENUS Epithet concept.

@iDigBioBot
Collaborator Author

Comment by Paul Morris (@chicoreus) migrated from spreadsheet:
but we can't implement this until dwc:genericEpithet is approved.

@godfoder godfoder changed the title TG2-AMENDMENT_SCIENTIFICNAME_FROM_COMPONENTS TG2-AMENDMENT_SCIENTIFICNAME_FROM_TAXONID Jan 18, 2018
@ArthurChapman ArthurChapman added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Jan 18, 2018
@tucotuco tucotuco added the Parameterized Test requires a parameter label Nov 5, 2018
@chicoreus
Collaborator

The phrasing "scientificName was added" needs clearer specification: "added" creates ambiguity about intention. It is unclear whether implementors should only fill in an empty scientificName or whether existing values should also be changed.

@Tasilee
Collaborator

Tasilee commented Sep 1, 2019

I've commented on the issues noted in @chicoreus's email of September 1. Does that email raise a new (GitHub) issue? It would be good to document this more consistently.

@ArthurChapman
Collaborator

From @chicoreus : #71 ... AMENDED if dwc:scientificName was EMPTY and a value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED

@ArthurChapman
Collaborator

Suggestion: We usually add the prerequisites in the INTERNAL_PREREQUISITES_NOT_MET clause rather than in the AMENDED part, so I suggest moving the "dwc:scientificName was NOT_EMPTY" condition there. Thus:

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or the dwc:scientificName was NOT_EMPTY; AMENDED if value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED

@Tasilee
Collaborator

Tasilee commented Apr 8, 2020

Thanks @ArthurChapman - I agree that where possible, we include such tests in the INTERNALs. That reads well to me. Editing.

@Tasilee
Collaborator

Tasilee commented Apr 8, 2020

I have changed Expected response to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY; AMENDED dwc:scientificName from a successful lookup of dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED"

As noted elsewhere, we need to decide where to use "the value of dwc:..." as against "dwc:...".

the value of dwc:taxonID is ambiguous
vs
dwc:scientificName was not EMPTY

Also noted another reversion to NOT_EMPTY!

@ArthurChapman
Collaborator

ArthurChapman commented Apr 8, 2020

Note @Tasilee that in the TG2 Vocabulary (#152) we have the term NOTEMPTY (A field that is present and has content.) Do we need to change the term in #152?

@chicoreus
Collaborator

@Tasilee "the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY;" probably is a good example, value of dwc:x is ambiguous, talking explicitly about the value, and dwc:x is empty indicating that the term is empty, one option within that scope being that the value is an empty string.

@ArthurChapman , If we need both EMPTY and NOT_EMPTY, then we should probably define NOT_EMPTY as simply the logical inverse of EMPTY, if we don't need it, then we could reference "not EMPTY" in the specifications.
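
To make the "logical inverse" suggestion concrete, here is a minimal Python sketch. The function names are hypothetical, and the reading of EMPTY as "absent, or containing only whitespace" is an assumption; the normative definition is the one in the TG2 vocabulary (#152).

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    """Assumed reading of EMPTY: the term is absent or holds only whitespace."""
    return value is None or value.strip() == ""


def is_not_empty(value: Optional[str]) -> bool:
    """NOT_EMPTY defined purely as the negation of EMPTY, with no rules of its own."""
    return not is_empty(value)
```

Defining one predicate in terms of the other means the two can never disagree about a value, which is the property the comment above asks for.
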

@Tasilee
Collaborator

Tasilee commented Apr 8, 2020

BTW, we have three tests with labels

TG2-NOTIFICATION_ANNOTATION_NOTEMPTY
TG2-NOTIFICATION_DATAGENERALIZATIONS_NOTEMPTY
TG2-NOTIFICATION_ESTABLISHMENTMEANS_NOTEMPTY

Currently all references in Expected responses are now "not EMPTY" so I would concur with @chicoreus

@ArthurChapman
Collaborator

I think it is in #152 because of the three test names. We can leave it there as that definition applies to those three. But in the tests use not EMPTY.

@chicoreus
Collaborator

@tucotuco draws a parallel with the pseudo-namespace epsg used with geodetic datum.

@chicoreus
Collaborator

Updated examples and notes to reflect recommendation for the use of the pseudo-namespace gbif: for taxonID, no change needed to the specification.

@Tasilee
Collaborator

Tasilee commented Jun 12, 2023

Thanks @chicoreus. All: Please advise me of any implied changes to the test data on related issues.

@Tasilee
Collaborator

Tasilee commented Jun 13, 2023

Restructured Parameter(s) and Source authority

@Tasilee
Collaborator

Tasilee commented Jun 14, 2023

Changed test data positive and negative examples. BTW, all the test examples are generated from the test data.

@ArthurChapman
Collaborator

Updated Notes - changed "VALIDATION_TAXONID_AMBIGUOUS" to "VALIDATION_TAXONID_UNAMBIGUOUS (4c09f127-737b-4686-82a0-7c8e30841590)"

@chicoreus
Collaborator

The notes reference VALIDATION_TAXONID_UNAMBIGUOUS (4c09f127-737b-4686-82a0-7c8e30841590), but 4c09f127-737b-4686-82a0-7c8e30841590 (#70) is VALIDATION_TAXON_UNAMBIGUOUS. There isn't a VALIDATION_TAXONID_UNAMBIGUOUS; the closest is #121 VALIDATION_TAXONID_COMPLETE (a82c7e3a-3a50-4438-906c-6d0fefa9e984), whose notes indicate that we considered its predecessor, VALIDATION_TAXONID_AMBIGUOUS, too complex to implement.

#121 isn't relevant to ambiguity.

#70 only has one relevant clause: "dwc:taxonId references a single taxon record in the bdq:sourceAuthority". I suggest we remove the contradictory cross reference from the notes and replace it with: "The value of dwc:taxonID is unambiguous if dwc:taxonId references a single taxon record in the bdq:sourceAuthority."

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 3, 2023
@ymgan
Collaborator

ymgan commented Jul 3, 2023

Looking at the example

[dwc:taxonID="gbif:8102122", dwc:scientificName="": Response.status=FILLED_IN, Response.result=dwc:scientificName="Harpullia pendula F.Muell.", Response.comment="dwc:taxonID contains an interpretable value"]

Does that mean a term change proposal should be submitted for taxonID? One of the examples is

https://www.gbif.org/species/212

@chicoreus
Collaborator

@ymgan likely, yes. We've got in the notes for this test: "The pseudo-namespace gbif: is recommended by GBIF for use in taxonID to reference GBIF taxon records." See also the comment #71 (comment)

@ArthurChapman
Collaborator

I have updated the notes in line with comment by @chicoreus, above.

@Tasilee
Collaborator

Tasilee commented Jul 4, 2023

Amended Source Authority values to align with @chicoreus syntax

From

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" [https://doi.org/10.15468/39omei] API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]

to

bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]}

@Tasilee Tasilee removed the NEEDS WORK label Jul 4, 2023
@ymgan
Collaborator

ymgan commented Jul 17, 2023

Hello, I am having difficulty trying to look at how OBIS can align its data quality check for taxon specifically with these 2 tests below:

OBIS uses scientificNameID and WoRMS for the scientificName as well as taxon classification instead of taxonID.

| Checks | Fields |
|---|---|
| Taxon should unambiguously match with WoRMS. | scientificName, scientificNameID |

From what I understood, the reason that scientificNameID is used instead of taxonID is because WoRMS lacks stable identifiers for taxon concepts.

scientificNameID is a mandatory field for OBIS. Bob also made a comment here about the lack of usage of taxonID in datasets that he worked with.

I guess my question is: would adding a test for scientificNameID make sense? Or does it make sense to have taxonID/scientificNameID as a parameter of these tests?

Any guidance would be very much appreciated, thanks a lot!!

@chicoreus
Collaborator

@ymgan at the MCZ, and I think within the TG2 working group (coming out of a long history of discussions at NOMINA meetings) we've come to the opposite conclusion. I expect @tucotuco and others will want to comment as well, perhaps to correct my understanding. dwc:taxonID has the definition: "An identifier for the set of dwc:Taxon information. May be a global unique identifier or an identifier specific to the data set." In the layers of name strings, name string bins, nomenclatural acts, taxon concepts, and classifications worked out in various NOMINA meetings (heavily influenced by the thinking of the late Dave Remsen) dwc:taxonID is (I believe by design) vague. It is an identifier for the package of information associated with a Taxon class, without linking a particular meaning (name string, nomenclatural act, taxon concept, taxon concept including classification) to the instance of the Taxon class. The dwc:taxonID serves as the identifier for the set of information in the terms in a dwc:Taxon instance, without applying additional semantics to the dwc:Taxon instance.

On the other hand, the definition for dwc:scientificNameID is "An identifier for the nomenclatural (not taxonomic) details of a scientific name." That is explicitly pointing at an authoritative source of information on nomenclatural acts, that is, nomenclators. There are very few of these. IPNI is one, IndexFungorum another, ZooBank another. They explicitly assign identifiers to nomenclatural acts. WoRMS is not one of these. It would not be appropriate to report an LSID from WoRMS as a scientificNameID. The LSID from WoRMS would appropriately go in the taxonID. See the examples given in the various Taxon ...ID terms in Darwin Core: scientificNameID lists only an IPNI LSID, while others list multiple possibilities, including references to GBIF's backbone taxonomy.

More broadly, in designing tests around CORE uses, we considered one term (or a package of terms) within each of the TIME/SPACE/NAME concept areas to have primacy, for names, dwc:taxonID, for time, dwc:eventDate, for space dwc:decimalLatitude + dwc:decimalLongitude + dwc:geodeticDatum + dwc:coordinateUncertaintyInMeters + dwc:coordinatePrecision, with other terms in each area providing alternative representations (often with the ability to represent only less complete information as in dwc:year, dwc:month, dwc:day) or providing supplemental metadata (as in dwc:georeferenceProtocol). For the Taxon class terms, we deliberately chose dwc:taxonID as the term with primacy, and these two tests reflect this, being a test that can fill in an empty taxonID from other taxon terms (#57) or this one, use the taxonID to fill in other terms.

That having been said, and having just integrated the sci_name_qc implementation of the NAME tests into MCZbase, I suspect we've got some more work to do around "OBIS uses ... WoRMS for the ... taxon classification". In MCZbase, we are currently only using WoRMS or IRMNG LSIDs for taxonID, and only ZooBank identifiers for scientificNameID (though we may start using GBIF gbif:{integer} identifiers to link to GBIF backbone taxonomy records). The assumptions around the NAME tests are that a data source will use a single authority. Within MCZbase data, or when WoRMS data are aggregated with other data, subsets of the data will rely on different authorities for slices of the data, and the test specifications assume that to have quality, data must be conformed to a single authority. For use within OBIS, this test #71 is straightforward: you simply specify WoRMS as the bdq:sourceAuthority instead of GBIF Backbone Taxonomy. Similarly for #81, #22, the VALIDATION_{higherrank}_FOUND tests, just specify WoRMS as the authority to check against. But #70 VALIDATION_TAXON_UNAMBIGUOUS poses more of a challenge. You may be aggregating data that uses WoRMS identifiers in some cases, IPNI identifiers in others, IRMNG identifiers in others, GBIF identifiers in others, similarly for other aggregators. #70 may not adequately address multiple reliable sources of authority within a data set.
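
A small Python sketch may help show what "specify WoRMS as the bdq:sourceAuthority" could look like in practice, and why mixed identifiers are awkward for a single-authority test. Everything here is an assumption made for illustration: the `SourceAuthority` class, the `AUTHORITIES` registry, the parameter dictionary shape, and the use of an identifier prefix to recognise an authority (the WoRMS LSID prefix is not taken from this thread).

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SourceAuthority:
    name: str
    id_prefix: str  # identifier prefix assumed to mark records from this authority


# Hypothetical registry; a real implementation would also carry endpoints.
AUTHORITIES = {
    "GBIF Backbone Taxonomy": SourceAuthority("GBIF Backbone Taxonomy", "gbif:"),
    "WoRMS": SourceAuthority("WoRMS", "urn:lsid:marinespecies.org:taxname:"),
}
DEFAULT_AUTHORITY = "GBIF Backbone Taxonomy"


def select_authority(parameters: Optional[Mapping[str, str]] = None) -> SourceAuthority:
    """Use the bdq:sourceAuthority parameter if given, else the default."""
    name = (parameters or {}).get("bdq:sourceAuthority", DEFAULT_AUTHORITY)
    return AUTHORITIES[name]  # raises KeyError for an unknown authority


def belongs_to(identifier: str, authority: SourceAuthority) -> bool:
    """True if the identifier looks like it was issued by the given authority."""
    return identifier.startswith(authority.id_prefix)


worms = select_authority({"bdq:sourceAuthority": "WoRMS"})
gbif_id_under_worms = belongs_to("gbif:8102122", worms)
```

With a single authority selected per run, an identifier from any other authority simply fails to match, which is the limitation described above for aggregated data sets.
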

@tucotuco
Member

The interpretation @chicoreus shares above aligns perfectly with my understanding.

@ymgan
Collaborator

ymgan commented Jul 20, 2023

Thank you so much @chicoreus and @tucotuco !! I really appreciate the detailed explanation, and thank you for going a step further and looking at other tests as well!! I will see what I can do from this side, thank you!!

@chicoreus chicoreus added the CORE TG2 CORE tests label Sep 18, 2023
@Tasilee Tasilee changed the title TG2-AMENDMENT_SCIENTIFICNAME_FROM_TAXONID TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENIFICNAMEID Dec 13, 2023
@Tasilee Tasilee changed the title TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENIFICNAMEID TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID Dec 13, 2023
@chicoreus
Collaborator

Updated term-actions and date last updated.

chicoreus added a commit to FilteredPush/sci_name_qc that referenced this issue Jul 19, 2024
…rom using taxonID to scientificNameID, incrementing version number to 1.1.0-SNAPSHOT to reflect API change in method names moving from taxonID to scientificNameId in this and other tests.
@Tasilee
Collaborator

Tasilee commented Jul 20, 2024

Term-Actions changed but not "Specification last updated". I've changed it to 2024-07-19. Was this the intent?

@chicoreus
Collaborator

Edit to term-actions was just catching the term-actions up with the label, not affecting the specification or implementations.

Specification last updated was also updated to: 2023-12-13, this is the date of the last substantive change to the specification, I've corrected it back to this.

@Tasilee
Collaborator

Tasilee commented Aug 18, 2024

Changed Expected Response from

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is EMPTY, the value of dwc:scientificNameID is ambiguous, or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:scientificNameID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED

to

EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is EMPTY, or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:scientificNameID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED

6 participants