TG2-AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID #71
Comments
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: |
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: |
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: |
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: |
Phrasing of "scientificName was added", needs clearer specification, "added" |
I've commented on the issues noted in @chicoreus email of September 1. Does that email raise a new (GitHub) issue? It would be good to document this more consistently. |
From @chicoreus : #71 ... AMENDED if dwc:scientificName was EMPTY and a value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED |
Suggestion: We usually add the prerequisites in the INTERNAL_PREREQUISITES_NOT_MET rather than in the AMENDED part, so I suggest moving the "dwc:scientificName was NOT_EMPTY" condition there. Thus: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if the field dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or the dwc:scientificName was NOT_EMPTY; AMENDED if value was added from a lookup of the dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED |
Thanks @ArthurChapman - I agree that where possible, we include such tests in the INTERNALs. That reads well to me. Editing. |
I have changed Expected response to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:taxonID is EMPTY, the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY; AMENDED dwc:scientificName from a successful lookup of dwc:taxonID in the bdq:sourceAuthority; otherwise NOT_CHANGED". As noted elsewhere, we need to decide when to use "the value of dwc:..." as against "dwc:...", for example in "the value of dwc:taxonID is ambiguous". Also noted another reversion to NOT_EMPTY! |
@Tasilee "the value of dwc:taxonID is ambiguous or dwc:scientificName was not EMPTY;" is probably a good example: "value of dwc:x is ambiguous" talks explicitly about the value, while "dwc:x is empty" indicates that the term is empty, one option within that scope being that the value is an empty string. @ArthurChapman, if we need both EMPTY and NOT_EMPTY, then we should probably define NOT_EMPTY as simply the logical inverse of EMPTY; if we don't need it, then we could reference "not EMPTY" in the specifications. |
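A minimal sketch of the "logical inverse" idea, assuming that EMPTY covers both an absent term (modelled here as None) and a value that is an empty or whitespace-only string. The function names is_empty and is_not_empty are hypothetical and are not taken from any implementation discussed in this issue.

```python
from typing import Optional


def is_empty(value: Optional[str]) -> bool:
    """Assumed reading of EMPTY: term absent (None) or only whitespace."""
    return value is None or value.strip() == ""


def is_not_empty(value: Optional[str]) -> bool:
    """NOT_EMPTY (or "not EMPTY") defined purely as the inverse of EMPTY."""
    return not is_empty(value)
```

With this definition there is only one rule to maintain, so "not EMPTY" in a specification and a NOT_EMPTY label cannot drift apart.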
BTW, we have three tests with labels TG2-NOTIFICATION_ANNOTATION_NOTEMPTY. Currently all references in Expected responses are now "not EMPTY", so I would concur with @chicoreus |
I think it is in #152 because of the three test names. We can leave it there as that definition applies to those three. But in the tests use not EMPTY. |
@tucotuco draws a parallel with the pseudo-namespace epsg used with geodetic datum. |
Updated examples and notes to reflect recommendation for the use of the pseudo-namespace gbif: for taxonID, no change needed to the specification. |
Thanks @chicoreus. All: Please advise me of any implied changes to the test data on related issues. |
Restructured Parameter(s) and Source authority |
Changed test data positive and negative examples. BTW, all the test examples are generated from the test data. |
Updated Notes - changed "VALIDATION_TAXONID_AMBIGUOUS" to "VALIDATION_TAXONID_UNAMBIGUOUS (4c09f127-737b-4686-82a0-7c8e30841590)" |
Notes reference VALIDATION_TAXONID_UNAMBIGUOUS (4c09f127-737b-4686-82a0-7c8e30841590) where 4c09f127-737b-4686-82a0-7c8e30841590 #70 is VALIDATION_TAXON_UNAMBIGUOUS, while there isn't a VALIDATION_TAXONID_UNAMBIGUOUS, the closest is #121 VALIDATION_TAXONID_COMPLETE a82c7e3a-3a50-4438-906c-6d0fefa9e984, which has notes indicating we considered its predecessor VALIDATION_TAXONID_AMBIGUOUS too complex to implement. #121 isn't relevant to ambiguity. #70 only has one relevant clause: " dwc:taxonId references a single taxon record in the bdq:sourceAuthority," I suggest we remove the contradictory cross reference from the notes, and replace it with: "The value of dwc:taxonID is unambiguous if dwc:taxonId references a single taxon record in the bdq:sourceAuthority. " |
…dding test case.
Looking at the example
Does that mean a term change proposal should be submitted for taxonID? One of the examples is |
@ymgan likely, yes. We've got in the notes for this test: "The pseudo-namespace gbif: is recommended by GBIF for use in taxonID to reference GBIF taxon records. " See also the comment #71 (comment) |
I have updated the notes in line with comment by @chicoreus, above. |
Amended Source Authority values to align with @chicoreus syntax From bdq:sourceAuthority default = "GBIF Backbone Taxonomy" [https://doi.org/10.15468/39omei] | to bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]} |
Hello, I am having difficulty trying to look at how OBIS can align its data quality check for taxon specifically with these 2 tests below:
From what I understood, the reason that scientificNameID is used instead of taxonID is because WoRMS lacks stable identifiers for taxon concepts. scientificNameID is a mandatory field for OBIS. Bob also made a comment here about the lack of usage of taxonID in datasets that he worked with. I guess my question is, would adding a test for scientificNameID make sense? Or does it make sense to have taxonID/scientificNameID as a parameter of these tests? Any guidance would be very much appreciated, thanks a lot!! |
@ymgan at the MCZ, and I think within the TG2 working group (coming out of a long history of discussions at NOMINA meetings) we've come to the opposite conclusion. I expect @tucotuco and others will want to comment as well, perhaps to correct my understanding. dwc:taxonID has the definition: "An identifier for the set of dwc:Taxon information. May be a global unique identifier or an identifier specific to the data set." In the layers of name strings, name string bins, nomenclatural acts, taxon concepts, and classifications worked out in various NOMINA meetings (heavily influenced by the thinking of the late Dave Remsen) dwc:taxonID is (I believe by design) vague. It is an identifier for the package of information associated with a Taxon class, without linking a particular meaning (name string, nomenclatural act, taxon concept, taxon concept including classification) to the instance of the Taxon class. The dwc:taxonID serves as the identifier for the set of information in the terms in a dwc:Taxon instance, without applying additional semantics to the dwc:Taxon instance. On the other hand the definition for dwc:scientificNameID is "An identifier for the nomenclatural (not taxonomic) details of a scientific name." That is explicitly pointing at an authoritative source of information on nomenclatural acts, nomenclators. There are very few of these. IPNI is one, IndexFungorum another, ZooBank another. They explicitly assign identifiers to nomenclatural acts. WoRMS is not one of these. It would not be appropriate to report an LSID from WoRMS as a scientificNameID. The LSID from WoRMS would appropriately go in the taxonID. See the examples given in the various Taxon ...ID terms in Darwin Core. scientificNameID lists only an IPNI LSID, others list multiple possibilities, including references to GBIF's backbone taxonomy. 
More broadly, in designing tests around CORE uses, we considered one term (or a package of terms) within each of the TIME/SPACE/NAME concept areas to have primacy: for names, dwc:taxonID; for time, dwc:eventDate; for space, dwc:decimalLatitude + dwc:decimalLongitude + dwc:geodeticDatum + dwc:coordinateUncertaintyInMeters + dwc:coordinatePrecision, with other terms in each area providing alternative representations (often with the ability to represent only less complete information as in dwc:year, dwc:month, dwc:day) or providing supplemental metadata (as in dwc:georeferenceProtocol). For the Taxon class terms, we deliberately chose dwc:taxonID as the term with primacy, and these two tests reflect this: one can fill in an empty taxonID from other taxon terms (#57), and this one uses the taxonID to fill in other terms. That having been said, and having just integrated the sci_name_qc implementation of the NAME tests into MCZbase, I suspect we've got some more work to do around "OBIS uses ... WoRMS for the ... taxon classification". In MCZbase, we are currently only using WoRMS or IRMNG LSIDs for taxonID, and only ZooBank identifiers for scientificNameID (though we may start using GBIF gbif:{integer} identifiers to link to GBIF backbone taxonomy records). The assumptions around the NAME tests are that a data source will use a single authority. Within MCZbase data, or when WoRMS data are aggregated with other data, subsets of the data will rely on different authorities for slices of the data, and the test specifications assume that to have quality, data must be conformed to a single authority. For use within OBIS, this test #71 is straightforward: you simply specify WoRMS as the bdq:sourceAuthority instead of GBIF Backbone Taxonomy. Similarly for #81, #22, the VALIDATION_{higherrank}_FOUND tests, just specify WoRMS as the authority to check against. But #70 VALIDATION_TAXON_UNAMBIGUOUS poses more of a challenge. 
You may be aggregating data that uses WoRMS identifiers in some cases, IPNI identifiers in others, IRMNG identifiers in others, GBIF identifiers in others, similarly for other aggregators. #70 may not adequately address multiple reliable sources of authority within a data set. |
The interpretation @chicoreus shares above aligns perfectly with my understanding. |
Thank you so much @chicoreus and @tucotuco !! I really appreciate the detailed explanation and thank you for going a step further looking at other tests as well!! I will see what I can do from this side, thank you!! |
Updated term-actions and date last updated. |
…rom using taxonID to scientificNameID, incrementing version number to 1.1.0-SNAPSHOT to reflect API change in method names moving from taxonID to scientificNameId in this and other tests.
Term-Actions changed but not "Specification last updated". I've changed it to 2024-07-19. Was this the intent? |
Edit to term-actions was just catching the term-actions up with the label, not affecting the specification or implementations. Specification last updated was also updated to: 2023-12-13, this is the date of the last substantive change to the specification, I've corrected it back to this. |
Changed Expected Response from "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is EMPTY, the value of dwc:scientificNameID is ambiguous, or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:scientificNameID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED" to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificNameID is EMPTY, or dwc:scientificName was not EMPTY; FILLED_IN the value of dwc:scientificName if the value of dwc:scientificNameID could be unambiguously interpreted as a value in bdq:sourceAuthority; otherwise NOT_AMENDED" |
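The revised Expected Response can be read as a short decision procedure. The following is only an illustrative sketch under stated assumptions, not the reference implementation: the names Response, AuthorityUnavailable, amend_scientific_name and the lookup callable are hypothetical, EMPTY is assumed to mean absent or whitespace-only, and "unambiguously interpreted" is assumed to mean that the lookup returns exactly one name. The sketch also checks the internal prerequisites before contacting the authority, which is a design choice the specification text does not dictate.

```python
from typing import Callable, List, NamedTuple, Optional


class Response(NamedTuple):
    status: str
    amendment: Optional[dict]  # proposed term values, if any


class AuthorityUnavailable(Exception):
    """Hypothetical signal that the bdq:sourceAuthority cannot be reached."""


def is_empty(value: Optional[str]) -> bool:
    # Assumption: EMPTY means absent or whitespace-only.
    return value is None or value.strip() == ""


def amend_scientific_name(
    scientific_name_id: Optional[str],
    scientific_name: Optional[str],
    lookup: Callable[[str], List[str]],
) -> Response:
    # INTERNAL: nothing to look up, or a name is already present.
    if is_empty(scientific_name_id) or not is_empty(scientific_name):
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None)
    # EXTERNAL: the source authority did not answer.
    try:
        names = lookup(scientific_name_id)
    except AuthorityUnavailable:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None)
    # FILLED_IN only when the identifier maps to exactly one name.
    if len(names) == 1:
        return Response("FILLED_IN", {"dwc:scientificName": names[0]})
    return Response("NOT_AMENDED", None)
```

In a test harness, lookup could be a stub returning a fixed list, so the four response states can be exercised without any network access.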
s/gbif:/https:\/\/api.gbif.org\/v1\/species\//
will transform the value taxonID=gbif:8102122 to the resolvable endpoint https://api.gbif.org/v1/species/8102122 The pseudo-namespace "gbif:" is recommended by GBIF to reference GBIF taxon records. Where resolvable persistent identifiers exist for dwc:scientificNameID values, they should be used in full, but implementors will need to support at least the "gbif:" pseudo-namespace.
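The same substitution as the sed expression can be sketched in Python. The function name resolve_gbif_id is hypothetical, and the pattern is anchored to the start of the string (an assumption the sed expression does not make) so that "gbif:" appearing later in a value is left alone.

```python
import re

# Endpoint prefix taken from the sed expression in the notes.
GBIF_SPECIES_ENDPOINT = "https://api.gbif.org/v1/species/"


def resolve_gbif_id(identifier: str) -> str:
    """Expand a leading "gbif:" pseudo-namespace into the GBIF species endpoint.

    Values without the prefix are returned unchanged.
    """
    return re.sub(r"^gbif:", GBIF_SPECIES_ENDPOINT, identifier)


print(resolve_gbif_id("gbif:8102122"))
```

Identifiers that are already full, resolvable URLs pass through untouched, which matches the recommendation to use persistent identifiers in full where they exist.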