TG2-AMENDMENT_POLYNOMIAL_STANDARDIZED #45
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: |
I have changed the wording of the Notes FROM: "This test is not intended to make alterations of a taxonomic nature. The intent of this test is not to fix errors or inconsistencies in the format of the dwc:scientificNameAuthorship. For the purpose of this amendment, if the genus in the dwc:genus field does not match the genus of the polynomial, the genus of the polynomial takes precedence for standardization." TO: "The purpose of this Amendment is to correct errors in spelling and typography only. It is not intended to make changes of a taxonomic nature or to deal with errors or inconsistencies in the format of the Authorship. For the purpose of this amendment, if the genus in the dwc:genus field does not match the genus of the polynomial, the genus of the polynomial takes precedence for standardization." |
@ArthurChapman This is an improvement in expressing the intent, though a problematic one. Also, "polynomial" is still problematic. There is no dwc:polynomial; dwc:scientificName can contain either a uninomial or a polynomial, depending on the rank of the identification. A polynomial can be built from dwc:genus plus dwc:specificEpithet plus dwc:infraspecificEpithet if dwc:specificEpithet is populated (with danger, as Darwin Core defines dwc:genus as part of the current classification of the scientific name, not the generic part of dwc:scientificName), but the specification is silent about what the notes mean by polynomial. The specification does not appear to need any term other than dwc:scientificName, yet according to the notes some unspecified magic removes the authorship from consideration in that value. The specification is currently silent on authorship, so an implementor would presume that what is to be compared is the entire value found in dwc:scientificName against the best match in the specified source authority. If the authorship is not to be included, then there must be an unambiguous specification of how this is to be done: with a (defined) parser, by removing the value found in dwc:scientificNameAuthorship from the end of the value found in dwc:scientificName, or by using a defined beginning-of-string matching method on the source authority side. As currently phrased, the notes still represent magical thinking about the ability to detect which part of dwc:scientificName is the authorship and which parts are not, across the wide range of names of all ranks, hybrids, and complex authorship strings under each of the codes, including initial capital letters in specific, subspecific, and infraspecific epithets in historical names, authorship strings embedded within name strings for hybrids, trinomials, and quadrinomials, and all sorts of other interesting common cases. |
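A minimal sketch of the second option above (removing the dwc:scientificNameAuthorship value from the end of dwc:scientificName). The function name and the rule that the match must fall on a word boundary are assumptions for illustration, not part of any agreed specification; the example strings use the placeholder genus Aus and are not checked against any authority.

```python
from typing import Optional


def strip_trailing_authorship(scientific_name: str, authorship: str) -> Optional[str]:
    """Hypothetical helper: remove dwc:scientificNameAuthorship from the end of dwc:scientificName.

    Returns the remaining name string, or None when the authorship is empty or
    does not appear verbatim, as whole words, at the end of the name.
    """
    # Collapse runs of whitespace so "Quercus  alba L." still matches "L.".
    name = " ".join(scientific_name.split())
    author = " ".join(authorship.split())
    if not name or not author or not name.endswith(author):
        return None
    prefix = name[: -len(author)]
    # Reject a match that starts mid-word (e.g. "ii" at the end of "smithii").
    if prefix and not prefix.endswith(" "):
        return None
    return prefix.rstrip() or None


print(strip_trailing_authorship("Quercus alba L.", "L."))
print(strip_trailing_authorship("Aus bus Smith subsp. cus Jones", "Jones"))
```

The second call shows the limitation raised above: an authorship string embedded inside a trinomial ("Smith") survives, so this rule alone does not yield a bare polynomial.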
After a fun discussion with @ArthurChapman, I think this boils down to how I responded to @chicoreus via email: POLYNOMIAL entails parsing on our end, but we assume parsing within the bdq:sourceAuthority, as in the case of #57, don't we? My feeling is that we remove #46 and #45 because, as @chicoreus informs us, the parsing is complex. My point is that with #57 we throw whatever is in dwc:scientificName at the bdq:sourceAuthority. |
The original idea for Tests #45 and #46 was to fix minor spelling errors in the names (e.g., smithi versus smithii, litoralis versus littoralis). This is something that CRIA does very well with its tests. There are other tests that involve the Taxon, TaxonID, and Scientific Name (and others). If we included Authorship and rank (var., ssp.) in these tests, then we would basically be duplicating other tests we already have (i.e., those dealing with combinations of TAXONID, TAXON and SCIENTIFICNAME). Given that, and the difficulty that @chicoreus mentions with parsing out the polynomial components from dwc:scientificName, I see little value in continuing with these two tests (#45 and #46). I thus suggest that we simplify the process and change these two tests to SUPPLEMENTARY. |
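The misspellings given as examples here (smithi versus smithii, litoralis versus littoralis) are each a single-character edit. A minimal sketch, assuming Levenshtein edit distance is an acceptable measure; the thread does not fix a metric, and this is not a description of how CRIA's tests work.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


for wrong, right in [("smithi", "smithii"), ("litoralis", "littoralis")]:
    print(wrong, right, edit_distance(wrong, right))
```

Both pairs are at distance 1, so even a small threshold would catch them, though distinct valid epithets can also differ by a single character.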
An alternative to moving this test to supplementary would be to specify an explicit means of handling the authorship in this test, for example:
change the name from AMENDMENT_POLYNOMIAL_STANDARDIZED to AMENDMENT_NAMESTRING_STANDARDIZED;
information elements: dwc:scientificName, dwc:scientificNameAuthorship;
specification: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if either dwc:scientificName or dwc:scientificNameAuthorship is EMPTY; AMENDED if the text string in dwc:scientificName, with the text string in dwc:scientificNameAuthorship removed from its end, does not match a scientific name string in the bdq:sourceAuthority and can be unambiguously corrected to a known scientific name string consisting of the same number of words (here we could specify a maximum string distance for the transformation) according to the bdq:sourceAuthority; otherwise NOT_CHANGED.
A similar test that considers authorship could be included as supplementary. In the notes, state that #70 identifies whether the specified source authority has an unambiguous single record for the taxon, including the higher classification and authorship string; that #101 identifies inconsistencies between the scientific name and the atomic fields; and that #57 is the key amendment for proposing a taxon ID from the textual terms, including authorship. |
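A minimal sketch of the proposed AMENDMENT_NAMESTRING_STANDARDIZED logic, under assumptions that are not in the proposal: the bdq:sourceAuthority is modeled as an in-memory set of name strings without authorship, "string distance" is taken to be Levenshtein edit distance, max_distance defaults to an arbitrary 2, and authorship that cannot be removed from the end of the name yields NOT_CHANGED. All function names are hypothetical, and the sketch returns only the corrected name string, since how it is written back into dwc:scientificName is not specified.

```python
from typing import Iterable, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def strip_trailing_authorship(name: str, authorship: str) -> Optional[str]:
    """Remove authorship from the end of name on a word boundary, else None."""
    name, authorship = " ".join(name.split()), " ".join(authorship.split())
    if not authorship or not name.endswith(authorship):
        return None
    prefix = name[: -len(authorship)]
    if prefix and not prefix.endswith(" "):
        return None
    return prefix.rstrip() or None


def amendment_namestring_standardized(
    scientific_name: str,
    authorship: str,
    source_authority: Optional[Iterable[str]],
    max_distance: int = 2,
) -> Tuple[str, Optional[str]]:
    """Hypothetical sketch of the proposed AMENDMENT_NAMESTRING_STANDARDIZED.

    source_authority stands in for bdq:sourceAuthority as a collection of known
    name strings without authorship; None means the service is not available.
    Returns (response status, corrected name string or None).
    """
    if source_authority is None:
        return "EXTERNAL_PREREQUISITES_NOT_MET", None
    if not scientific_name.strip() or not authorship.strip():
        return "INTERNAL_PREREQUISITES_NOT_MET", None
    name = strip_trailing_authorship(scientific_name, authorship)
    known = set(source_authority)
    # Assumption: if the authorship is not at the end of the name, do nothing.
    if name is None or name in known:
        return "NOT_CHANGED", None
    word_count = len(name.split())
    candidates = [
        known_name for known_name in known
        if len(known_name.split()) == word_count
        and edit_distance(name, known_name) <= max_distance
    ]
    # Amend only when exactly one candidate exists (the "unambiguously" condition).
    if len(candidates) == 1:
        return "AMENDED", candidates[0]
    return "NOT_CHANGED", None


authority = {"Aus smithii", "Aus littoralis"}
print(amendment_namestring_standardized("Aus smithi Jones", "Jones", authority))
```

With this illustrative authority, "Aus smithi Jones" is AMENDED to "Aus smithii". If "Bus smithii" were also known, both candidates would fall within distance 2 and the result would be NOT_CHANGED, which is the "unambiguously" condition at work.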
That might work, @chicoreus, but it still has the problem of rank (i.e., straight trinomials versus trinomials with var., ssp., subsp., forma, f., etc.). |
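The rank problem bites on the "same number of words" condition: "Aus bus var. cus" has four whitespace-separated words while "Aus bus cus" has three. A minimal sketch, assuming rank markers are dropped from a name string (with authorship already removed) before counting words; the marker set is only the list from the comment above, not a complete vocabulary, and the function name is hypothetical.

```python
from typing import List

# Rank markers named in the comment above; an assumption, not a complete vocabulary.
RANK_MARKERS = {"var.", "ssp.", "subsp.", "forma", "f."}


def name_words(name: str) -> List[str]:
    """Split a name string (authorship already removed) into words, dropping rank markers."""
    return [word for word in name.split() if word not in RANK_MARKERS]


print(len("Aus bus var. cus".split()), len(name_words("Aus bus var. cus")))
print(name_words("Aus bus ssp. cus") == name_words("Aus bus subsp. cus"))
```

This only works after authorship removal, since "f." also occurs inside authorship strings (for example "L. f." for Linnaeus filius), which is one more instance of the parsing problem.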
I tend to agree with @ArthurChapman. Once we open the Pandora's box of parsing dwc:scientificName, don't we need specific rules based upon a vocabulary that can assure us of a high probability of success? Flagging a potential issue, as in the VALIDATION #46, is an equal challenge, but a safer test than this AMENDMENT.
We also have the following tests that seem to me to have similar problems (as noted by @chicoreus):
#101: "COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genus, dwc:specificEpithet, dwc:infraspecificEpithet;..."
#46: "COMPLIANT if there are no nomenclatural errors (e.g. typographical errors and misspellings) of a polynomial, as represented in dwc:scientificName according to the bdq:sourceAuthority service; ..."
#70: "COMPLIANT if the combination of values of dwc:Taxon terms (dwc:scientificName, dwc:scientificNameAuthorship, dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:taxonRank) can be unambiguously resolved by the specified source authority service; ..."
#57: "AMENDED if a value for dwc:taxonID is unique and resolvable on the basis of the value of the lowest ranking not EMPTY taxon classification terms dwc:scientificName, dwc:scientificNameAuthorship, dwc:kingdom, dwc:phylum, dwc:class, etc.; ..." (I will change "etc." as it doesn't look good.)
My inclination is to mirror "GENUS_NOTFOUND", "FAMILY_NOTFOUND", "ORDER_NOTFOUND", "CLASS_NOTFOUND", and "KINGDOM_NOTFOUND" with "(VALIDATION)_SCIENTIFICNAME_NOTFOUND" by sending whatever is in dwc:scientificName to the bdq:sourceAuthority, and not to have an equivalent amendment. I understand that a) it depends on the smarts of the bdq:sourceAuthority (which have to increase quickly) and b) we must accept that we may get many false positives. But one of the criteria for accepting a high number of false positives is that the test highlights a significant issue. I'd still get rid of #46 and #45. |
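A minimal sketch of the proposed (VALIDATION)_SCIENTIFICNAME_NOTFOUND, which sends the whole dwc:scientificName string, unparsed, to the source authority. The lookup callable stands in for a bdq:sourceAuthority query and, like the function name, is hypothetical; how a real service matches the string is left entirely to that service, which is the dependence on its "smarts" noted above.

```python
from typing import Callable, Optional


def validation_scientificname_notfound(
    scientific_name: str,
    lookup: Optional[Callable[[str], bool]],
) -> str:
    """Hypothetical sketch: pass dwc:scientificName, unparsed, to a source authority.

    lookup stands in for a bdq:sourceAuthority query that returns True when the
    string is found; None means the service is not available.
    """
    if lookup is None:
        return "EXTERNAL_PREREQUISITES_NOT_MET"
    value = scientific_name.strip()
    if not value:
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # No parsing on our side: any matching smarts live in the source authority.
    return "COMPLIANT" if lookup(value) else "NOT_COMPLIANT"


known_names = {"Quercus alba L."}  # illustrative stand-in for a real authority
print(validation_scientificname_notfound("Quercus alba L.", known_names.__contains__))
print(validation_scientificname_notfound("Quercus albus L.", known_names.__contains__))
```

With an exact-match stand-in like this one, a correct name whose authorship is formatted differently (for example "Quercus alba Linnaeus") is also NOT_COMPLIANT, which is the kind of false positive the comment anticipates.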
I am in accord with the conclusions of Lee's final paragraph.
(Email reply to Lee Belbin's comment of Tue, Jul 14, 2020, 8:59 PM.)
|
In agreement with the quorum from the email responses on July 15, 2020, this amendment was considered too difficult to implement with confidence for the present. |
From the discussion, this is still immature and needs substantive further consideration. Removing from supplementary and tagging as immature. Updated the markdown to reflect current practice, added a source authority in current form. Since this was written, dwc:genericName has come into use, so replacing dwc:genus (the classification term) with dwc:genericName (the atomic generic part of the scientific name). Additional terms (dwc:subgenus, dwc:infragenericEpithet, dwc:cultivarEpithet) might be appropriate to include as information elements acted upon. One point for further consideration is whether this test should operate on just dwc:scientificName, or on that term and all the atomic component terms (dwc:genericName, dwc:specificEpithet, etc.). This test might also consider dwc:scientificNameID as an information element consulted. Substantial thought and testing are needed to bring this test to maturity. |
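To illustrate why dwc:genericName rather than dwc:genus is the right term for the generic part of the name, a minimal sketch that assembles a name string from atomic terms, following the rule stated earlier in the thread that dwc:infraspecificEpithet is only used when dwc:specificEpithet is populated. It assumes a record is a Python dict keyed by Darwin Core term names without the dwc: prefix; the function is hypothetical, and the additional terms mentioned above (dwc:subgenus, dwc:infragenericEpithet, dwc:cultivarEpithet) and rank markers are deliberately left out.

```python
from typing import Dict, Optional


def name_from_atomic_terms(record: Dict[str, str]) -> Optional[str]:
    """Join genericName, specificEpithet and infraspecificEpithet into a name string.

    Uses genericName (the generic part of the name itself), never genus (the
    genus in the current classification), since the two can differ.
    """
    generic = record.get("genericName", "").strip()
    specific = record.get("specificEpithet", "").strip()
    infraspecific = record.get("infraspecificEpithet", "").strip()
    if not generic:
        return None
    if infraspecific and not specific:
        return None  # an infraspecific epithet needs a specific epithet
    return " ".join(part for part in (generic, specific, infraspecific) if part)


# Illustrative record: classified today under genus "Bus", recorded as "Aus cus".
print(name_from_atomic_terms({"genus": "Bus", "genericName": "Aus", "specificEpithet": "cus"}))
```

Building the string from dwc:genus in this record would give "Bus cus", a name that was never recorded, which is the danger noted earlier in the thread.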
@chicoreus - you missed adding "a source authority in current form." |
@ArthurChapman fixed. |
Examples edited to conform with current practice of providing both a pass and fail example. |
Aligned parameters to current template |
Fixed typos/errors in specifications to align with current template |
Standardized reference to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available" in Expected Response. |