TG2-AMENDMENT_POLYNOMIAL_STANDARDIZED #45

Closed
iDigBioBot opened this issue Jan 5, 2018 · 18 comments
Labels
Amendment Conformance Immature/Incomplete A test where substantial work is needed to develop the specification to the point where the test ca NAME Parameterized Test requires a parameter Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 VOCABULARY

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

| TestField | Value |
|---|---|
| GUID | 8ab38bee-323c-4926-a7e9-c0417cd3b14d |
| Label | AMENDMENT_POLYNOMIAL_STANDARDIZED |
| Description | Amend the scientific name to correct typographical errors and misspellings according to a specified source authority. |
| TestType | Amendment |
| Darwin Core Class | Taxon |
| Information Elements ActedUpon | dwc:scientificName, dwc:genericName, dwc:specificEpithet, dwc:infraspecificEpithet, dwc:scientificNameAuthorship, dwc:yearOfPublication |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:scientificName is bdq:Empty; AMENDED (dwc:scientificName, genus, specificEpithet, infraspecificEpithet, scientificNameAuthorship, yearOfPublication) if typographical errors and misspellings represented in dwc:scientificName have been unambiguously interpreted in the bdq:sourceAuthority; otherwise NOT_CHANGED |
| Data Quality Dimension | Conformance |
| Term-Actions | POLYNOMIAL_STANDARDIZED |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "GBIF Backbone Taxonomy" {[https://doi.org/10.15468/39omei]} {API endpoint [https://api.gbif.org/v1/species?datasetKey=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&name=]} |
| Specification Last Updated | 2024-04-16 |
| Examples | [dwc:scientificName="Acacia longifloia": Response.status=AMENDED, Response.result=dwc:scientificName="Acacia longifolia", Response.comment="dwc:scientificName contains an interpretable value in the bdq:sourceAuthority"] [dwc:scientificName="Acacia camptophylla": Response.status=NOT_AMENDED, Response.result="", Response.comment="dwc:scientificName does not contain an interpretable value as there are a number of options in the bdq:sourceAuthority"] |
| Source | Tania Laity |
| References | |
| Example Implementations (Mechanisms) | |
| Link to Specification Source Code | |
| Notes | [bdq:sourceAuthority default = GBIF Backbone Taxonomy]. (Currently found at: https://www.gbif.org/en/developer/species). The purpose of this Amendment is to correct errors in spelling and typography only. It is not intended to make changes of a taxonomic nature or to deal with errors or inconsistencies in the format of the Authorship. |
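
The order of the checks in the Expected Response can be sketched roughly as below. This is only an illustration under stated assumptions, not an agreed implementation: `amend_polynomial` and the `lookup` callable are hypothetical names, and how a source authority would actually produce candidate corrections is exactly what the specification leaves open.

```python
from typing import Callable, Optional, Sequence


def amend_polynomial(
    scientific_name: Optional[str],
    lookup: Optional[Callable[[str], Sequence[str]]],
) -> dict:
    """Walk through the Expected Response clauses in the order they are written.

    `lookup` is a hypothetical stand-in for the bdq:sourceAuthority: it takes a
    name string and returns candidate corrected name strings.
    """
    # Clause 1: the source authority is not available at all.
    if lookup is None:
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": {}}
    # Clause 2: dwc:scientificName is empty.
    if scientific_name is None or scientific_name.strip() == "":
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET", "result": {}}
    try:
        candidates = sorted(set(lookup(scientific_name)))
    except ConnectionError:
        # Assumption: a failed call also counts as "not available".
        return {"status": "EXTERNAL_PREREQUISITES_NOT_MET", "result": {}}
    # Clause 3: exactly one candidate that differs from the input is unambiguous.
    if len(candidates) == 1 and candidates[0] != scientific_name:
        return {"status": "AMENDED", "result": {"dwc:scientificName": candidates[0]}}
    # Clause 4: no candidates, several candidates, or nothing to change.
    return {"status": "NOT_CHANGED", "result": {}}


# Toy source authority for illustration only; the candidate lists are made up.
toy_authority = {
    "Acacia longifloia": ["Acacia longifolia"],
    "Acacia camptophylla": ["candidate one", "candidate two"],
}
response = amend_polynomial("Acacia longifloia", lambda n: toy_authority.get(n, []))
```

The sketch only amends dwc:scientificName; propagating a correction to the atomic terms listed under Information Elements ActedUpon would need further specification.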
@iDigBioBot
Collaborator Author

Comment by Paul Morris (@chicoreus) migrated from spreadsheet:
The ability to assert a correction to a scientific name string is almost always restricted to proposed corrections to the authorship portion of the string. It is much more effective to supply a link to a taxonID found in a nomenclator or taxonomic authority when an unambiguous match can be found than to attempt to alter the string value found in scientificName. An amendment affecting dwc:scientificNameAuthorship, on the other hand, is highly valuable, as authorship strings tend to be highly variable in construction.

@godfoder godfoder changed the title TG2-AMENDMENT_SCIENTIFICNAME_STANDARDIZED TG2-AMENDMENT_BINOMIAL_STANDARDIZED Jan 18, 2018
@godfoder godfoder changed the title TG2-AMENDMENT_BINOMIAL_STANDARDIZED TG2-AMENDMENT_POLYNOMIAL_STANDARDIZED Jan 18, 2018
@ArthurChapman ArthurChapman added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Jan 18, 2018
@tucotuco tucotuco added the Parameterized Test requires a parameter label Nov 5, 2018
@chicoreus
Collaborator

See also #46, which seems to be paired with this test and to have the same issues (should it be AMENDMENT_SCIENTIFICNAME_STANDARDIZED?). See also #101, which does seem to be a legitimate "polynomial" test.

@ArthurChapman
Collaborator

I have changed the wording of the Notes

FROM: This test is not intended to make alterations of a taxonomic nature. The intent of this test is not to fix errors or inconsistencies in the format of the dwc:scientificNameAuthorship. For the purpose of this amendment, if the genus in the dwc:genus field does not match the genus of the polynomial, the genus of the polynomial takes precedence for standardization.

TO: The purpose of this Amendment is to correct errors in spelling and typography only. It is not intended to make changes of a taxonomic nature or to deal with errors or inconsistencies in the format of the Authorship. For the purpose of this amendment, if the genus in the dwc:genus field does not match the genus of the polynomial, the genus of the polynomial takes precedence for standardization.

@chicoreus
Collaborator

@ArthurChapman improvement in expressing an intent, though a problematic one. Also, "Polynomial" is still problematic. There is no dwc:polynomial. dwc:scientificName can contain either a uninomial or a polynomial, depending on the rank of the identification. A polynomial (with danger, as Darwin Core defines genus as the current classification of the scientific name, not the generic part of the dwc:scientificName) can be built from dwc:genus plus dwc:specificEpithet plus dwc:infraspecificEpithet if dwc:specificEpithet is populated, but the specification is mute about what is meant by polynomial in the notes, and the specification does not appear to include a need for terms other than dwc:scientificName, with, according to the notes, some unspecified magic removing the authorship from consideration in that value....

The specification is currently mute on authorship, so an implementor's presumption would be that what is to be compared is the entire value found in the dwc:scientificName as compared with the best match in the specified source authority. If there is a desire to not include authorship, then there must be an unambiguous specification as to how this is to be done (either with a (defined) parser, or removing the value found in dwc:scientificNameAuthorship from the end of the value found in dwc:scientificName, or by using a defined beginning-of-string-only matching method on the source authority side).

As currently phrased, the notes still represent magical thinking about the ability to detect which part of dwc:scientificName is the authorship and which parts are not for the wide range of names of all ranks, hybrids, and complex authorship strings under each of the codes, including the presence of initial capital letters in specific, subspecific, and infraspecific epithets in historical names, authorship strings embedded within name strings for hybrids and trinomials and quadrinomials, and all sorts of interesting common cases.
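
To make the "built from the atomic terms" option concrete, here is a minimal sketch. It assumes the generic part of the name is supplied (the caveat about dwc:genus being a classification term applies), and `build_polynomial` is a hypothetical helper, not part of any specification.

```python
from typing import Optional


def build_polynomial(
    generic_name: Optional[str],
    specific_epithet: Optional[str],
    infraspecific_epithet: Optional[str] = None,
) -> Optional[str]:
    """Join the atomic name parts into a polynomial without authorship.

    Returns None when the generic part or the specific epithet is empty,
    because no polynomial can be built in that case.
    """
    parts = [generic_name, specific_epithet, infraspecific_epithet]
    cleaned = [p.strip() if p else "" for p in parts]
    if not cleaned[0] or not cleaned[1]:
        return None
    # Rank markers (var., subsp., ...) are deliberately not inserted here.
    return " ".join(p for p in cleaned if p)


binomial = build_polynomial("Acacia", "longifolia")
uninomial_only = build_polynomial("Acacia", "")  # None: no specific epithet
```

Going the other way, from dwc:scientificName to its parts, is the hard direction described above and is not attempted here.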

@Tasilee
Collaborator

Tasilee commented Jul 14, 2020

After a fun discussion with @ArthurChapman, I think this boils down to how I responded to @chicoreus via email: POLYNOMIAL entails parsing on our end, but we assume parsing within the bdq:sourceAuthority as in the case of #57, don't we? My feeling is we remove #46 and #45 because @chicoreus informs us it is complex?

My point is we throw whatever is in dwc:scientificName at bdq:sourceAuthority with #57.
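
As a rough illustration of sending the raw value to the source authority, the request URL for the default API endpoint listed under Source Authority could be assembled as below. The endpoint and dataset key are taken from the test table; `build_lookup_url` is a hypothetical helper, and no assumption is made here about the shape of the response.

```python
from urllib.parse import urlencode

# Endpoint and dataset key as given for the default bdq:sourceAuthority.
GBIF_SPECIES_ENDPOINT = "https://api.gbif.org/v1/species"
GBIF_BACKBONE_DATASET_KEY = "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c"


def build_lookup_url(scientific_name: str) -> str:
    """Build the query URL for whatever string is in dwc:scientificName."""
    query = urlencode(
        {"datasetKey": GBIF_BACKBONE_DATASET_KEY, "name": scientific_name.strip()}
    )
    return f"{GBIF_SPECIES_ENDPOINT}?{query}"


url = build_lookup_url("Acacia longifloia")
```

No parsing happens on the caller's side in this sketch; whether the value matches is left entirely to the source authority.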

@ArthurChapman
Collaborator

The original idea for Tests #45 and #46 was to fix minor spelling errors in the names (i.e. smithi versus smithii, litoralis versus littoralis etc.). This is something that CRIA does very well with its tests. There were other tests that involve the Taxon, TaxonID, and Scientific Name (+others). If we included Authorship and rank (var., ssp.) in these tests, then we are basically making these tests a duplication of other tests we already have (i.e. those dealing with combinations of TAXONID, TAXON and SCIENTIFICNAME). Given that, and the difficulty that @chicoreus mentions with parsing out the polynomial components from dwc:scientificName, etc., I see little value in continuing with these two tests (#45 and #46). I thus suggest that we simplify the process and change these two tests to SUPPLEMENTARY.
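
For a sense of how small these spelling differences are, a plain Levenshtein edit distance can be computed as in the sketch below. This is a generic textbook measure chosen for illustration; it is not a description of how CRIA or any source authority actually detects misspellings.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


d_smithi = levenshtein("smithi", "smithii")          # 1: one inserted letter
d_litoralis = levenshtein("litoralis", "littoralis")  # 1: one inserted letter
d_longifloia = levenshtein("longifloia", "longifolia")  # 2: swapped letters
```

A distance of 1 or 2 is easy to compute, but on its own it says nothing about whether the nearby name is the intended one, which is where the ambiguity discussed in this thread comes in.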

@chicoreus
Collaborator

An alternative to moving this test to supplementary would be to specify an explicit means of handling the authorship in this test, for example:

change name from amendment polynomial standardized to amendment namestring standardized.

information elements: dwc:scientificName, dwc:scientificNameAuthorship

specification: EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority service was not available; INTERNAL_PREREQUISITES_NOT_MET if either dwc:scientificName or dwc:scientificNameAuthorship is EMPTY; AMENDED if the text string represented in dwc:scientificName, with the text string present in dwc:scientificNameAuthorship removed from the end of it, is not a match for a scientific name string in the bdq:sourceAuthority and it can be unambiguously corrected to the name string of a known scientific name string consisting of the same number of words (here we could specify a maximum string distance for transformation) according to the bdq:sourceAuthority; otherwise NOT_CHANGED

A similar test with consideration of authorship could be included as supplemental.

In the notes, note that #70 identifies whether the specified source authority has an unambiguous single record for the taxon, including the higher classification and authorship string, that #101 identifies inconsistencies between the scientific name and the atomic fields, and that #57 is the key amendment to propose a taxon id given the textual terms, including authorship.
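
The alternative specification proposed above could be sketched along the following lines. Several things here are assumptions made only for illustration: the source authority is reduced to a plain list of known name strings, the "maximum string distance" is replaced by a similarity threshold from Python's `difflib.SequenceMatcher`, and `amend_namestring` and `min_similarity` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def amend_namestring(
    scientific_name: Optional[str],
    authorship: Optional[str],
    known_names: Sequence[str],
    min_similarity: float = 0.9,
) -> dict:
    """Sketch of the proposed 'namestring standardized' logic."""
    name = " ".join((scientific_name or "").split())
    author = " ".join((authorship or "").split())
    if not name or not author:
        return {"status": "INTERNAL_PREREQUISITES_NOT_MET"}
    # Remove the authorship only when it sits at the end of the name string.
    if name.endswith(author):
        name = name[: -len(author)].rstrip()
    if name in known_names:
        return {"status": "NOT_CHANGED"}  # already a known name string
    word_count = len(name.split())
    candidates = [
        known
        for known in known_names
        if len(known.split()) == word_count
        and SequenceMatcher(None, name, known).ratio() >= min_similarity
    ]
    # Only a single close candidate counts as an unambiguous correction.
    if len(candidates) == 1:
        return {"status": "AMENDED", "corrected_name_string": candidates[0]}
    return {"status": "NOT_CHANGED"}


# Hypothetical list standing in for the source authority's name strings.
names = ["Acacia longifolia", "Acacia longissima"]
result = amend_namestring("Acacia longifloia (Andrews) Willd.", "(Andrews) Willd.", names)
```

The sketch leaves open whether the authorship should be re-appended to the corrected name string, and it does nothing about authorship embedded in the middle of the value.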

@ArthurChapman
Collaborator

That might work @chicoreus - it still has the problem of rank (i.e. straight trinomial, trinomial with var., ssp., subsp., forma, f., etc.)
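
One way to picture the rank problem: a "same number of words" rule would treat a straight trinomial and the same trinomial written with a rank marker as different unless the markers are dropped first. The sketch below uses only the markers listed above; `RANK_MARKERS` and `epithet_words` are hypothetical names, and "Aus bus cus" is a placeholder name.

```python
from typing import List

# Only the markers mentioned in this comment; a real list would be longer.
RANK_MARKERS = {"var.", "ssp.", "subsp.", "forma", "f."}


def epithet_words(name_string: str) -> List[str]:
    """Split a name string (authorship already removed) into words,
    dropping infraspecific rank markers."""
    return [w for w in name_string.split() if w.lower() not in RANK_MARKERS]


with_marker = epithet_words("Aus bus var. cus")
without_marker = epithet_words("Aus bus cus")
```

This is only safe after authorship has been removed, since "f." is also used as an abbreviation for filius in authorship strings.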

@Tasilee
Collaborator

Tasilee commented Jul 14, 2020

I tend to agree with @ArthurChapman. Once we open the Pandora's Box of parsing dwc:scientificName, don't we need specific rules based upon a vocabulary that can assure us of a high probability of success? Flagging a potential issue as in the VALIDATION #46 is an equal challenge, but a safer test than this AMENDMENT.

We also have the following tests that seem to me to have similar problems (as noted by @chicoreus):

#101: "COMPLIANT if the polynomial, as represented in dwc:scientificName, is consistent with the atomic parts dwc:genus, dwc:specificEpithet, dwc:infraspecificEpithet;..."

#46: "COMPLIANT if there are no nomenclatural errors (e.g. typographical errors and misspellings) of a polynomial, as represented in dwc:scientificName according to the bdq:sourceAuthority service; ..."

#70: " COMPLIANT if the combination of values of dwc:Taxon terms (dwc:scientificName, dwc:scientificNameAuthorship, dwc:subgenus, dwc:genus, dwc:family, dwc:order, dwc:class, dwc:phylum, dwc:kingdom, dwc:taxonRank) can be unambiguously resolved by the specified source authority service; ..."

#57: "AMENDED if a value for dwc:taxonID is unique and resolvable on the basis of the value of the lowest ranking not EMPTY taxon classification terms dwc:scientificName, dwc:scientificNameAuthorship, dwc:kingdom, dwc:phylum, dwc:class, etc.; ..." (and I will change "etc" as this doesn't look good).

My inclination is to mirror the "GENUS_NOTFOUND", "FAMILY_NOTFOUND", "ORDER_NOTFOUND", "CLASS_NOTFOUND", "KINGDOM_NOTFOUND" tests with "(VALIDATION)_SCIENTIFICNAME_NOTFOUND" by sending whatever is in dwc:scientificName to the bdq:sourceAuthority, and not to have an equivalent amendment. I understand that a) it depends on the smarts of the bdq:sourceAuthority (which has to increase quickly) and b) we accept that we may get many false positives. But one of the criteria for accepting a high number of false positives is that it highlights a significant issue. I'd still get rid of #46 and #45.

@tucotuco
Member

tucotuco commented Jul 15, 2020 via email

@Tasilee
Collaborator

Tasilee commented Aug 10, 2020

In agreement with the quorum from the email responses on July 15, 2020, this amendment was considered too difficult to implement with confidence, for the present.

@Tasilee Tasilee closed this as completed Aug 10, 2020
@ArthurChapman ArthurChapman added Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. and removed Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT labels Sep 18, 2023
@chicoreus chicoreus added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Sep 18, 2023
@chicoreus chicoreus added Immature/Incomplete A test where substantial work is needed to develop the specification to the point where the test ca and removed Supplementary Tests supplementary to the core test suite. These are tests that the team regarded as not CORE. labels Feb 16, 2024
@chicoreus
Collaborator

From the discussion, this is still immature and needs substantive further consideration. Removing from supplementary and tagging as immature.

Updated the markdown to reflect current practice, added a source authority in current form.

Since this was written, dwc:genericName has come into use, so replacing dwc:genus (the classification term) with dwc:genericName (the atomic generic part of the scientific name).

Additional terms (dwc:subgenus, dwc:infragenericEpithet, dwc:cultivarEpithet) might be appropriate to include as information elements acted upon.

One point for further consideration is if this test should operate on just dwc:scientificName, or if it should operate on that term and all the atomic component terms (dwc:genericName, dwc:specificEpithet, etc). This test might also consider dwc:scientificNameID as an information element consulted. Substantial thought and testing needed to bring this test to maturity.

@ArthurChapman
Collaborator

@chicoreus - you missed adding "a source authority in current form."

@chicoreus
Collaborator

@ArthurChapman fixed.

@ArthurChapman
Collaborator

Examples edited to conform with current practice of providing both a pass and fail example.

@Tasilee
Collaborator

Tasilee commented Feb 21, 2024

Aligned parameters to current template

@Tasilee
Collaborator

Tasilee commented Feb 22, 2024

Fixed typos/errors in specifications to align with current template

@Tasilee
Collaborator

Tasilee commented Apr 16, 2024

Standardized reference to "EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available" in Expected Response.

5 participants