prefLabel is non-deterministic when multiple rdfs:label are present in the source #164

alexskr · 2022-12-01T21:08:38Z

We have discrepancies in the way prefLabel is generated when multiple rdfs:label entries are present: in the staging environment one label is chosen, and in production a different label is chosen.

For example, for the UPHENO ontology term ID http://purl.obolibrary.org/obo/FBbt_00000002, the preferred name in production is 'Drosophila tagma', but in staging it is 'tagma'.


alexskr · 2022-12-01T22:36:27Z

The term ID `www.semanticweb.org/rbmor/ontologies/2021/1/1/untitled-ontology-133#Agitation` is sometimes named "Agitation" and sometimes "Restlessness".

graybeal · 2022-12-05T04:19:02Z

I think that is the behavior I would expect. There are 3 labels and no prefLabel in the ontology shown here, and AFAIK SPARQL guarantees no particular ordering when the triples are requested, so whichever one comes back first is Preferred. (We have to pick one, and we can't pick more than one.) Since they haven't specified a preferred label, neither can we.

We could choose in this case to always provide the alphabetically first label, the longest label, or the longest label that contains the alphabetically first label (with alphabetical order breaking ties in options 2 and 3). That should make the label consistent and as inclusive as possible. I think option 2 is best.
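The three options can be sketched in Ruby, the language of ontologies_linked_data. This is a minimal illustration, not BioPortal code: the method names and the sample labels are hypothetical, and it assumes the labels are plain Ruby Strings compared with the default byte-wise `String#<=>` (so uppercase letters sort before lowercase ones).

```ruby
# Hypothetical helpers illustrating the three candidate rules.
# All comparisons use Ruby's default byte-wise String ordering.

# Option 1: the alphabetically first label.
def alphabetical_first(labels)
  labels.min
end

# Option 2: the longest label; equal lengths fall back to alphabetical order.
def longest_label(labels)
  labels.min_by { |label| [-label.length, label] }
end

# Option 3: the longest label that contains the alphabetically first label.
def longest_containing_first(labels)
  return nil if labels.empty?

  first = alphabetical_first(labels)
  longest_label(labels.select { |label| label.include?(first) })
end

# Made-up labels chosen so that the three rules disagree.
labels = ["restlessness and fidgeting", "agitation", "psychomotor agitation"]
puts alphabetical_first(labels)       # agitation
puts longest_label(labels)            # restlessness and fidgeting
puts longest_containing_first(labels) # psychomotor agitation
```

All three depend only on which labels exist, not on the order the triple store returns them in, which is what makes any of them consistent between runs.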

Another option in some cases: in the example you provided on Slack (below), there is a label specified by the property obo:IAO_0000589, which is "An alternative name for a class or property which is unique across the OBO Foundry." We could use this property, when present, to choose a label. Although it is not one of the original labels in the ontology, it has the advantage that it is a single label (so it is consistent from one parsing to the next), and in the OBO world the label is (apparently) unique across OBO.

```xml
    <!-- http://purl.obolibrary.org/obo/FBbt_00000002 -->

    <owl:Class rdf:about="http://purl.obolibrary.org/obo/FBbt_00000002">
        <owl:equivalentClass>
            <owl:Class>
                <owl:intersectionOf rdf:parseType="Collection">
                    <rdf:Description rdf:about="http://purl.obolibrary.org/obo/UBERON_6000002"/>
                    <owl:Restriction>
                        <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
                        <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_7227"/>
                    </owl:Restriction>
                </owl:intersectionOf>
            </owl:Class>
        </owl:equivalentClass>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/FBbt_00057001"/>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/UBERON_6000002"/>
        <rdfs:subClassOf>
            <owl:Restriction>
                <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/BFO_0000050"/>
                <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/FBbt_00000001"/>
            </owl:Restriction>
        </rdfs:subClassOf>
        <metadata:prefixIRI rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FBbt:00000002</metadata:prefixIRI>
        <metadata:treeView rdf:resource="http://purl.obolibrary.org/obo/FBbt_00000001"/>
        <obo4:part_of rdf:resource="http://purl.obolibrary.org/obo/FBbt_00000001"/>
        <obo:IAO_0000115>The three main divisions of the whole organism formed from groups of segments.</obo:IAO_0000115>
        <obo:IAO_0000589 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">arthropod tagma (drosophila)</obo:IAO_0000589>
        <oboInOwl:hasDbXref>UBERON:6000002</oboInOwl:hasDbXref>
        <oboInOwl:hasOBONamespace>fly_anatomy.ontology</oboInOwl:hasOBONamespace>
        <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FBbt:00000002</oboInOwl:id>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/fbbt#FB_gloss"/>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/fbbt#cur"/>
        <oboInOwl:inSubset rdf:resource="http://purl.obolibrary.org/obo/fbbt#larval_OF"/>
        <rdfs:label>Drosophila tagma</rdfs:label>
        <rdfs:label>tagma</rdfs:label>
        <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#string">FBbt:00000002</skos:notation>
    </owl:Class>
```
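Preferring obo:IAO_0000589 when it is present could look like the following sketch. It assumes, hypothetically, that a term's annotations have already been collected into a Hash from predicate IRI to an Array of plain String values; the method name `pick_pref_label` and that data shape are illustrative, not part of ontologies_linked_data.

```ruby
SKOS_PREF_LABEL  = "http://www.w3.org/2004/02/skos/core#prefLabel"
OBO_UNIQUE_LABEL = "http://purl.obolibrary.org/obo/IAO_0000589"
RDFS_LABEL       = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical: props maps predicate IRI => Array of String values.
# Prefer an explicit skos:prefLabel, then the OBO Foundry unique label,
# then rdfs:label. Within one predicate, take the alphabetically first
# value so the result does not depend on triple order.
def pick_pref_label(props)
  [SKOS_PREF_LABEL, OBO_UNIQUE_LABEL, RDFS_LABEL].each do |predicate|
    values = props.fetch(predicate, [])
    return values.min unless values.empty?
  end
  nil
end

# Values taken from the FBbt_00000002 snippet above.
fbbt_00000002 = {
  RDFS_LABEL => ["tagma", "Drosophila tagma"],
  OBO_UNIQUE_LABEL => ["arthropod tagma (drosophila)"]
}
puts pick_pref_label(fbbt_00000002) # arthropod tagma (drosophila)
```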

syphax-bouazzouni · 2022-12-05T07:43:53Z

The reason behind this is the way prefLabel is handled when no skos:prefLabel is found.
We use the first rdfs:label found as the prefLabel (see https://github.com/ncbo/ontologies_linked_data/blob/master/lib/ontologies_linked_data/models/ontology_submission.rb#L678)

So yeah, it's random and depends on the order in which the triple store returns the labels.
What's strange in your data is that the order of the returned labels (see the label property below) is not the same as the order in the triple store (see the property "http://www.w3.org/2000/01/rdf-schema#label" below). Maybe the order is random and changes at each request (when there is no cache).
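A toy illustration of why taking the first rdfs:label found is unstable: if two runs of the same query return the same labels in different orders (the arrays below are made up to mimic that), the first element differs, whereas a rule that depends only on the set of labels, such as sorting first, does not.

```ruby
# Two hypothetical result orders for the same rdfs:label values.
run_a = ["Drosophila tagma", "tagma"]
run_b = ["tagma", "Drosophila tagma"]

# "First label found": depends on the order the store returns.
first_a = run_a.first
first_b = run_b.first

# Sorting before choosing: depends only on which labels exist.
sorted_a = run_a.sort.first
sorted_b = run_b.sort.first

puts first_a == first_b   # false
puts sorted_a == sorted_b # true
```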

syphax-bouazzouni · 2022-12-05T07:48:00Z

> We could choose in this case to always provide the alphabetically first label, the longest label, or the longest label that has the alphabetically first label in it

+1

graybeal · 2022-12-05T18:13:13Z

> (maybe the order is random and change at each request (when no cache))

Yes, my understanding is that the order of results returned from SPARQL queries is by definition undefined unless a sort order is explicitly specified in the query. This is not a BioPortal limitation. :-)
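To make the order defined, the query itself has to ask for it. The sketch below holds a SPARQL 1.1 query in a Ruby string; it is a hypothetical query, not the one BioPortal runs, and `<TERM_IRI>` is a placeholder. `ORDER BY DESC(STRLEN(STR(?label))) STR(?label)` with `LIMIT 1` asks for the longest label, breaking ties by the store's string ordering, which is one way to push a deterministic selection rule into the query.

```ruby
# Hypothetical query: <TERM_IRI> is a placeholder substituted below.
# Without ORDER BY, SPARQL leaves the order of solutions undefined.
PREF_LABEL_QUERY = <<~SPARQL
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?label WHERE {
    <TERM_IRI> rdfs:label ?label .
  }
  ORDER BY DESC(STRLEN(STR(?label))) STR(?label)
  LIMIT 1
SPARQL

# Fill in the term IRI (illustrative helper, no escaping is attempted).
def pref_label_query(term_iri)
  PREF_LABEL_QUERY.sub("<TERM_IRI>", "<#{term_iri}>")
end

puts pref_label_query("http://purl.obolibrary.org/obo/FBbt_00000002")
```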

jonquet · 2022-12-06T11:32:48Z

The best scenario to me would be to inform the ontology developer that they are not following good practice by not specifying a preferred label property.
I would not vote for an alphabetical selection, as it would "show" that we have done something to cope with the ontology developer's bad practice.

Personally, I would implement the system so that BioPortal skips every label when no preferred label is specified (if there are multiple labels, of course); the automatic fallback already in place would then pick the end of the URI as the prefLabel (FBbt_00000002 in our example).
It would not take long for the ontology developer to learn how to declare a preferred label in his/her ontology.
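The behavior proposed above (ignore ambiguous labels and use the end of the URI) could be sketched like this. The method name and signature are hypothetical, and taking the segment after the last `#` or `/` is an assumption about how the existing fallback derives a name from the URI.

```ruby
# Hypothetical: explicit_pref is the author's skos:prefLabel (or nil),
# rdfs_labels is an Array of String values.
def strict_pref_label(term_uri, explicit_pref, rdfs_labels)
  return explicit_pref if explicit_pref
  # A single rdfs:label is unambiguous, so it can still be used.
  return rdfs_labels.first if rdfs_labels.length == 1

  # No prefLabel and zero or several labels: use the URI's last segment.
  term_uri.split(%r{[/#]}).last
end

uri = "http://purl.obolibrary.org/obo/FBbt_00000002"
puts strict_pref_label(uri, nil, ["Drosophila tagma", "tagma"]) # FBbt_00000002
```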

I don't think it's good that technology always compensates for a lack of design... BioPortal needs a prefLabel to offer its full service. Let's state this clearly and accept that resources that are not well designed will be handled less well.

alexskr · 2022-12-06T19:41:35Z

I agree that it's not OntoPortal's responsibility to fix problems with ontologies. Ideally, that should be handled by ontology linting/validation tools, if they exist. However, the issue I would like to address is that every time the same ontology submission is processed, OntoPortal displays it differently. This complicates troubleshooting, migration, and development efforts.

graybeal · 2022-12-07T03:33:05Z

I do not consider it a "problem" requiring a fix when a provider has not followed a good practice. We are in no position to bird-dog our ontology providers that closely.

However, the system does require a prefLabel for every term, because we use the prefLabel as, you know, the preferred label. Arguably it's bad form for us to make that publicly visible the way we do; it doesn't really reflect that this is the BP-designated prefLabel and not the ontology author's designation. So that's something we should fix someday (it's apparent on examination of the TTL file, because we use our local prefLabel property rather than the standard one).

So given that we have to make a choice, let's make the best choice we can. I think that means choosing the prefLabel in a way that gives us the same label every time, as long as the author doesn't add a new choice or eliminate our choice. The most useful way to do this is to take the most detailed label offered (that is, the longest, and therefore the most likely to include terms in the search list), and if more than one label has the maximum length, pick the alphabetically first one among those.
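A minimal sketch of this rule, assuming the labels are plain Ruby Strings; the method name and the extra "larval tagma" label are hypothetical. Because the result depends only on which labels exist, every ordering of the same labels yields the same prefLabel, and it changes only if the author adds a label that would beat the current one or removes the chosen one.

```ruby
# Longest label wins; among labels of maximal length, the alphabetically
# first (byte-wise String#<=>) wins. Returns nil when there are no labels.
def longest_then_alphabetical(labels)
  max_len = labels.map(&:length).max
  return nil if max_len.nil?

  labels.select { |label| label.length == max_len }.min
end

labels = ["tagma", "Drosophila tagma", "larval tagma"]
# Every permutation of the input gives the same answer.
results = labels.permutation.map { |order| longest_then_alphabetical(order) }.uniq
puts results.inspect # ["Drosophila tagma"]
```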

(I like the idea of the OBO one, but I am not recommending it because the OBO label may be unattractive in some cases, and the ontology owner may not like it.)

If it were possible, I'd add a tooltip to that PrefLabel title that says "Author-specified prefLabel, or if not specified, longest available label".

mdorf · 2023-03-01T21:30:18Z

I added an Array#sort call for cases with multiple labels:

```ruby
# Sort the rdfs:label values so the choice no longer depends on result order
label = rdfs_labels.sort[0]
```

This isn't a perfect solution, but it adds some determinism to the selection of the prefLabel:
https://github.com/ncbo/ontologies_linked_data/blob/master/lib/ontologies_linked_data/models/ontology_submission.rb#L687
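One detail worth knowing about this fix: assuming `rdfs_labels` holds plain Ruby Strings (if they are RDF literal objects, their comparison may differ), `Array#sort` compares them byte by byte, so uppercase letters sort before lowercase ones. The snippet shows the effect; the case-insensitive variant is only an illustration of an alternative, not what the commit does, and the "agitation"/"Restlessness" pair is made up.

```ruby
labels = ["tagma", "Drosophila tagma"]
# Default sort is byte-wise: "D" (0x44) comes before "t" (0x74).
puts labels.sort[0] # Drosophila tagma

mixed = ["agitation", "Restlessness"]
puts mixed.sort[0]  # Restlessness ("R" sorts before "a")

# Case-insensitive alternative, with the original string as a tie-breaker
# so labels differing only in case still order deterministically.
puts mixed.min_by { |label| [label.downcase, label] } # agitation
```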

matuskalas · 2023-03-01T21:58:34Z

Does this mean it will be sorted alphanumerically by the rdfs:label literal?
Would it then also mean that the language is not taken into account?
Or is there proper handling of languages, with a fallback order something like the following?

1. @en-us
2. @en
3. no xml:lang tag
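The fallback order suggested above could be sketched as follows. This is hypothetical, not the language support later planned for OntoPortal: it assumes each label arrives as a `[value, language_tag]` pair with `nil` for an untagged literal, compares tags case-insensitively (language tags are case-insensitive per BCP 47), sorts within the winning group so the choice stays deterministic, and, as an added assumption, falls back to the alphabetically first label in any language when nothing matches.

```ruby
# Preference order from the comment above; nil means "no xml:lang tag".
LANGUAGE_FALLBACK = ["en-us", "en", nil].freeze

# Hypothetical: labels is an Array of [value, language_tag] pairs.
def pref_label_by_language(labels)
  LANGUAGE_FALLBACK.each do |wanted|
    matches = labels.select { |_value, lang| lang&.downcase == wanted }
    return matches.map(&:first).min unless matches.empty?
  end
  # Assumption: if no label matches, take the alphabetically first of any language.
  labels.map(&:first).min
end

# Made-up labels in three languages.
labels = [["tagma", "en"], ["Drosophila tagma", nil], ["tagme", "fr"]]
puts pref_label_by_language(labels) # tagma
```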

mdorf · 2023-03-01T22:20:03Z

> Does this mean it will be sorted alphanumerically by the rdfs:label literal? Would it then also mean that the language is not taken into account? Or is there a proper handling of languages with the fallback order of something like the following?
> 1. @en-us
> 2. @en
> 3. no xml:lang tag

This is a short-term fix that selects the prefLabel in a deterministic order during the ontology processing stage. More complete support for language-based prefLabels is in the works.

graybeal changed the title ~~perfLabel is non-deterministic when multiple rdfs:label are present in the source~~ prefLabel is non-deterministic when multiple rdfs:label are present in the source Dec 7, 2022
