prefLabel is non-deterministic when multiple rdfs:label are present in the source #164
Comments
I think that is the behavior I would expect. There are 3 labels and no prefLabel in the ontology shown here, and AFAIK SPARQL enforces no ordering on the triples it returns, so whichever label comes back first becomes the preferred one. (We have to pick one and we can't pick more than one.) Since the authors haven't specified a preferred label, neither can we. We could choose in this case to always provide (1) the alphabetically first label, (2) the longest label, or (3) the longest label that has the alphabetically first label in it (with alphabetical order breaking ties in options 2 and 3). That should make the label consistent and most inclusive. I think option 2 is best.
Another option in some cases: in the example you provided on Slack (below), there is a label specified by the property obo:IAO_0000589, which is "An alternative name for a class or property which is unique across the OBO Foundry." We could use this property when present to choose a label. Although it is not one of the original labels in the ontology, it has the advantage that it is a single label (so, consistent from one parsing to the next), and in the OBO world it makes the label unique across OBO (apparently).
The reason behind this is the way prefLabel is handled when no skos:prefLabel is found. So yeah, it's random and depends on the order in which the triple store returns the labels.
+1
Yes, my understanding is that the order of results returned from a SPARQL query is by definition undefined unless a sorting mechanism is explicitly specified in the query. Not a BioPortal limitation. :-)
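To illustrate the point about explicit sorting, here is a minimal sketch of a label query that carries its own `ORDER BY` clause. The helper name `ordered_label_query` is hypothetical and this is not the query OntoPortal actually issues; it only shows what "explicitly specified in the query" could look like. Sorting on `STR(?label)` is an assumption made here so that language-tagged and plain literals are compared as plain strings.

```python
def ordered_label_query(class_iri: str) -> str:
    """Build a SPARQL query returning a class's rdfs:label values in a fixed order.

    Hypothetical helper for illustration; not part of any OntoPortal API.
    """
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{class_iri}> rdfs:label ?label .\n"
        "}\n"
        # Without this clause the triple store may return rows in any order.
        "ORDER BY STR(?label)"
    )


# Term IRI taken from the issue description.
print(ordered_label_query("http://purl.obolibrary.org/obo/FBbt_00000002"))
```

With the ordering stated in the query, "whichever label comes back first" no longer depends on the triple store.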
The best scenario to me would be to inform the ontology developer that they are not following good practice by not specifying a preferred label property. Personally, I would implement the system so that BioPortal skips every label when no preferred one is specified (if there are multiple labels, of course); the automatic fallback already in place would then pick up the end of the URI as the pref label. I don't think it's good that the technology always compensates for a lack of design... BioPortal needs a pref label to offer its full service. Let's be upfront about this and accept that resources that are not especially well designed will be handled less well.
I agree that it's not ontoportal's responsibility to fix problems with ontologies. Ideally, it should be handled with other ontology linting/validating tools if they exist; however, the issue I would like to address is that every time the same ontology submission gets processed, ontoportal displays it differently. This complicates troubleshooting, migration, and development efforts.
I do not consider it a "problem" that requires fixing that the provider has not followed a good practice. We are in no position to bird-dog our ontology providers that closely. However, the system does require that we have a prefLabel for every term, because we use the prefLabel as, you know, the preferred label. Arguably it's bad form for us to make that publicly visible in the way we do, since it doesn't really reflect that this is the BP-designated prefLabel and not the ontology author's designation. So that's something we should fix someday (it's apparent on examination of the TTL file, because we use our local prefLabel property, not the standard one). So given that we have to make a choice, let's make the best choice we can, which I think is to choose a label for the prefLabel in a way that gives us the same label every time, as long as the author doesn't add a new choice or eliminate our choice. The way to do this most usefully is to take the most detailed label offered (that's the longest, and therefore the most likely to include terms in the search list), and if there is more than one of maximum length, pick the alphabetically first one among those. (I like the idea of the OBO one but am not recommending it because the OBO one may be unattractive in some cases, and the ontology owner may not like it.) If it were possible, I'd say add a tooltip to that PrefLabel title that says "Author-specified prefLabel, or if not specified, longest available label".
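The rule proposed above (longest label, with alphabetical order breaking ties) can be sketched as follows. This is only an illustration of the proposal, not the code that was eventually merged; the function name `choose_pref_label` is hypothetical, and comparing case-insensitively for the tie-break is an assumption, since the thread does not say how case should be treated.

```python
from typing import Iterable, Optional


def choose_pref_label(labels: Iterable[str]) -> Optional[str]:
    """Pick one label deterministically: longest first, then alphabetical.

    Hypothetical helper; the result does not depend on input order.
    """
    candidates = set(labels)  # duplicates carry no information
    if not candidates:
        return None
    # Negative length puts the longest label first; casefold() gives a
    # case-insensitive alphabetical tie-break (an assumption), and the raw
    # string is a final tie-break so the choice is always unique.
    return min(candidates, key=lambda s: (-len(s), s.casefold(), s))


# The two labels reported in the issue, in either return order.
print(choose_pref_label(["dagma", "Drosophila tagma"]))
print(choose_pref_label(["Drosophila tagma", "dagma"]))
```

Both calls select the same label, which is the property the thread is after: reprocessing the same submission yields the same prefLabel regardless of the order in which the triple store returns the labels.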
Added an array
This isn't a perfect solution, but it adds some determinism in selecting the prefLabel.
Does this mean it will be sorted alphanumerically by the rdfs:label literal?
This is a short-term fix that implements a deterministic order for selecting a prefLabel during the ontology processing stage. More complete support for language-based prefLabel(s) is in the works.
We have discrepancies in the way the prefLabel is generated when multiple rdfs:label entries are present: in the staging environment one label is chosen, and in production a different label is chosen.
For example, in UPHENO ontology term ID
http://purl.obolibrary.org/obo/FBbt_00000002
Preferred name in production is 'Drosophila tagma' but in staging it is 'dagma'