Extend basisOfRecord vocabulary #84
Comments
Thanks @adam-collins. I've asked @tucotuco to comment on this from a Darwin Core perspective. Some of the proposals in here mix up the original intention for basisOfRecord. There is a related thread here proposing to bring in richer dataset categories, which would be carried over to occurrences to aid in filtering. That approach was proposed to avoid breaking BOR for others (e.g. those who rely on it to infer the class definitions). My own feeling is that this is the better way to accommodate the intention I assume is behind this request.
|
For NomenclaturalChecklist and RegionalChecklist, GBIF uses the DatasetSubtype vocabulary. We do not apply BoR to Taxon records, even though DwC places BoR at the record level and thus allows doing so. |
From the DwC perspective, BoR was originally meant to designate which of the Darwin Core classes was the primary perspective upon which a view was based - the one in a one-to-many relationship between csv-encoded tables. The primary view of interest was with Occurrence as the core because specimens were the first record type shared with the proto-Darwin Core. Nevertheless it was anticipated that the view could be "inverted" (e.g., with Event-centered Occurrences with Event as a Core) or partial (e.g., a gazetteer with Location as the Core, or a nomenclatural checklist with Taxon as the Core). In order to distinguish Occurrences where vouchers existed from those where they didn't, subtypes of Occurrence (PreservedSpecimen, FossilSpecimen, and LivingSpecimen) were created. At the same time it seemed useful to also have subtypes for the evidence that remained from observation-based Occurrences. These already existed and were borrowed from the Dublin Core type vocabulary (StillImage, MovingImage, Sound) as concrete subtypes of an abstract Observation class, all in a Darwin Core type vocabulary with namespace dwctype: as a formal controlled vocabulary for basisOfRecord. All of these were part of a formal vocabulary similar to the DCMI vocabulary for dc:type. One of the problems with the type vocabulary was that we were incorrectly mixing the type vocabularies for dctype: and dwctype:. A second problem was the recognition from the outset that there would be community pressure to diversify the basisOfRecord values for ever more specific categories. We remain with yet a third problem, which is the tendency to use the basisOfRecord as the "Evidence" for Occurrences, when most of the time the evidence falls into many categories. To avoid the first of the issues above and pave the way for a solution to the second problem, the Darwin Core type vocabulary was deprecated and classes were created in the dwc: namespace for those that didn't already exist.
dc:type and dwc:basisOfRecord were both included in the Darwin Core list of terms, where dc:type was properly controlled by dctype: classes and basisOfRecord was like other terms in Darwin Core insofar as it now had the recommendation to use a controlled vocabulary, specifically consisting of the Darwin Core classes, also reflected in the Examples given. So, now we have dc:type to contain the Dublin Core type values (StillImage, MovingImage, Sound, PhysicalObject, Event, and Text) and dwc:basisOfRecord to contain the Darwin Core type values (PreservedSpecimen, FossilSpecimen, LivingSpecimen, MaterialSample, Event, HumanObservation, MachineObservation, Taxon, and Occurrence). No one has yet done anything where the basisOfRecord is any of the other existing Darwin Core classes, though a new class MaterialCitation is on the verge of being born. Because basisOfRecord is not actually controlled by a controlled vocabulary as it once was (it is only a recommendation), the door is open for using other values. However, I would urge against doing so without good reason, and without creating new terms for the same thing, and without ultimately adding them to the standard. Here is how I would map the terms across all four categories. I hope this helps somehow.
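As a rough illustration of the split described above, the following sketch checks a record's dc:type and dwc:basisOfRecord against the two value lists given in this comment. The helper name `check_record`, the dict-based record layout, and the example value `EnvironmentalDNA` are assumptions made for illustration only; they are not part of Darwin Core or of any GBIF API. Because basisOfRecord is only a recommendation, the check reports warnings rather than rejecting the record.

```python
# Value lists copied from the comment above; not an authoritative copy of either standard.
DC_TYPE_VALUES = frozenset({
    "StillImage", "MovingImage", "Sound", "PhysicalObject", "Event", "Text",
})
BASIS_OF_RECORD_VALUES = frozenset({
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "MaterialSample",
    "Event", "HumanObservation", "MachineObservation", "Taxon", "Occurrence",
})


def check_record(record):
    """Return a list of warnings for a record given as a dict of term -> value.

    Hypothetical helper: the keys "type" and "basisOfRecord" are assumed.
    """
    warnings = []
    dc_type = record.get("type")
    if dc_type is not None and dc_type not in DC_TYPE_VALUES:
        warnings.append(f"dc:type {dc_type!r} is not a listed Dublin Core type value")
    bor = record.get("basisOfRecord")
    if bor is None:
        warnings.append("dwc:basisOfRecord is missing")
    elif bor not in BASIS_OF_RECORD_VALUES:
        warnings.append(f"dwc:basisOfRecord {bor!r} is outside the recommended values")
    return warnings


# A record that follows both recommendations yields no warnings.
print(check_record({"type": "StillImage", "basisOfRecord": "HumanObservation"}))
# "EnvironmentalDNA" is a made-up value used to show the warning path.
print(check_record({"type": "Sound", "basisOfRecord": "EnvironmentalDNA"}))
```

Returning warnings instead of raising reflects the point that other values are permitted, just discouraged without good reason.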
[1] As @mdoering said, the Taxon Core definition does not include a basisOfRecord term; the type is designated in the dataset metadata. |
Thank you for clarifying the expectations based on DwC, @tucotuco. Where does this leave you, @adam-collins, please? I assume you have these BORs to allow users to filter in/out data from e.g. eDNA studies, which would not be accommodated by strictly following DwC; that is also why I started the dataset categorization thread. Is the ALA already committed to the BOR values you have, please (e.g. for backwards-compatibility reasons)? |
At this stage we are evaluating changing some of our datasets (eDNA) to be accommodated within the existing fields and vocabularies, pretty much the table that tucotuco suggests above, subject to the ability to search/filter. @peggynewman is working out a possible mapping strategy for it. Our preference is not to add backwards compatibility, as that would deviate from the GBIF pipelines implementation already provided. |
Chiming in with a couple of points specifically on eDNA
Mapping to materialSample still creates issues for ALA - user perspective is:
I think that occurrence data derived from eDNA sampling may massively increase in the next few years. I also know that there are plenty of other discussions going on about BasisOfRecord. I would just like to advocate for users here and emphasise that any of the processing that's so far been proposed will make it more difficult for users to find the data they want, not easier. |
If |
Thanks, @elywallis - we'll tackle the first request and you can watch progress here. I also feel we need to focus effort on those looking to extract subsets of data from indexes like GBIF and ALA. My worry is that today you need to understand the idiosyncrasies across several terms, while I suspect most just want a set of checkboxes to filter in/out data at broad categories (e.g. has preserved evidence and originates from an eDNA study). We also need to provide general metrics at that level. Could you please look at the dataset categories here to see if this aligns with the use cases you see, noting that these would be multivalue options? Dataset category is one way we might approach this, but we could also look to other options. |
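The checkbox-style filtering described above could look roughly like the sketch below, assuming each occurrence carries a multivalued set of dataset categories inherited from its dataset. The category labels (`eDNA`, `preservedEvidence`), the field name `datasetCategories`, and the function `filter_occurrences` are all hypothetical; none of them come from GBIF, ALA, or Darwin Core.

```python
def filter_occurrences(occurrences, include=(), exclude=()):
    """Keep records that have every category in `include` and none in `exclude`."""
    include, exclude = set(include), set(exclude)
    kept = []
    for occ in occurrences:
        # Categories are multivalued, so treat them as a set per record.
        categories = set(occ.get("datasetCategories", ()))
        if include <= categories and not (exclude & categories):
            kept.append(occ)
    return kept


# Illustrative records only; the identifiers and categories are made up.
occurrences = [
    {"id": "a", "basisOfRecord": "MaterialSample",
     "datasetCategories": ["eDNA", "preservedEvidence"]},
    {"id": "b", "basisOfRecord": "MaterialSample",
     "datasetCategories": ["preservedEvidence"]},
    {"id": "c", "basisOfRecord": "HumanObservation",
     "datasetCategories": []},
]

edna_only = filter_occurrences(occurrences, include=["eDNA"])
without_edna = filter_occurrences(occurrences, exclude=["eDNA"])
print([occ["id"] for occ in edna_only])
print([occ["id"] for occ in without_edna])
```

Records "a" and "b" share the same basisOfRecord here, so the category set is what separates them; basisOfRecord itself is left unchanged.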
Thanks @dagendresen. I don't disagree with this but I do have two thoughts
My own feeling is that any change to BOR is really a band-aid to the real problem, which is that we force everything through an occurrence/event model in a star schema. This is why I think focusing on the nature of datasets is worthwhile, since it is orthogonal to whatever happens in DwC and allows us to focus on the filtering needs. It's also why we're starting to consider more expressive models in GBIF (more on that shortly). |
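Thanks @timrobertson100 I imagine a possible MaterialSample Core (or similar), and that dwc:PreservedSpecimen (etc) and eDNA samples to maybe become organized as distinct "things" under dwc:MaterialSample ... and thus that a new filter category label for eDNA samples might perhaps want to avoid using the label from dwc:MaterialSample in the risk that this might acquire a superclass meaning...?? |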
Plus a few noticed from the current data. See gbif/pipelines#538 See gbif/gbif-api#84
There has been a MaterialSample "core" extension (
https://tools.gbif.org/dwca-validator/extension.do?id=dwc:MaterialSample)
since 2014 when the term was minted, but it is just a copy of the
Occurrence Core from that time and has never been advanced from the sandbox.
…On Thu, Jun 3, 2021 at 5:16 AM Dag Endresen ***@***.***> wrote:
Thanks @timrobertson100 <https://github.com/timrobertson100>
I imagine a possible MaterialSample Core
<https://rs.gbif.org/sandbox/core/dwc_material_sample.xml> (or similar),
and that dwc:PreservedSpecimen (etc) and eDNA samples to maybe become
organized as distinct "things" under dwc:MaterialSample
... and thus that a new filter category label for eDNA samples might
perhaps want to avoid using the label from dwc:MaterialSample in the risk
that this might acquire a superclass meaning...??
|
@tucotuco I do not think that a MaterialSample "extension" will be a full solution. I think that a MaterialSample "core" (in a DwC-A format) would be needed... (Specimens are NOT Occurrences) Maybe this is the same as you are saying? |
@dagendresen I understand and agree. There is confusion in terminology involved. That "extension" is for a MaterialSample Core. You can see that the Occurrence, Event, and Taxon Cores are all called extensions as well (https://tools.gbif.org/dwca-validator/extensions.do). |
Please consider the following additions to basisOfRecord vocabulary.
Can I get clarification on gbif-api-0.49 basisOfRecord terms Unknown and Observation. I read them as equivalent in the document https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml. Is there a relationship between these and the dwc:basisOfRecord example Occurrence?
Reference
https://rs.gbif.org/vocabulary/dwc/basis_of_record.xml
http://rs.tdwg.org/dwc/terms/basisOfRecord
https://support.ala.org.au/support/solutions/articles/6000197141-what-is-the-basis-of-record-