Stan Blum edited this page Nov 26, 2022 · 11 revisions

Welcome to the material-sample wiki! Please feel free to add to and edit pages here.

An important challenge facing primary biodiversity data publishing and integration is the encoding and interpretation of data about:

  1. Organisms
  2. Occurrences, and
  3. Material-samples (all the subtypes)

The Material Sample Task Group aims to clarify how data publishers should encode, and how aggregators and end users should interpret, records about material samples and the occurrences and organisms inferred from them.

Some simple definitions taken from Wikipedia or dictionaries:

  • Organism: wikipedia: an organism (from Greek: organismos) is any organic, living system that functions as an individual entity.

  • Occurrence (defining a biological occurrence): [Not from dictionary or Wikipedia] The existence of an organism at a place and time.
    The only restriction we might put on this is that occurrence could/should be categorized as natural (occurrence in nature) versus human facilitated. This is controversial (both sides of this have been expressed in the discussion). The issue needs resolution, and our definition/documentation should address the issue and guide users explicitly.

  • Material-Sample: wikipedia: a sample is a limited quantity of something [or in BDI usage a group of organisms, a single organism, or a part of an organism, or parts of multiple organisms, including free macro-molecules] that is intended to be similar to and represent a larger amount of that thing(s); i.e., local population(s) of organisms. The things could be countable objects such as individual items available as units for sale, or an uncountable material.

[Can we defer talking about observations (bco:information-artifacts) in this analysis? Is it true that wherever an observation could be considered in our scenarios, a material sample could also be collected in a comparable scenario? We can check later whether the absence of a material-sample should change how data are published and interpreted in ways that are not obvious.]

Class - Properties

Where do (should) properties attach to these classes? [For convenience, I've used informal language to denote property concepts rather than DwC terms. We can substitute DwC terms later.]

  • Organism [Org]: Taxonomic identity, Caste (social insects), Sex (invariant for most, but can change over time in some taxa); AgeClass changes over time, so more correctly a property of Occurrence, i.e., the Organism at a point in time.

  • Occurrence [Occ] = [Organism], [Event] : Date, Time, [Locality], Collector(s), Method

    • [Locality] : LocalityDescription, Latitude, Longitude, Country, State-Prov, County, etc.
  • Material-sample [MatSam]: part(s) of organism, preparation (method and materials), storage, disposition, history (loan, exhibit)
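
The property assignments above can be sketched as a small data model. This is only an illustration of where each informal property would attach, not a proposed schema: the class and field names are assumptions made for the example (they are not DwC terms), and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Locality:
    description: str = ""
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    country: str = ""
    state_province: str = ""
    county: str = ""


@dataclass
class Organism:
    organism_id: str
    taxon: str
    sex: Optional[str] = None    # can change over time in some taxa
    caste: Optional[str] = None  # social insects only


@dataclass
class Occurrence:
    occurrence_id: str
    organism: Organism
    date: str
    locality: Locality
    collectors: List[str] = field(default_factory=list)
    method: str = ""
    age_class: Optional[str] = None  # the organism's state at this point in time


@dataclass
class MaterialSample:
    material_sample_id: str
    occurrence: Occurrence
    parts: str
    preparation: str = ""
    storage: str = ""
    disposition: str = ""
    history: List[str] = field(default_factory=list)  # loans, exhibits


# Illustrative values only: one organism, one occurrence, two material samples.
org = Organism(organism_id="org-1", taxon="Plethodon cinereus")
occ = Occurrence("occ-1", org, "2020-05-01", Locality(country="US"), ["A. Collector"])
samples = [
    MaterialSample("ms-1", occ, parts="whole animal", preparation="EtOH"),
    MaterialSample("ms-2", occ, parts="tissue", preparation="EtOH", storage="ultracold"),
]
```

Placing `age_class` on `Occurrence` rather than `Organism` follows the note above that AgeClass describes the organism at a point in time.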

Encoding classes represented or inferred

How does the publisher communicate to the aggregator and downstream users what exactly is represented in a record, and if multiple records are published for a given occurrence or organism, which record is the best representation of the thing the aggregator or end user is interested in?

Scenarios

  1. The trivial case occurs with a discipline like herpetology, where an organism is collected and preserved whole, and represented that way in the collection catalog. (Comparable cases occur in virtually every other biodiversity collection discipline.)

    1 organism : 1 occurrence : 1 material-sample

    • [Org] dwc:scientificName, [Occ] dwc:occurrenceID, dwc:eventDate, dwc:recordedBy, dwc:locality, [MatSam] materialSampleType (Preserved Specimen), dwc:preparations: "whole animal (EtOH)"
  2. A salamander (organism) is collected, a tissue sample taken and placed in 95% EtOH and frozen in liquid nitrogen (LN2). There are two material samples, the "whole" specimen and the tissue sample. (Typically the whole specimen is also considered the organism and occurrence; the tissue-sample typically gets another identifier, at least a subnumber, so that it's trackable independently of the whole specimen.)

    1 organism : 1 occurrence : 2 material-samples/preserved-specimens

    Main-Record

    • [Org] dwc:scientificName, [Occ] dwc:occurrenceID, dwc:eventDate, dwc:recordedBy, dwc:locality, [MatSam] materialSampleType (Preserved Specimen), dwc:preparations: "whole animal (EtOH) | tissue (EtOH)"

      Related-Many-Records

      • dwc:occurrenceID, dwc:materialSampleID, dwc:preparations: "whole animal (EtOH)"
      • dwc:occurrenceID, dwc:materialSampleID, dwc:preparations: "tissue (EtOH)", ggbn?:storageRegime: "ultracold"
  3. A mouse is collected, a tissue sample taken and the whole specimen is prepared into a standard skin and skull. This is essentially the same as #1 except that the preparations are obviously not the whole specimen.

    1 organism : 1 occurrence : 3 material-samples

    • [Org] dwc:scientificName, [Occ] dwc:occurrenceID, dwc:eventDate, dwc:recordedBy, dwc:locality, [MatSam] dwc:preparations: "Skull" | "Skin" | "tissue (95% EtOH)"*
      • dwc:occurrenceID, dwc:materialSampleID, dwc:preparations: "Skin"
      • dwc:occurrenceID, dwc:materialSampleID, dwc:preparations: "Skull"
      • dwc:occurrenceID, dwc:materialSampleID, dwc:preparations: "tissue sample (95% EtOH in ultracold)"

* Material-sample data can be denormalized for convenience in the [Org]+[Occ] record.
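
One way to picture that denormalization is to join the preparations of the related material-sample records into a single pipe-delimited value on the main record. The sketch below assumes records are plain dictionaries keyed by DwC term names; the function name and the identifiers `occ-1`, `ms-1`, etc. are hypothetical.

```python
def denormalize_preparations(main_record, material_samples, sep=" | "):
    """Return a copy of main_record with the preparations of its samples joined."""
    preps = [
        ms["preparations"]
        for ms in material_samples
        if ms["occurrenceID"] == main_record["occurrenceID"]
    ]
    merged = dict(main_record)  # leave the caller's record untouched
    merged["preparations"] = sep.join(preps)
    return merged


# Scenario 3 (the mouse), with invented identifiers.
main = {"occurrenceID": "occ-1", "scientificName": "Mus musculus"}
related = [
    {"occurrenceID": "occ-1", "materialSampleID": "ms-1", "preparations": "Skin"},
    {"occurrenceID": "occ-1", "materialSampleID": "ms-2", "preparations": "Skull"},
    {"occurrenceID": "occ-1", "materialSampleID": "ms-3", "preparations": "tissue (95% EtOH)"},
]

flat = denormalize_preparations(main, related)
print(flat["preparations"])  # Skin | Skull | tissue (95% EtOH)
```

The related records remain the authoritative source; the joined string is a convenience copy and would need regenerating whenever a sample is added or removed.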

  4. Three cuttings are taken from a shrub, one kept, two distributed to other herbaria.

    1 organism : 1 occurrence : 3 material-samples (all with different institution/collection codes and accession numbers)

    The practice in many herbaria is to identify the organism:occurrence by the "CollectorsNumber".

    These would (should) show up in an aggregator as:

    dwc:institutionCode  dwc:collectionCode  dwc:catalogNumber  dwc:collectorsNumber  dwc:occurrenceID  preservedSpecimenID
    Inst-1               Bot                 19283837           ABC-5678              GUID-1*           GUID-3
    Inst-2               Botany              23499458           ABC-5678              GUID-1            GUID-4
    Inst-3               Bot                 65487815           ABC-5678              GUID-1            GUID-5

* Given that GUIDs for dwc:occurrenceID probably aren't assigned (by the CMS) until the specimen is cataloged, the first institution needs to catalog the first duplicate before distributing the others, or needs to send (push) the GUID along after the duplicates are distributed.
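
If the shared dwc:occurrenceID is populated as described, an aggregator could recognize the duplicates by grouping records on it. The following is a minimal sketch using the three example rows; the dictionary layout and the function name `group_duplicates` are assumptions, not part of any aggregator's actual pipeline.

```python
from collections import defaultdict

# The three herbarium records from the example rows, as plain dictionaries.
records = [
    {"institutionCode": "Inst-1", "collectionCode": "Bot", "catalogNumber": "19283837",
     "collectorsNumber": "ABC-5678", "occurrenceID": "GUID-1", "preservedSpecimenID": "GUID-3"},
    {"institutionCode": "Inst-2", "collectionCode": "Botany", "catalogNumber": "23499458",
     "collectorsNumber": "ABC-5678", "occurrenceID": "GUID-1", "preservedSpecimenID": "GUID-4"},
    {"institutionCode": "Inst-3", "collectionCode": "Bot", "catalogNumber": "65487815",
     "collectorsNumber": "ABC-5678", "occurrenceID": "GUID-1", "preservedSpecimenID": "GUID-5"},
]


def group_duplicates(recs):
    """Map each occurrenceID to the specimen IDs that represent it."""
    groups = defaultdict(list)
    for rec in recs:
        groups[rec["occurrenceID"]].append(rec["preservedSpecimenID"])
    return dict(groups)


duplicates = group_duplicates(records)
print(duplicates)  # {'GUID-1': ['GUID-3', 'GUID-4', 'GUID-5']}
```

Grouping on dwc:collectorsNumber alone would be less reliable, since that value is only unique within one collector's numbering.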

  5. The same plant collected repeatedly at different times of year to get leaf-buds, leaves, flowers, fruits

    1 organism : 4 occurrences : 4 material-samples

    * the 4 occurrences need to be linked by a common dwc:organismID

  6. An endangered Hawaiian bird is captured, banded, has blood drawn to screen for malaria (or viruses, etc.), and then released. Later it is recaptured, has blood sampled, and released.

    1 organism : 2 occurrences : 2 material-samples [several subsamples could be derived from these, each with subsampling who, when, what, how data]

  7. A water sample is collected and processed using 'omics approaches to assess what species occur at a particular location and date.

    more than 1 organism : more than 1 occurrence : 1 material-sample

  8. [more examples where the cardinalities of organism : occurrence : material-sample are not 1:1:1]
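
The cardinalities quoted in the scenarios can be checked mechanically if every published row carries all three identifiers. The sketch below assumes one row per (organism, occurrence, material-sample) link, with dictionary keys named after dwc:organismID, dwc:occurrenceID, and dwc:materialSampleID; all identifier values are invented for the example.

```python
def cardinality(rows):
    """Count distinct organisms, occurrences, and material samples in a set of rows."""
    organisms = {r["organismID"] for r in rows}
    occurrences = {r["occurrenceID"] for r in rows}
    samples = {r["materialSampleID"] for r in rows}
    return len(organisms), len(occurrences), len(samples)


# Same plant sampled four times across the year.
plant_rows = [
    {"organismID": "org-1", "occurrenceID": f"occ-{i}", "materialSampleID": f"ms-{i}"}
    for i in range(1, 5)
]

# One water sample in which three organisms are detected.
water_rows = [
    {"organismID": f"org-{i}", "occurrenceID": f"occ-{i}", "materialSampleID": "ms-1"}
    for i in range(1, 4)
]

print(cardinality(plant_rows))  # (1, 4, 4)
print(cardinality(water_rows))  # (3, 3, 1)
```

A result other than (1, 1, 1) flags a record set where the publisher needs to say explicitly which class each record represents.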