
Commit

Some Rmd formatting and heading changes, along with a full render using `rmarkdown::render`. The main thing was folding the tables under a separate heading so they could be skipped, and adding the **Scraping** and **Triggers** headings, which need to be filled in, obviously.
SimonGoring committed Dec 27, 2016
1 parent 5dbdc8e commit 99fe9cf
Showing 2 changed files with 1,242 additions and 920 deletions.
43 changes: 24 additions & 19 deletions DarwinCoreMapping.Rmd
@@ -10,6 +10,7 @@ output:
number_sections: no
theme: journal
toc: yes
toc_float: true
toc_depth: 3
---

@@ -40,28 +41,28 @@ table tr th :last-child, table tr td :last-child {
margin-bottom: 0; }
</style>

# DwC Alignment:
# Introduction

The Neotoma Paleoecology Database represents a rich source of data for researchers and members of the public who are interested in biodiversity, biogeography, and temporal and spatial ecology at long time scales. The data crosses multiple allied disciplines, including archaeology, geography, biology, paleontology, ecology and, primarily, paleoecology. The breadth of utility of this data is potentially limited by its reliance on a single point of access.

To broaden the availability of data from Neotoma, we are working with the Global Biodiversity Information Facility (GBIF). GBIF is an international organization that partners with a number of organizations to make biodiversity data available online. The use of clear metadata standards, like DarwinCore, makes GBIF a useful resource, and the increased size of the data holdings further improves its utility for researchers working on global biodiversity research.

There are three key steps in making Neotoma data available through GBIF. The first is mapping Neotoma data holdings and concepts onto the metadata standards within GBIF. This current document is based on [key metadata terms provided by DarwinCore](http://rs.tdwg.org/dwc/terms/), and the ways in which the can represent Neotoma data, across dataset types. In cases where the current implementation of DarwinCore (or other existing metadata standards such as DublinCore) fails to represent important concepts within Neotoma, we describe new terms that can be used to supplement existing metadata.
There are three key steps in making Neotoma data available through GBIF. The first is mapping Neotoma data holdings and concepts onto the metadata standards within GBIF. This current document is based on [key metadata terms provided by DarwinCore](http://rs.tdwg.org/dwc/terms/), and the ways in which they can represent Neotoma data, across dataset types. In cases where the current implementation of DarwinCore (or other existing metadata standards such as DublinCore) fails to represent important concepts within Neotoma, we describe new terms that can be used to supplement existing metadata.

The second step is scraping the Neotoma Database, first at the dataset level for the initial push, and then on an ongoing basis to keep GBIF up to date with Neotoma holdings.

<object data="images/DwC_Upload.svg" type="image/svg+xml">
<img src="images/DwC_Upload.png" />
<object data="images/DwC_Upload.svg" type="image/svg+xml" width="600">
<img src="images/DwC_Upload.png" width="600"/>
</object>

To then undertake the mapping of the records, we build a `csv` file for upload, from each individual record within Neotoma using the following mapping:


# Fully Described Output
# Mapping DarwinCore to Neotoma

This is the direct mapping used to generate the Neotoma export to VertNet/DarwinCore. In places the tables and documents refer to the mapping of various DarwinCore terms to tables and fields within the Neotoma Database. Neotoma is a SQL Server 2014 database. The format used to refer to specific fields is `XXX.yyy:zzz`, where `XXX` refers to the specific database, in most cases `NDB` for the Neotoma Database, `yyy` refers to a table within that database, and `zzz` refers to a specific field within the table. There is more detail on the related tables within [the online Neotoma Database manual](http://neotoma-manual.readthedocs.org/en/latest/).
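The `XXX.yyy:zzz` notation can be split mechanically. The following is a minimal sketch in Python, not part of the Neotoma tooling: the names `FieldRef` and `parse_field_ref` are hypothetical, and the choice to fall back to `NDB` when the database prefix is omitted (as in `CollectionUnits:Notes`) is an assumption based on the "in most cases `NDB`" wording above.

```python
import re
from typing import NamedTuple


class FieldRef(NamedTuple):
    """One parsed reference: database, table, and field (hypothetical helper type)."""
    database: str
    table: str
    field: str


# Optional "XXX." database prefix, followed by "yyy:zzz" (table and field).
_FIELD_REF = re.compile(r"^(?:(?P<database>\w+)\.)?(?P<table>\w+):(?P<field>\w+)$")


def parse_field_ref(ref: str, default_database: str = "NDB") -> FieldRef:
    """Split a reference such as 'NDB.Sites:SiteName' into its three parts.

    Assumption: a missing database prefix means the Neotoma Database (NDB).
    """
    # Tolerate surrounding whitespace and the backticks used in the tables.
    match = _FIELD_REF.match(ref.strip().strip("`"))
    if match is None:
        raise ValueError(f"not a valid field reference: {ref!r}")
    return FieldRef(
        database=match["database"] or default_database,
        table=match["table"],
        field=match["field"],
    )
```

Under these assumptions, `parse_field_ref("NDB.Sites:SiteName")` yields the parts `NDB`, `Sites`, and `SiteName` (an illustrative reference, not one taken from the mapping tables), while a string that lacks the `:` separator raises a `ValueError`.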

## Direct Mapping
## Final Alignment

**Note** - This section is still in progress.

@@ -131,9 +132,9 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| **country** | |
| **stateProvince** | |

# All Possible DarwinCore Fields
## All DarwinCore Fields - Reference

## Record Level Terms
### Record Level Terms

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -158,7 +159,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | dataGeneralizations | Actions taken to make the shared data less specific or complete than in its original form. Suggests that alternative data of higher quality may be available on request. |
| * | Age terms | dynamicProperties | A list of additional measurements, facts, characteristics, or assertions about the record. Meant to provide a mechanism for structured content. JSON structured: "{"thing":value}" |
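The `dynamicProperties` row above calls for a JSON-structured string. As a rough sketch of how a set of age terms might be serialized, assuming Python and entirely hypothetical key names (`ageOlder`, `ageYounger`, `ageType`), since the actual age terms are not listed in this document:

```python
import json


def build_dynamic_properties(age_terms: dict) -> str:
    """Serialize age terms to a JSON object string for dynamicProperties.

    Keys with a value of None are dropped so that missing ages are omitted
    rather than exported as null. The key names are illustrative only.
    """
    present = {key: value for key, value in age_terms.items() if value is not None}
    # sort_keys gives a stable ordering, which keeps exported rows diffable.
    return json.dumps(present, sort_keys=True)


example = build_dynamic_properties(
    {"ageOlder": 12500, "ageYounger": 11800, "ageType": None}
)
```

Note that the result is a single string, so it has to be quoted or escaped appropriately when written into a `csv` field.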

## Organism
### Occurrence

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -184,7 +185,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | otherCatalogNumbers | A list (concatenated and separated) of previous or alternate fully qualified catalog numbers or other human-used identifiers for the same Occurrence, whether in the current or any other data set or collection. |
| | | occurrenceRemarks | Comments or notes about the Occurrence. |

## Organism
### Organism

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -196,13 +197,13 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | previousIdentifications | A list (concatenated and separated) of previous assignments of names to the Organism. |
| | | organismRemarks | Comments or notes about the Organism instance. |

## MaterialSample, LivingSpecimen, PreservedSpecimen, FossilSpecimen
### MaterialSample, LivingSpecimen, PreservedSpecimen, FossilSpecimen

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
| | IGSN | materialSampleID | An identifier for the MaterialSample (as opposed to a particular digital record of the material sample). In the absence of a persistent global unique identifier, construct one from a combination of identifiers in the record that will most closely make the materialSampleID globally unique. |
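Where no persistent identifier such as an IGSN exists, the description above suggests constructing `materialSampleID` from other identifiers in the record. A minimal sketch of one way to do that, assuming Python; the function name, the `:` separator, and the example identifier parts are all hypothetical and not an established Neotoma or GBIF convention:

```python
def build_material_sample_id(*parts, sep: str = ":") -> str:
    """Join the available identifier parts into one composite identifier.

    Parts that are None or blank are skipped. Uniqueness depends entirely
    on the parts the caller chooses to supply.
    """
    cleaned = [str(part).strip() for part in parts if part is not None and str(part).strip()]
    if not cleaned:
        raise ValueError("at least one non-empty identifier part is required")
    return sep.join(cleaned)


# Illustrative values only; these are not real Neotoma identifiers.
example_id = build_material_sample_id("Neotoma", "dataset", 1001, "sample", 42)
```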

## Event, HumanObservation, MachineObservation
### Event, HumanObservation, MachineObservation

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -225,7 +226,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | fieldNotes | One of a) an indicator of the existence of, b) a reference to (publication, URI), or c) the text of notes taken in the field about the Event. |
| * | `CollectionUnits:Notes` | eventRemarks | Comments or notes about the Event. |

## Location
### Location

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -274,7 +275,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | georeferenceVerificationStatus | |
| | | georeferenceRemarks | |

## GeologicalContext
### GeologicalContext

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -297,7 +298,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | member | |
| | | bed | |

## Identification
### Identification

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -310,7 +311,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | identificationVerificationStatus | A categorical indicator of the extent to which the taxonomic identification has been verified to be correct. Recommended best practice is to use a controlled vocabulary such as that used in HISPID/ABCD. |
| | | identificationRemarks | Comments or notes about the Identification. |

## Taxon
### Taxon

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -348,8 +349,8 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | nomenclaturalStatus | The status related to the original publication of the name and its conformance to the relevant rules of nomenclature. It is based essentially on an algorithm according to the business rules of the code. It requires no taxonomic opinion. |
| | | taxonRemarks | Comments or notes about the taxon or name. |

## Auxiliary Terms
### MeasurementOrFact
### Auxiliary Terms
#### MeasurementOrFact

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -363,7 +364,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | measurementMethod | |
| | | measurementRemarks | |

### ResourceRelationship
#### ResourceRelationship

| Implemented | Neotoma | DarwinCore | Description |
| ----------- | ------- | ---------- | ---------- |
@@ -374,3 +375,7 @@ This is the direct mapping used to generate the Neotoma export to VertNet/Darwin
| | | relationshipAccordingTo | |
| | | relationshipEstablishedDate | |
| | | relationshipRemarks| Comments or notes about the relationship between the two resources. |

# Scraping Neotoma

# Setting Triggers and Data Upload
2,119 changes: 1,218 additions & 901 deletions DarwinCoreMapping.html

Large diffs are not rendered by default.
