Skip to content

Commit

Permalink
Merge pull request #186 from gaiaresources/BDRSPS-415b
Browse files Browse the repository at this point in the history
BDRSPS-415b: Documentation proofreading
  • Loading branch information
joecrowleygaia authored Aug 21, 2024
2 parents 1549307 + 8577b86 commit a2da834
Show file tree
Hide file tree
Showing 6 changed files with 101 additions and 99 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,6 @@ for geodeticDatum), but new terms may be submitted for consideration amd will no
validation error.

### FILE FORMAT

- The incidental occurrence data template is a [UTF-8](#appendix-iii-utf-8) encoded csv (not Microsoft
Excel Spreadsheets). Be sure to save this file with your data as a .csv (UTF-8) as follows,
otherwise it will not pass the csv validation step upon upload.
Expand Down
85 changes: 42 additions & 43 deletions abis_mapping/templates/survey_metadata/templates/instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,34 +2,30 @@
{% block body %}
# SYSTEMATIC SURVEY METADATA TEMPLATE INSTRUCTIONS

## OVERVIEW
Use this template to record systematic survey metadata; that is metadata relating to a
Systematic Survey dataset.
## Intended Usage
Use the Systematic Survey Metadata template to record metadata relating to a Systematic Survey dataset.

This Systematic Survey Metadata template must be used in combination with the Systematic
Survey Occurrence template and, in some cases, the Systematic Survey Site template.
The Systematic Survey Metadata template must be used in combination with the Systematic
Survey Occurrence template, and in some cases the Systematic Survey Site template.

Templates have been provided to facilitate integration of your data into the Biodiversity
Data Repository database. Not all types of data have been catered for in the available
templates at this stage; therefore, if you are unable to find a suitable template, please
contact <[email protected]> to make us aware of your data needs.

#### NEED TO KNOW:
#### Data Validation Requirements:
For data validation, you will need your data file to:

- be the correct **file format**,
- have **matching template fields** to the template downloaded (do not remove, or
change the order of fields),
- additional fields may be added **after the templated fields** (noting that the data type
is not assumed and values will be encoded as strings),
- only one row of metadata should be included and only the first row of metadata
**will be accepted** (this symbolises one Survey per dataset submission).
- have values in **mandatory fields** (see Table 1), and
- comply with data **value constraints,** for example the geographic coordinates are
consistent with a geodeticDatum type of the {{values.geodetic_datum_count}} available options.
- have **fields that match the template downloaded** (do not remove, or
change the order of fields),
- have extant values for **mandatory fields** (see Table 1), and
- comply with all **data value constraints**; for example the geographic coordinates are
consistent with a [geodeticDatum](#geodeticDatum-vocabularies) type of the ***{{values.geodetic_datum_count}}*** available
options.
Additional fields may be added **after the templated fields** (noting that the data type
is not assumed and values will be encoded as strings).

### FILE FORMAT

- The systematic survey metadata template is a [UTF-8](#appendix-iv-utf-8) encoded csv (not Microsoft
Excel Spreadsheets). Be sure to save this file with your data as a .csv (UTF-8) as
follows, otherwise it will not pass the in-browser csv validation step upon upload.
Expand All @@ -39,9 +35,12 @@ Unicode (UTF-8)]`

#### FILE SIZE
MS Excel imposes a limit of 1,048,576 rows on a spreadsheet, limiting a CSV file to the
header row followed by 1,048,575 occurrences. Furthermore, MS Excel has a 32,767
character limit on individual cells in a spreadsheet. These limits may be overcome by using
or editing CSV files with other software.
header row followed by 1,048,575 occurrences. Furthermore, MS Excel has a 32,767 character
limit on individual cells in a spreadsheet. These limits may be overcome by using or
editing CSV files with other software.

Larger datasets may be more readily ingested using the API interface. Please contact
<[email protected]> to make us aware of your data needs.

## TEMPLATE FIELDS
The template file contains the field names in the top row that form part of the core Survey
Expand All @@ -61,8 +60,8 @@ Table 2 in alphabetical order of field name.

### ADDITIONAL FIELDS
Data that do not match the existing template fields may be added as additional columns in
the CSV files after the templated fields.
E.g., sampleSizeUnit, sampleSizeValue.
the CSV files after the templated fields.
For example, `sampleSizeUnit`, `sampleSizeValue`.

<ins>Table 1: Systematic Survey Metadata template fields with descriptions, conditions,
datatype format, and examples.</ins>
Expand All @@ -71,26 +70,28 @@ datatype format, and examples.</ins>

## APPENDICES
### APPENDIX-I: VOCABULARY LIST
Apart from geodeticDatum, the data validation does not require adherence to the below vocabularies
for each of the fields indicated as having vocabularies. These vocabularies are provided as a
With the exception of `geodeticDatum`, data validation
does not require adherence to the vocabularies for the various vocabularied fields.. These vocabularies are provided as a
means of assistance in developing consistent language within the database. New terms can be added
to more appropriately describe your data that goes beyond the current list. Table 2 provides some
to more appropriately describe your data that goes beyond the current list.

Table 2 provides some
suggested values from existing sources such as: [Biodiversity Information Standard (TDWG)](https://dwc.tdwg.org/),
[EPSG.io Coordinate systems worldwide](https://epsg.io/), the [Global Biodiversity Information
System](https://rs.gbif.org/), and [Open Nomenclature in the biodiversity
era](https://doi.org/10.1111/2041-210X.12594).

<ins>Table 2: Suggested values for the controlled vocabulary fields in the template. Each term has
a preferred label with a definition to aid understanding of its meaning. For some terms, alternative
labels are provided that mean the same sort of thing. Note: <font color="red">geodeticDatum value
labels with similar semantics are provided. <br>Note: <font color="red">the value for `geodeticDatum`
must come from one of five options in this table.</font></ins>

{{tables.vocabularies}}

### APPENDIX-II: Well Known Text (WKT)
For general information on how WKT coordinate reference data is formatted [here](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry).
The length of a WKT string or of its components is not prescribed. However, MS Excel has a 32,767 (32K) character limit
on individual cells in a spreadsheet.
For general information on how WKT coordinate reference data is formatted is available [here](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry).
The length of a WKT string or of its components is not prescribed; however, MS Excel *does* has a
32,767 (32K) character limit on individual cells in a spreadsheet.

It is possible to edit CSV files outside of Excel in order to include more than 32K characters.

Expand All @@ -109,23 +110,22 @@ Following date and date-time formats are acceptable within the timestamp:
| **xsd:gYear** | yyyy |

Where:<br/>
&emsp; yyyy = four-digit year <br/>
&emsp; mm = two-digit month (01=January, etc.) <br/>
&emsp; dd = two-digit day of month (01 through 31) <br/>
&emsp; hh = two digits of hour (00 through 23) (am/pm NOT allowed) <br/>
&emsp; mm = two digits of minute (00 through 59) <br/>
&emsp; ss = two digits of second (00 through 59) <br/>

&emsp; `yyyy`: four-digit year <br/>
&emsp; `mm`: two-digit month (01=January, etc.) <br/>
&emsp; `dd`: two-digit day of month (01 through 31) <br/>
&emsp; `hh`: two digits of hour (00 through 23) (am/pm NOT allowed) <br/>
&emsp; `mm`: two digits of minute (00 through 59) <br/>
&emsp; `ss`: two digits of second (00 through 59) <br/>

### APPENDIX-IV: UTF-8
UTF-8 encoding is considered a best practice for handling character encoding, especially in
the context of web development, data exchange, and modern software systems. UTF-8
(Unicode Transformation Format, 8-bit) is a variable-width character encoding capable of
encoding all possible characters (code points) in Unicode.<br/>
Here are some reasons why UTF-8 is recommended:**Universal Character Support:** UTF-8
can represent almost all characters from all writing systems in use today. This includes
characters from various languages, mathematical symbols, and other special characters.

Here are some reasons why UTF-8 is recommended:
- **Universal Character Support:** UTF-8 can represent almost all characters from all writing
systems in use today. This includes characters from various languages, mathematical symbols,
and other special characters.
- **Backward Compatibility:** UTF-8 is backward compatible with ASCII (American
Standard Code for Information Interchange). The first 128 characters in UTF-8 are
identical to ASCII, making it easy to work with systems that use ASCII.
Expand All @@ -142,9 +142,8 @@ characters from various languages, mathematical symbols, and other special chara
programming languages, databases, and operating systems. Choosing UTF-8 helps
ensure compatibility across different platforms and technologies.

When working with text data, it's generally a good idea to use UTF-8 encoding to avoid
issues related to character representation and ensure that your software can handle a
diverse set of characters and languages.
When working with text data, UTF-8 encoding is recommended to avoid issues related to character
representation and ensure that a diverse set of characters and languages is supported.

For assistance, please contact: <[email protected]>
{% endblock %}
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
{% block body %}
# SYSTEMATIC SURVEY OCCURRENCES DATA TEMPLATE INSTRUCTIONS

## OVERVIEW
## Intended Usage
This template is used to record occurrence data; that is, the presence or absence of an organism
at a particular site locality at a point in time.

Expand All @@ -15,7 +15,7 @@ Data Repository database. Not all types of data have been catered for in the ava
templates at this stage; therefore, if you are unable to find a suitable template, please
contact <[email protected]> to make us aware of your data needs.

#### NEED TO KNOW:
#### Data Validation Requirements:
For data validation, you will need your data file to:
- be in the correct **file format,**
- have **fields that match the template downloaded** (do not remove, or
Expand Down Expand Up @@ -65,7 +65,7 @@ your data to the template by providing guidance on:

### ADDITIONAL FIELDS
Data that does not match the existing template fields may be added as
additional columns in the CSV files after the templated fields. <br>
additional columns in the CSV files after the templated fields.
For example: `eventRemarks`, `associatedTaxa`, `pathway`.

<ins>Table 1: Systematic Survey Occurrence data template fields with descriptions, conditions,
Expand All @@ -75,8 +75,8 @@ datatype format, and examples.</ins>

## APPENDICES
### APPENDIX-I: VOCABULARY LIST
With the exception of `geodeticDatum`, data validation does not require fields to adhere to the vocabularies
specified for the various vocabularied fields. These vocabularies are merely provided as a
With the exception of `geodeticDatum`, data validation does not require fields to adhere to the
vocabularies specified for the various vocabularied fields. These vocabularies are merely provided as a
means of assistance in developing a consistent language within the database. New terms may be added
to more appropriately describe your data that goes beyond the current list. Table 2 provides some
suggested values from existing sources such as: [Biodiversity Information Standard (TDWG)](https://dwc.tdwg.org/),
Expand All @@ -86,7 +86,7 @@ era](https://doi.org/10.1111/2041-210X.12594).

<ins>Table 2: Suggested values for the controlled vocabulary fields in the template. Each term has
a preferred label with a definition to aid understanding of its meaning. For some terms, alternative
labels with similar semantics are provided. Note: <font color="red">geodeticDatum value
labels with similar semantics are provided. Note: <font color="red">`geodeticDatum` value
**must** come from one of five options in this table.</font></ins>

<a name="vocabulary-list"></a>
Expand Down
Loading

0 comments on commit a2da834

Please sign in to comment.