Home
camiplata edited this page Aug 13, 2018
Welcome to the Biodiversity Data Quality with OpenRefine wiki!
This wiki is under construction 🔨🔧
No previous experience with OpenRefine is needed. To use the Biodiversity Data Quality scripts, you only need to install OpenRefine and learn how to upload your data.
Procedure:
- Matches original data with species matching output (by id or scientific name)
- Retrieves GBIF's rank and status allowing the user to evaluate the state of each name
- Retrieves GBIF's higher taxonomy for all names
- Compares GBIF's taxonomic suggestions with the original taxonomy
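The first step of this procedure, joining the original data to the species matching output, can be sketched outside OpenRefine as follows. This is a minimal illustration, not the wiki's actual script: the function name `join_by_name`, the `gbif_` prefix, and the sample rows are assumptions made for the example.

```python
def join_by_name(original_rows, normalized_rows, key="scientificName"):
    """Attach the species matching output to each original row, joined on `key`."""
    # Index the matching output by the join key (first occurrence wins).
    lookup = {}
    for row in normalized_rows:
        lookup.setdefault(row[key], row)

    joined = []
    for row in original_rows:
        match = lookup.get(row[key], {})
        merged = dict(row)
        # Prefix the GBIF fields (hypothetical naming) so they cannot
        # overwrite columns of the original dataset.
        for field, value in match.items():
            if field != key:
                merged["gbif_" + field] = value
        joined.append(merged)
    return joined


# Hand-written sample rows, for illustration only.
original = [
    {"id": "1", "scientificName": "Puma concolor"},
    {"id": "2", "scientificName": "Unknown name"},
]
normalized = [
    {"scientificName": "Puma concolor", "rank": "SPECIES", "status": "ACCEPTED"},
]
result = join_by_name(original, normalized)
```

Rows without a match keep only their original columns, which makes unmatched names easy to spot during review.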
Conditions:
- File obtained from GBIF's speciesMatching tool, named 'normalized'
- Dataset with columns 'scientificName', 'scientificName'
Warnings:
- The GBIF speciesMatching web service is limited to 6000 occurrences per query.
- New data will be stored in columns at the beginning of the dataset
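For datasets larger than the 6000-occurrence limit, one option is to split the names into batches before submitting them. The sketch below only illustrates the splitting; the helper name `split_into_batches` is hypothetical and the upload itself is not shown.

```python
def split_into_batches(rows, batch_size=6000):
    """Yield consecutive slices of `rows` with at most `batch_size` items each."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Illustrative input: 13000 placeholder names.
names = ["name %d" % i for i in range(13000)]
batches = list(split_into_batches(names))
batch_sizes = [len(batch) for batch in batches]
```

Each batch can then be saved as its own file and submitted separately.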
Procedure:
- Matches original scientificName with GBIF's taxonomic Backbone
- Retrieves GBIF's rank and status allowing the user to evaluate the state of each name
- Retrieves GBIF's higher taxonomy for all names
- Compares GBIF's taxonomic suggestions with the original taxonomy using a boolean descriptor (1, 0)
Conditions:
- Dataset with at least a 'scientificName' column
- To validate the higher taxonomy, these columns are also required: 'kingdom', 'phylum', 'class', 'order', 'family', 'genus'
Important:
The definitions of the objects/elements retrieved by GBIF's API may differ from those of the online SpeciesMatching tool:
- scientificName: GBIF's scientific name matching the scientificName of the query
- canonicalName: GBIF's canonicalName matching the scientificName of the query
- species: GBIF's accepted name given the GBIF's scientific name matching the scientificName of the query
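As an illustration of where these elements come from, the sketch below builds a request for GBIF's species match endpoint (`https://api.gbif.org/v1/species/match`) and picks the fields described above out of a response. It is not the wiki's OpenRefine script; the helper names and the sample response are assumptions, and the sample was written by hand rather than recorded from the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"

# Fields kept from each response: name, rank, status and higher taxonomy.
WANTED = ("scientificName", "canonicalName", "rank", "status",
          "kingdom", "phylum", "class", "order", "family", "genus", "species")


def build_match_url(scientific_name):
    """Return the match URL for one name, with the query string encoded."""
    return GBIF_MATCH_URL + "?" + urlencode({"name": scientific_name})


def fetch_match(scientific_name):
    """Query GBIF (needs network access) and return the decoded JSON response."""
    with urlopen(build_match_url(scientific_name)) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_fields(match):
    """Keep only the wanted fields; fields absent from the response become None."""
    return {field: match.get(field) for field in WANTED}


# Hand-written, trimmed response used for illustration only.
sample = {
    "scientificName": "Puma concolor (Linnaeus, 1771)",
    "canonicalName": "Puma concolor",
    "rank": "SPECIES",
    "status": "ACCEPTED",
    "kingdom": "Animalia",
    "family": "Felidae",
    "genus": "Puma",
    "species": "Puma concolor",
}
fields = extract_fields(sample)
url = build_match_url("Puma concolor")
```

In the sample, `scientificName` carries the authorship while `canonicalName` does not, which is why the two are kept as separate columns.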
Conventions for the boolean descriptor:
- 0: GBIF's suggested name DOES NOT match the original name
- 1: GBIF's suggested name matches the original name
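The descriptor can be sketched as a simple comparison per taxonomic rank. This is an assumption-laden illustration: the output column names (`kingdomMatch`, and so on) are hypothetical, and the choice to ignore case and surrounding whitespace is made for the example and may differ from what the scripts do.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")


def name_matches(original, suggested):
    """Return 1 if both names are equal (ignoring case and outer spaces), else 0."""
    if original is None or suggested is None:
        return 0
    return int(original.strip().lower() == suggested.strip().lower())


def compare_taxonomy(original_row, gbif_row):
    """Build one 1/0 descriptor per rank, under hypothetical '<rank>Match' names."""
    return {
        rank + "Match": name_matches(original_row.get(rank), gbif_row.get(rank))
        for rank in RANKS
    }


# Hand-written rows, for illustration only.
original_row = {"kingdom": "Animalia", "family": "felidae ", "genus": "Felis"}
gbif_row = {"kingdom": "Animalia", "family": "Felidae", "genus": "Puma"}
flags = compare_taxonomy(original_row, gbif_row)
```

Ranks missing from either row are reported as 0, so they show up for review along with genuine mismatches.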
Warnings:
- New data will be stored in columns at the beginning of the dataset
- Taxonomy elements are reorganized to facilitate the taxonomic validation
Procedure:
- Calls the Canadensys date parsing API
- Cleans the output to obtain valid JSON
- Extracts ISO Date as text
Conditions:
- Dataset with a column named 'eventDate'
Warnings:
- New data will be stored in columns at the beginning of the dataset
- Review the output for nulls; Canadensys cannot parse all date formats
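The review for nulls can be sketched as a filter over the rows. The sketch assumes the parsed date ends up in a column called `isoDate`; that name is hypothetical, so adjust it to whatever column the script creates. It does not call the Canadensys API.

```python
def rows_needing_review(rows, parsed_column="isoDate"):
    """Return rows that have an original eventDate but no parsed ISO date."""
    flagged = []
    for row in rows:
        original = (row.get("eventDate") or "").strip()
        parsed = (row.get(parsed_column) or "").strip()
        if original and not parsed:
            flagged.append(row)
    return flagged


# Hand-written rows, for illustration only.
rows = [
    {"eventDate": "2018-08-13", "isoDate": "2018-08-13"},
    {"eventDate": "sometime in August", "isoDate": None},
    {"eventDate": "", "isoDate": None},
]
to_review = rows_needing_review(rows)
```

Rows with an empty `eventDate` are skipped because there was nothing to parse in the first place.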
Procedure:
- Creates concatenated columns of geographic names
- Matches single and concatenated columns with DIVIPOLA
- Returns the matched names when matching was possible
Conditions:
- Dataset with columns 'stateProvince','county','municipality'
- DIVIPOLA archive, latest version provided by SiB Colombia
Warnings:
- New data will be stored in columns at the beginning of the dataset
- Review rows where the output columns (spMatch, spcMatch, spcmMatch) are blank; those rows need to be fixed and standardized
Conventions:
- spcm = stateProvince + county + municipality
- spc = stateProvince + county
- sp = stateProvince
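The concatenation and lookup behind these conventions can be sketched as follows. This is a minimal illustration under stated assumptions: the `|` separator, the upper-case normalization, the shape of the `reference` dictionary, and the sample values are all made up for the example and do not reflect the real layout of the DIVIPOLA archive.

```python
def normalize(name):
    """Collapse whitespace and upper-case a name; None becomes an empty string."""
    return " ".join((name or "").split()).upper()


def build_keys(row):
    """Build the sp, spc and spcm lookup keys for one row."""
    sp = normalize(row.get("stateProvince"))
    county = normalize(row.get("county"))
    municipality = normalize(row.get("municipality"))
    return {
        "sp": sp,
        "spc": "|".join([sp, county]),
        "spcm": "|".join([sp, county, municipality]),
    }


def match_row(row, reference):
    """Return spMatch, spcMatch and spcmMatch; blank when no match was found."""
    keys = build_keys(row)
    return {level + "Match": reference[level].get(key, "")
            for level, key in keys.items()}


# Hypothetical reference table: lookup key -> standardized name.
reference = {
    "sp": {"ANTIOQUIA": "Antioquia"},
    "spc": {"ANTIOQUIA|MEDELLÍN": "Antioquia, Medellín"},
    "spcm": {},
}
row = {"stateProvince": " antioquia", "county": "Medellín", "municipality": ""}
matches = match_row(row, reference)
```

A blank value in any of the three output columns marks a row whose geographic names still need to be fixed and standardized.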