Version 0.9
The CSV for the metadata can be found at openrefine/v0.9/IHEC_metadata_harmonization.v0.9.csv
The overall diff between v0.8 and v0.9 can be found at openrefine/v0.9/diff_v0.8_v0.9.json
This version comes with the first “extended” version, openrefine/v0.9/IHEC_metadata_harmonization.v0.9.extended.csv, which includes higher-level annotations for the three ontology columns.
The following columns have been added in comparison to the normal v0.9:
donor_health_status_ontology_curie_ncit
: mapping from NCIM to NCIT CURIEs for the donor_health_status_ontology_curie
disease_ontology_curie_ncit
: mapping from NCIM to NCIT CURIEs for the disease_ontology_curie
sample_ontology
: ontology to use based on the biomaterial_type
sample_ontology_term
: the ontology term extracted from disease_ontology_curie that should reflect either line, tissue_type or cell_type, depending on the sample_ontology
sample_ontology_term_high_order_JeffreyHyacinthe
: semi-manual annotation by Jeffrey Hyacinthe. It was applied to v0.8.
sample_ontology_term_high_order_JonathanSteif
: semi-manual annotation by Jonathan Steif. It was applied to the v0.9 draft.
sample_ontology_term_high_order_manual
: semi-manual annotation using the automatic extraction columns below and the manual annotations above. Created by members of the IHEC IA metadata group (Pierre-Etienne Jacques, Gabriella Frosi and Quirin Manz). It was applied to v0.9. Although this is the current higher-level annotation for sample_ontology_term, it should be handled with caution, since it is still preliminary and should be checked by others.
Note that the sample_ontology_term columns were grouped by their sample_ontology in the automatic extraction.
The following columns are a result of the automatic extraction:
($column)_($order)(_unique)?
: where
$column describes the ontology column that the automatic extraction was performed on. One of [sample_ontology_term, donor_health_status_ontology_term, disease_ontology_term].
$order describes the maximum number of unique terms allowed overall in the column (or in each group, for sample_ontology_term). For intermediate_order the maximum number of terms is 30; for high_order it is 15.
_unique is a suffix attached if only unique terms were considered for counting before the automatic extraction. If it is not attached, the extraction was performed on all entries and duplicates were counted as well. This reflects whether the underlying dataset on which the extraction was performed allowed duplicates or not.
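The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming the three base columns and the two order labels are exactly those listed above; the helper name parse_extraction_column and the MAX_TERMS table are hypothetical and not part of the repository.

```python
import re

# Pattern ($column)_($order)(_unique)? with the allowed values spelled out.
# Assumption: only the values named in the text above are valid.
COLUMN_PATTERN = re.compile(
    r"^(?P<column>sample_ontology_term"
    r"|donor_health_status_ontology_term"
    r"|disease_ontology_term)"
    r"_(?P<order>intermediate_order|high_order)"
    r"(?P<unique>_unique)?$"
)

# Maximum number of unique terms per order, as stated in the text.
MAX_TERMS = {"intermediate_order": 30, "high_order": 15}


def parse_extraction_column(name):
    """Split a column name into (column, order, is_unique, max_terms).

    Returns None if the name does not follow the automatic extraction scheme,
    for example the manually curated sample_ontology_term_high_order_manual.
    """
    match = COLUMN_PATTERN.match(name)
    if match is None:
        return None
    order = match.group("order")
    is_unique = match.group("unique") is not None
    return (match.group("column"), order, is_unique, MAX_TERMS[order])
```

Names with any other suffix, such as the semi-manual annotation columns, do not match and yield None, which makes the helper usable as a filter for the automatically extracted columns.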
This results in the following 12 additional columns:
sample_ontology_term_intermediate_order_unique
sample_ontology_term_high_order_unique
sample_ontology_term_intermediate_order
sample_ontology_term_high_order
donor_health_status_ontology_term_intermediate_order_unique
donor_health_status_ontology_term_high_order_unique
donor_health_status_ontology_term_intermediate_order
donor_health_status_ontology_term_high_order
disease_ontology_term_intermediate_order_unique
disease_ontology_term_high_order_unique
disease_ontology_term_intermediate_order
disease_ontology_term_high_order
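The count of 12 follows from combining 3 ontology columns, 2 orders and the optional _unique suffix (3 × 2 × 2). A short sketch that enumerates the names, assuming nothing beyond the values listed in this section; the function name extraction_column_names is hypothetical:

```python
from itertools import product

COLUMNS = [
    "sample_ontology_term",
    "donor_health_status_ontology_term",
    "disease_ontology_term",
]
SUFFIXES = ["_unique", ""]  # with and without the optional suffix
ORDERS = ["intermediate_order", "high_order"]


def extraction_column_names():
    """Enumerate all ($column)_($order)(_unique)? combinations.

    The iteration order (column, then suffix, then order) mirrors the
    order in which the columns are listed in this section.
    """
    return [
        f"{column}_{order}{suffix}"
        for column, suffix, order in product(COLUMNS, SUFFIXES, ORDERS)
    ]
```

Such a list can be compared against the header of the extended CSV to confirm that all expected columns are present.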