
Version 0.9

@quirinmanz quirinmanz released this 17 Aug 12:10
· 26 commits to main since this release

The CSV for the metadata can be found at openrefine/v0.9/IHEC_metadata_harmonization.v0.9.csv

The overall diff between v0.8 and v0.9 can be found at openrefine/v0.9/diff_v0.8_v0.9.json

This version comes with the first “extended” file, openrefine/v0.9/IHEC_metadata_harmonization.v0.9.extended.csv, which includes higher-level annotations for the three ontology columns.
Compared to the normal v0.9 file, the following columns have been added:

  • donor_health_status_ontology_curie_ncit: mapping from NCIM to NCIT curies for the donor_health_status_ontology_curie
  • disease_ontology_curie_ncit: mapping from NCIM to NCIT curies for the disease_ontology_curie
  • sample_ontology: ontology to use based on the biomaterial_type
  • sample_ontology_term: the ontology term extracted from disease_ontology_curie that should reflect either line, tissue_type or cell_type, depending on the sample_ontology
  • sample_ontology_term_high_order_JeffreyHyacinthe: semi-manual annotation by Jeffrey Hyacinthe. It was applied to v0.8
  • sample_ontology_term_high_order_JonathanSteif: semi-manual annotation by Jonathan Steif. It was applied to the v0.9 draft
  • sample_ontology_term_high_order_manual: semi-manual annotation based on the automatic extraction columns below and the two manual annotations above. Created by some members of the IHEC IA metadata group (Pierre-Etienne Jacques, Gabriella Frosi and Quirin Manz). It was applied to v0.9. Although this is the current higher-level annotation for sample_ontology_term, it should be handled with caution: it is still preliminary and still needs to be checked by others.
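
As a quick sanity check, the columns listed above can be looked up in the header of a CSV file. The following is a minimal sketch using only the Python standard library. The helper name `missing_extended_columns` and the in-memory sample header are illustrative assumptions and not part of the release.

```python
import csv
import io

# Columns that the extended file adds on top of the normal v0.9 file,
# as listed in the release notes above.
EXTENDED_COLUMNS = [
    "donor_health_status_ontology_curie_ncit",
    "disease_ontology_curie_ncit",
    "sample_ontology",
    "sample_ontology_term",
    "sample_ontology_term_high_order_JeffreyHyacinthe",
    "sample_ontology_term_high_order_JonathanSteif",
    "sample_ontology_term_high_order_manual",
]


def missing_extended_columns(csv_file):
    """Return the extended columns that are absent from the CSV header.

    `csv_file` is any open text file object; only its first row is read.
    (Hypothetical helper, written for illustration only.)
    """
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [name for name in EXTENDED_COLUMNS if name not in present]


# Made-up two-column header, used here only to demonstrate the helper.
sample = io.StringIO("sample_ontology,sample_ontology_term\n")
print(missing_extended_columns(sample))
```

To check the real file, pass `open(path, newline="")` with the path to the extended CSV given above instead of the in-memory sample.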

Note that the sample_ontology_term columns were grouped by their sample_ontology in the automatic extraction.
The following columns are a result of the automatic extraction:

  • ($column)_($order)(_unique)?:
    $column is the ontology column on which the automatic extraction was performed. It is one of [sample_ontology_term, donor_health_status_ontology_term, disease_ontology_term]
    $order sets the maximum number of unique terms allowed overall in the column (or in each group for sample_ontology_term). The maximum is 30 terms for intermediate_order and 15 for high_order
    The _unique suffix is attached if the automatic extraction counted each unique term only once. Without the suffix, the extraction was performed on all entries, so duplicates were counted as well. The suffix therefore indicates whether the underlying dataset used for the extraction allowed duplicates.

This results in the following 12 additional columns:

  • sample_ontology_term_intermediate_order_unique
  • sample_ontology_term_high_order_unique
  • sample_ontology_term_intermediate_order
  • sample_ontology_term_high_order
  • donor_health_status_ontology_term_intermediate_order_unique
  • donor_health_status_ontology_term_high_order_unique
  • donor_health_status_ontology_term_intermediate_order
  • donor_health_status_ontology_term_high_order
  • disease_ontology_term_intermediate_order_unique
  • disease_ontology_term_high_order_unique
  • disease_ontology_term_intermediate_order
  • disease_ontology_term_high_order
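
The naming pattern ($column)_($order)(_unique)? can be expanded and parsed programmatically. The sketch below builds the 12 column names from the three components and splits a name back into its parts. The function names (`extraction_column_names`, `parse_extraction_column`, `within_term_limit`) are hypothetical helpers written for illustration. They are not the code used for the actual extraction.

```python
import re
from itertools import product

# Ontology columns the automatic extraction was performed on.
COLUMNS = [
    "sample_ontology_term",
    "donor_health_status_ontology_term",
    "disease_ontology_term",
]
# Maximum number of unique terms allowed for each $order, as stated above.
MAX_TERMS = {"intermediate_order": 30, "high_order": 15}
# "_unique" means only unique terms were counted; "" means duplicates counted too.
SUFFIXES = ["_unique", ""]


def extraction_column_names():
    """Build all column names in the order used by the list above."""
    return [
        f"{column}_{order}{suffix}"
        for column, suffix, order in product(COLUMNS, SUFFIXES, MAX_TERMS)
    ]


# Anchored regex equivalent of ($column)_($order)(_unique)?
PATTERN = re.compile(
    r"^(?P<column>" + "|".join(map(re.escape, COLUMNS)) + r")_"
    r"(?P<order>" + "|".join(map(re.escape, MAX_TERMS)) + r")"
    r"(?P<unique>_unique)?$"
)


def parse_extraction_column(name):
    """Return (column, order, is_unique), or None if the name does not match."""
    match = PATTERN.match(name)
    if match is None:
        return None
    return match["column"], match["order"], match["unique"] is not None


def within_term_limit(terms, order):
    """Check that the terms contain at most MAX_TERMS[order] distinct values."""
    return len(set(terms)) <= MAX_TERMS[order]


names = extraction_column_names()
print(len(names))
print(parse_extraction_column("disease_ontology_term_high_order_unique"))
```

Manually curated columns such as sample_ontology_term_high_order_manual do not match the pattern, so `parse_extraction_column` returns `None` for them. For sample_ontology_term, where the limit applies per sample_ontology group, `within_term_limit` would be called once per group.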