Add disease ontology #473

suhana13 · 2021-08-03T21:55:07Z

The Disease Ontology data import is ready for review!

spiekos

Missing test data and unit tests, but this doesn't block the data import.

spiekos · 2021-08-06T00:10:08Z

scripts/biomedical/diseaseOntology/README.md

+
+- ### License
+
+This data is under a Creative Commons Public Domain Dedication [CC0 1.0 Universal license](https://disease-ontology.org/resources/do-resources).


Missing a link to the page on the Disease Ontology website stating that the data is under the Creative Commons Public Domain license.

@spiekos, sorry, I didn't quite understand that: https://disease-ontology.org/resources/do-resources already links to the license page on the DO website.

spiekos · 2021-08-06T00:10:46Z

scripts/biomedical/diseaseOntology/README.md

+
+### Schema Overview
+
+The schema representing reaction, metabolite and microbiome data from VMH is defined in [DO.mcf](https://raw.githubusercontent.com/suhana13/ISB-project/main/combined_list.mcf) and [DO.mcf](https://raw.githubusercontent.com/suhana13/ISB-project/main/combined_list_enum.mcf).


The schema will be stored in ChemicalCompounds.mcf and ChemicalCompoundsEnum.mcf.

@spiekos, should it be chemical_compounds.mcf or disease.mcf?

spiekos · 2021-08-06T00:12:05Z

scripts/biomedical/diseaseOntology/disease_ontology.tmcf

+icdoID: C:DiseaseOntology->ICDO
+meshID: C:DiseaseOntology->MESH
+nciID: C:DiseaseOntology->NCI
+snowmedctusID: C:DiseaseOntology->SNOMEDCTUS20200901


Are SNOMEDCTUS20200901 and the other SNOMEDCTUS names different columns in the file? Why are they stored under such odd column names?

The columns represent different versions of SNOMED CT, so SNOMEDCTUS20200901 refers to the 09/01/2020 version of the data.
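A minimal sketch of how the script could recognize these versioned columns, assuming every such column is named `SNOMEDCTUS` followed by a `YYYYMMDD` release date, as in the header of the test CSV. The helper name `snomed_release_dates` is hypothetical, not part of the PR.

```python
import re
from datetime import date

# Assumption: versioned SNOMED CT columns are named "SNOMEDCTUS" + YYYYMMDD.
SNOMED_COLUMN = re.compile(r"^SNOMEDCTUS(\d{4})(\d{2})(\d{2})$")


def snomed_release_dates(columns):
    """Map each versioned SNOMED CT column name to its release date."""
    releases = {}
    for name in columns:
        match = SNOMED_COLUMN.match(name)
        if match:
            year, month, day = (int(part) for part in match.groups())
            releases[name] = date(year, month, day)
    return releases


header = ["id", "SNOMEDCTUS20200901", "UMLSCUI", "SNOMEDCTUS20180301"]
# Print the SNOMED columns oldest release first.
for column, released in sorted(snomed_release_dates(header).items(), key=lambda kv: kv[1]):
    print(column, released.isoformat())
```

Knowing the release date per column would make it easy to document which SNOMED CT versions are present, or to pick the newest one.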

spiekos · 2021-08-06T00:12:33Z

scripts/biomedical/diseaseOntology/README.md

+
+## About the import
+
+### Artifacts


Missing test data and unit tests.

spiekos · 2021-08-06T00:13:14Z

scripts/biomedical/diseaseOntology/format_disease_ontology.py

+
+
+def main():
+    file_input = sys.argv[1]


Too much logic is in main. Move it into separate functions and call them from main.
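One way to follow this suggestion is sketched below with hypothetical function names (`read_rows`, `format_row`, `write_rows`). It assumes the script reads one CSV and writes another; the real per-column cleaning in `format_disease_ontology.py` is not reproduced, and `format_row` only trims whitespace as a placeholder.

```python
import csv
import sys


def read_rows(path):
    """Read the intermediate CSV (converted from .owl) as a header plus dict rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows


def format_row(row):
    """Placeholder for the per-column cleaning steps; here it only trims whitespace."""
    return {key: (value or "").strip() for key, value in row.items()}


def write_rows(path, fieldnames, rows):
    """Write the cleaned rows, keeping the input column order."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def main(argv):
    """Entry point: argv is [script, input_csv, output_csv]; returns an exit code."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} <input.csv> <output.csv>", file=sys.stderr)
        return 1
    fieldnames, rows = read_rows(argv[1])
    write_rows(argv[2], fieldnames, [format_row(row) for row in rows])
    return 0
```

From the command line this would be invoked as `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard. Keeping each step in its own function also lets unit tests call the cleaning logic directly without touching files.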

chejennifer

Sorry for the late review, but just a few small things! Also, it seems to be missing unit tests and test datasets.

chejennifer · 2021-08-06T00:53:01Z

scripts/biomedical/diseaseOntology/README.md

+
+### Overview
+
+The Disease Ontology database provides a standardized ontology for human diseases, for the purposes of consistency and reusability. It has contains extensive cross mapping of DO terms to other databases, namely, MeSH, ICD, NCI’s thesaurus, SNOMED and OMIM. More information on the database can be found [here](https://disease-ontology.org).


"has contains" -> "contains"

scripts/biomedical/diseaseOntology/format_disease_ontology.py

chejennifer · 2021-08-06T01:50:06Z

scripts/biomedical/diseaseOntology/format_disease_ontology.py

+        query_str = """
+        SELECT DISTINCT ?id ?element_name
+        WHERE {{
+        ?element typeOf MeSHDescriptor .


Not sure if this is a typo, or is "MeSHDescriptor" supposed to be cased like that?

Yes, it's MeSHDescriptor - https://datacommons.org/browser/MeSHDescriptor

scripts/biomedical/diseaseOntology/format_disease_ontology.py

Add About the Import section

spiekos

Please update the README.md to respond to the comments.

spiekos · 2022-07-26T06:30:20Z

scripts/biomedical/diseaseOntology/README.md

+
+### Notes and Caveats
+
+The original format of the data was `.owl` and it was converted to a `.csv` file prior to ingestion into Data Commons.


Please confirm that the only caveat of the dataset and the data cleaning process is that the data needs to be converted from an .owl file to a .csv file. Was nothing noted in the Disease Ontology documentation itself, and did you encounter anything unusual while cleaning the data? For example, this is where you should note that a node can have multiple parents.
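As an illustration of the multiple-parents caveat, here is a minimal sketch that groups parents per term. It assumes the .owl-to-.csv step emits one row per (id, subClassOf) pair, so a term with several parents appears on several rows; the function name and the `term_1`/`parent_A` values are hypothetical placeholders, not real DO data.

```python
def collect_parents(rows):
    """Group subClassOf values by term id, keeping first-seen order and dropping repeats.

    Assumption: the .owl-to-.csv step emits one row per (id, subClassOf) pair,
    so a term with several parents appears on several rows.
    """
    parents = {}
    for row in rows:
        term_parents = parents.setdefault(row["id"], [])
        parent = (row.get("subClassOf") or "").strip()
        if parent and parent not in term_parents:
            term_parents.append(parent)
    return parents


rows = [
    {"id": "term_1", "subClassOf": "parent_A"},
    {"id": "term_1", "subClassOf": "parent_B"},
    {"id": "term_2", "subClassOf": "parent_A"},
    {"id": "term_2", "subClassOf": ""},
]
for term, term_parents in collect_parents(rows).items():
    print(term, "->", ", ".join(term_parents))
```

Whether parents end up as one multi-valued cell or as repeated rows, the README caveat should state which form the final CSV uses.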

spiekos · 2022-07-26T06:33:08Z

scripts/biomedical/diseaseOntology/README.md

+
+### Download Data
+
+The human disease ontology data can be downloaded from their official github repository [here](https://www.vmh.life/#human/all). The data is in `.owl` format and had to be parsed into a `.csv` format (see [Notes and Caveats](#notes-and-caveats) for additional information on formatting).


Please create a shell script that downloads the data. If the data is converted from .owl to .csv outside of your format_disease_ontology.py script, do that conversion in this script as well.

spiekos · 2022-07-26T06:45:53Z

scripts/biomedical/diseaseOntology/README.md

+
+### Artifacts
+
+#### Scripts


Add short descriptions of all scripts and files, and link each one to its location in this directory.

spiekos · 2022-07-26T06:46:26Z

scripts/biomedical/diseaseOntology/README.md

+
+`format_disease_ontology.py`
+
+##### Test Script


Update all of the script and file names to match what you added to the directory.

spiekos · 2022-07-26T06:47:25Z

scripts/biomedical/diseaseOntology/README.md

+To test format_refseq_chromosome_id_to_dcid.py run:
+
+```
+python format_disease_ontology.py input_file.owl expected_output.csv


Update this command to use the file names produced by the download script and the name you want for the final CSV file.

spiekos · 2022-07-26T06:48:40Z

scripts/biomedical/diseaseOntology/README.md

+
+##### Test File
+
+`input_file.txt`


Missing a small input file and an expected output file that can be used to run the script and test that it generates the expected output.
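A sketch of the comparison step such a test could use, assuming the expected output is stored as a CSV alongside the small input file. The helper names `read_csv_rows` and `csv_files_match` are hypothetical. Comparing parsed rows rather than raw text means differences in quoting style or line endings do not cause false failures.

```python
import csv


def read_csv_rows(path):
    """Return the CSV as a list of rows, each a list of cell strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


def csv_files_match(actual_path, expected_path):
    """Compare two CSV files cell by cell, ignoring quoting and line-ending style."""
    return read_csv_rows(actual_path) == read_csv_rows(expected_path)
```

In a `unittest.TestCase`, the check could read `self.assertTrue(csv_files_match(generated_csv, "unit-tests/test-output.csv"))`, where `generated_csv` is whatever path the test writes the script's output to.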

spiekos

The .tmcf has been updated. Please run the JSON tests using the updated file.

spiekos · 2022-08-01T02:12:46Z

scripts/biomedical/diseaseOntology/unit-tests/test-output.csv

@@ -0,0 +1,25 @@
+ICD10_id,element,MESH_id,element_name,subClassOf,IAO_0000115,hasAlternativeId,hasExactSynonym,id,label,ICDO,MESH,NCI,SNOMEDCTUS20200901,UMLSCUI,ICD9CM,SNOMEDCTUS20200301,ICD10CM,SNOMEDCTUS20180301,GARD,OMIM,ORDO,EFO,MEDDRA,SNOMEDCTUS20190901


This file needs to be updated to match the current expected output of the script; it is currently an old version.

spiekos · 2022-08-01T02:14:44Z

scripts/biomedical/diseaseOntology/format_disease_ontology.py

+    df_do['ICD9CM'] = "ICD10/" + df_do['ICD9CM'].apply(str)
+    return df_do
+
+def main():


Please confirm that this is the most up-to-date version of the script, including changes such as handling lists of text values that contain commas within a single cell value, and the other changes we've previously discussed.
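A minimal sketch of one way to keep commas inside a single text value from being read as list separators. It assumes (not confirmed by the PR) that the downstream import splits a cell on commas that fall outside double quotes, so each value is wrapped in double quotes and any inner double quote is doubled. The function name and the placeholder values are hypothetical.

```python
import csv
import io


def to_quoted_list(values):
    """Join values into one cell, double-quoting each so embedded commas survive.

    Assumption: the downstream import splits a cell on commas outside double
    quotes; inner double quotes are escaped by doubling them.
    """
    quoted = []
    for value in values:
        value = value.strip()
        if value:  # skip empty entries
            quoted.append('"' + value.replace('"', '""') + '"')
    return ",".join(quoted)


synonyms = ["synonym one", "synonym, with comma"]
buffer = io.StringIO()
# csv.writer adds the outer quoting/escaping for the whole cell automatically.
csv.writer(buffer).writerow(["term_id", to_quoted_list(synonyms)])
print(buffer.getvalue())
```

Letting the `csv` module write the outer layer avoids hand-escaping the whole cell, and reading the file back with `csv.reader` returns the joined string unchanged.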

spiekos · 2022-08-01T02:45:50Z

scripts/biomedical/diseaseOntology/disease_ontology.tmcf

+alternativeDiseaseOntologyID : C:DiseaseOntology->hasAlternativeId 
+diseaseSynonym: C:DiseaseOntology->hasExactSynonym
+internationalClassificationOfDiseaseID: C:DiseaseOntology->ICDO
+medicalSubjectHeadingDescriptorID: C:DiseaseOntology->MESH


Can you please confirm that all of these IDs start with "D"? If not, then they aren't all MeSH Descriptors and we should switch to the more general "medicalSubjectHeadingID" property here.
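The check requested here can be sketched as follows. It assumes the MESH column may carry a `MESH:` prefix and that missing cells may appear as the string `nan` (which is what `str()` produces for a pandas missing value). The helper names are hypothetical; the two property names are the ones from this thread.

```python
def non_descriptor_mesh_ids(mesh_ids):
    """Return the MeSH IDs that do not look like descriptors (do not start with "D")."""
    offenders = []
    for raw in mesh_ids:
        value = raw.strip()
        # Skip empty cells, including "nan" left behind by str() on missing values.
        if not value or value.lower() == "nan":
            continue
        # Assumption: IDs may carry a "MESH:" prefix.
        if value.upper().startswith("MESH:"):
            value = value[len("MESH:"):]
        if not value.startswith("D"):
            offenders.append(raw)
    return offenders


def mesh_property(mesh_ids):
    """Pick the descriptor-specific property only if every ID is a descriptor."""
    if non_descriptor_mesh_ids(mesh_ids):
        return "medicalSubjectHeadingID"
    return "medicalSubjectHeadingDescriptorID"
```

Printing `non_descriptor_mesh_ids(...)` over the whole MESH column would also show exactly which rows break the assumption.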

spiekos

The expected-output CSV for the test is outdated, and the text values are not properly formatted. Please update the expected output file for the test and confirm that the current format_disease_ontology.py script is up to date. Please also confirm that the test script passes when run from a fresh download of the GitHub repo.

spiekos

Please add the missing diseaseOntologyID property.

spiekos · 2022-09-20T22:52:00Z

scripts/biomedical/diseaseOntology/disease_ontology.tmcf

@@ -0,0 +1,24 @@
+Node: E:DiseaseOntology->E1
+typeOf: dcs:Disease


Update the script to include a column of text values with the Disease Ontology IDs (e.g. "DOID:0060329"), then add the line diseaseOntologyID: C:DiseaseOntology->diseaseOntologyID to the tmcf.
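A minimal sketch of deriving that column. It assumes the `element` column holds OBO PURLs such as `http://purl.obolibrary.org/obo/DOID_0060329` and that values already in `DOID:` form should pass through unchanged; the function name is hypothetical.

```python
import re

# Assumption: term IRIs end in "DOID_<digits>", e.g.
# http://purl.obolibrary.org/obo/DOID_0060329
DOID_IRI = re.compile(r"DOID_(\d+)$")


def to_disease_ontology_id(value):
    """Return a "DOID:<digits>" text value, or "" if the input is not a DO term."""
    value = (value or "").strip()
    if value.startswith("DOID:"):
        return value
    match = DOID_IRI.search(value)
    return f"DOID:{match.group(1)}" if match else ""


rows = [{"element": "http://purl.obolibrary.org/obo/DOID_0060329"}]
for row in rows:
    row["diseaseOntologyID"] = to_disease_ontology_id(row["element"])
print(rows[0]["diseaseOntologyID"])
```

Returning an empty string for non-DO IRIs leaves the cell blank instead of writing a malformed ID.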

suhana13 added 3 commits August 3, 2021 14:53

feat: add disease_ontology.tmcf

2b65350

feat: add format_disease_ontology.py

e952017

feat: add README

bc28cca

blunderbuss-gcf bot assigned chejennifer Aug 3, 2021

google-cla bot added the cla: yes label Aug 3, 2021

spiekos requested review from spiekos, pradh and chejennifer August 6, 2021 00:04

spiekos and others added 2 commits August 5, 2021 17:04

Merge branch 'master' into add_disease_ontology

91667cc

Update README.md

f8efa54

spiekos reviewed Aug 6, 2021


chejennifer reviewed Aug 6, 2021


suhana13 added 4 commits August 6, 2021 07:15

feat: add helper function

9c12a2d

fix: nits

8d4f7f2

fix: property in tmcf

6a5cc0c

feat: format cols

2aef466

chejennifer approved these changes Apr 29, 2022


chejennifer and others added 4 commits April 29, 2022 15:34

Merge branch 'master' into add_disease_ontology

06f125e

add unittests

4832d53

Update README.md

702e9be

Update README.md

75f9256

Add About the Import section

spiekos reviewed Jul 27, 2022


spiekos and others added 2 commits July 27, 2022 10:36

Merge branch 'master' into add_disease_ontology

86502c0

Update .tmcf

1329557

spiekos reviewed Aug 1, 2022


Suhana Bedi added 5 commits August 5, 2022 11:19

update readme

8e6f5ce

feat: add download file

377841a

add function edits to the script

15cdeb1

fix: ICD10 formatting

370a2e5

feat: update tmcf

75dc2d6

Suhana Bedi and others added 2 commits September 19, 2022 18:47

fix: line number for formatting

e783ba9

Update disease_ontology.tmcf

1788784

spiekos reviewed Sep 20, 2022


Suhana Bedi and others added 2 commits September 20, 2022 15:32

fix: column formatting

3db959a

Update disease_ontology.tmcf

c3eac4a

spiekos reviewed Sep 20, 2022


Suhana Bedi added 7 commits September 22, 2022 09:23

add diseaseID column

526266f

fix column formatting

10bf338

fix unit tests

010be38

remove old test file

03926bc

feat: add missing synonyms for disease terms

caa3e3e

feat:update format_disease_ontology.py

4d0c493

feat: add illegal char check

5f0c1a1
