Changes were made to the scripts to resolve errors in the data cleaning and formatting of the PharmGKB CSV+tMCF pairs. The phenotypes were also broken out to distinguish that one is a MeSHQualifier and seven are MeSHSupplementaryConceptRecords, unlike the rest of the phenotypes, which are MeSHDescriptors; the phenotypes were therefore separated into three CSV+tMCF pairs. In particular, the links to the enums and between entity types were fixed by initializing every referenced node and then pointing to it within the tMCF. Because of this, any existence-missing errors in the JSON reports can be ignored. The changes to the scripts, tMCF files, and documentation (README.md) for this import are part of GitHub PR 1056 https://github.com/datacommonsorg/data/pull/1056 #926

Merged
copybara-service[bot] merged 1 commit into main from copybara2git_653311391 on Jul 23, 2024

Conversation

copybara-service[bot]
Contributor

Changes were made to the scripts to resolve errors in the data cleaning and formatting of the PharmGKB CSV+tMCF pairs. The phenotypes were also broken out to distinguish that one is a MeSHQualifier and seven are MeSHSupplementaryConceptRecords, unlike the rest of the phenotypes, which are MeSHDescriptors; the phenotypes were therefore separated into three CSV+tMCF pairs. In particular, the links to the enums and between entity types were fixed by initializing every referenced node and then pointing to it within the tMCF. Because of this, any existence-missing errors in the JSON reports can be ignored. The changes to the scripts, tMCF files, and documentation (README.md) for this import are part of GitHub PR 1056 datacommonsorg/data#1056
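
For readers unfamiliar with the phenotype split, here is a minimal Python sketch of one way rows could be partitioned into three groups by MeSH record type. It is not the script from this import. It relies on the MeSH convention that Descriptor, Qualifier, and Supplementary Concept Record unique IDs begin with `D`, `Q`, and `C` respectively. The column name `mesh_id`, the function name, and the sample IDs are assumptions made for illustration only.

```python
from collections import defaultdict

# MeSH unique IDs carry the record type in their first letter.
MESH_TYPE_BY_PREFIX = {
    "D": "MeSHDescriptor",
    "Q": "MeSHQualifier",
    "C": "MeSHSupplementaryConceptRecord",
}


def split_by_mesh_type(rows, id_column="mesh_id"):
    """Group row dicts by MeSH record type.

    `id_column` is a hypothetical column name; the real CSV may differ.
    Rows whose ID has an unrecognized prefix raise ValueError so they are
    not silently written to the wrong CSV+tMCF pair.
    """
    groups = defaultdict(list)
    for row in rows:
        mesh_id = (row.get(id_column) or "").strip()
        record_type = MESH_TYPE_BY_PREFIX.get(mesh_id[:1].upper())
        if record_type is None:
            raise ValueError(f"Unrecognized MeSH ID: {mesh_id!r}")
        groups[record_type].append(row)
    return dict(groups)


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the result.
    sample = [
        {"mesh_id": "D000001", "name": "example descriptor"},
        {"mesh_id": "Q000002", "name": "example qualifier"},
        {"mesh_id": "C000003", "name": "example supplementary record"},
    ]
    for record_type, members in split_by_mesh_type(sample).items():
        print(record_type, len(members))
```

Each resulting group would then be written to its own CSV and paired with a tMCF whose `typeOf` matches the record type.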

Schema Changes:

  • Add CPICLevelEnum, DosageGuidelineSourceCpicNoRecommendation, DrugTypeEnum, PGxLevelEnum, PharmacogeneticAssociationEnum.
  • Add properties for clinicalAnnotationCount, clinicalAnnotationCountLevel1_2, clinicalGuidelineAnnotationCount, dosageGuideline, drugHasPrescribingInfo, drugLabelAnnotationCount, drugType, fdaTopPharmacogeneticLevel, geneticVariantAnnotationCount, hasCpicDosingGuideline, hasGenomicCoordinates, hasGeneticVariantAnnotation, hasPrescribingInfo, medicalDictionaryForRegulatoryActivitiesId, metabolicPathwayCount, pharmageneticAssociation, topClinicalAnnotationLevel, topCpicLevel, topPharmacogeneticLevel, veryImportantPharmacogeneCount.
  • Remove properties for fdaTopPGxLevel, mintID, nationalClinicalTrialNumber, nationalDrugCode, nationalDrugFileReferenceTerminologyCode, neuroMabID, patentID, pharmGkbClinicalAnnotationCount, pharmGkbPathwayCount, pkgbTags.
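
The description mentions initializing every referenced node and then pointing to it within the tMCF. The Python sketch below builds a small template in the usual `Node: E:<table>->E<n>` form and checks that each entity reference points at a node defined in the same template. The table name, CSV column names, and the `placeholderPhenotypeLink` property are hypothetical; `drugType` and `MeSHDescriptor` are taken from the text above, but how they are used here is an assumption, not the import's actual tMCF.

```python
import re

TABLE = "pharmgkb_example"  # hypothetical table name

ENTITY_REF = re.compile(r"E:[\w.]+->E\d+")


def node(index, type_of, **props):
    """Render one tMCF node block for entity E<index>."""
    lines = [f"Node: E:{TABLE}->E{index}", f"typeOf: dcs:{type_of}"]
    lines += [f"{name}: {value}" for name, value in props.items()]
    return "\n".join(lines)


def build_example_tmcf():
    """Initialize the phenotype node first, then point to it from the drug."""
    phenotype = node(0, "MeSHDescriptor", dcid=f"C:{TABLE}->phenotype_dcid")
    drug = node(
        1,
        "Drug",
        dcid=f"C:{TABLE}->drug_dcid",
        # Enum link: the CSV column is assumed to hold an enum value's dcid.
        drugType=f"C:{TABLE}->drug_type",
        # Placeholder property name, not part of the actual schema.
        placeholderPhenotypeLink=f"E:{TABLE}->E0",
    )
    return phenotype + "\n\n" + drug + "\n"


def undefined_references(tmcf_text):
    """Return entity references that have no matching `Node:` definition."""
    defined, referenced = set(), set()
    for line in tmcf_text.splitlines():
        refs = ENTITY_REF.findall(line)
        if line.startswith("Node:"):
            defined.update(refs)
        else:
            referenced.update(refs)
    return referenced - defined


if __name__ == "__main__":
    tmcf = build_example_tmcf()
    print(tmcf)
    print("undefined references:", undefined_references(tmcf))
```

A check like `undefined_references` only covers references inside one template; it does not replace the import tool's own validation reports.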

@copybara-service copybara-service bot force-pushed the copybara2git_653311391 branch from d556ad9 to 536c341 Compare July 17, 2024 19:10
@copybara-service copybara-service bot force-pushed the copybara2git_653311391 branch from 536c341 to a793e83 Compare July 17, 2024 20:09
@copybara-service copybara-service bot force-pushed the copybara2git_653311391 branch from a793e83 to 656a320 Compare July 17, 2024 22:54
@copybara-service copybara-service bot force-pushed the copybara2git_653311391 branch from 656a320 to af95f72 Compare July 18, 2024 00:31
@copybara-service copybara-service bot force-pushed the copybara2git_653311391 branch from af95f72 to 6e7dc45 Compare July 23, 2024 03:45
PiperOrigin-RevId: 655002332
@copybara-service copybara-service bot force-pushed the copybara2git_653311391 branch from 6e7dc45 to 9731b2c Compare July 23, 2024 03:50
@copybara-service copybara-service bot merged commit 9731b2c into main Jul 23, 2024
@copybara-service copybara-service bot deleted the copybara2git_653311391 branch July 23, 2024 03:50