Remove Annotated MAF before Import #958

averyniceday · 2022-05-17T16:18:29Z

I think this is related to the file-renaming effort from a while back, where we merge all data_mutations* files into one profile. data_mutations_annotated used to be skipped during import, but now it gets merged into data_mutations_extended.

To fix this, we make sure to remove the mutations_annotated file from the study directory as part of processing.
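A minimal sketch of the failure mode and the fix. It assumes the merge step selects files with a `data_mutations*` glob (as the wildcard above suggests) and that the leftover file is named `data_mutations_annotated.txt`; both helper names are hypothetical, not functions from the import scripts.

```python
import glob
import os
import tempfile


def find_mutation_files(study_dir):
    """List the files a data_mutations* glob would pick up (assumed merge behavior)."""
    pattern = os.path.join(glob.escape(study_dir), "data_mutations*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


def remove_annotated_maf(study_dir, annotated_name="data_mutations_annotated.txt"):
    """Delete the leftover annotated MAF (hypothetical name), if present."""
    path = os.path.join(study_dir, annotated_name)
    if os.path.exists(path):
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as study_dir:
        for name in ("data_mutations_extended.txt", "data_mutations_annotated.txt"):
            open(os.path.join(study_dir, name), "w").close()
        # Before cleanup the annotated file matches the glob and would be merged too.
        print(find_mutation_files(study_dir))  # ['data_mutations_annotated.txt', 'data_mutations_extended.txt']
        remove_annotated_maf(study_dir)
        print(find_mutation_files(study_dir))  # ['data_mutations_extended.txt']
```

Removing the file before the merge runs means the glob can no longer see it, so nothing downstream needs to change.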

sheridancbio

Rather than copying the file contents and then removing the old file, I think we can just use the equivalent of Unix `mv`.

sheridancbio · 2022-05-20T16:27:12Z

import-scripts/subset_and_merge_crdb_pdx_studies.py

@@ -581,6 +581,7 @@ def annotate_maf(destination_to_source_mapping, root_directory, annotator_jar):
        # otherwise remove annotated maf if exists in destination study directory
        if annotator_status == 0:
            shutil.copy(annot_maf, orig_maf)
+            os.remove(annot_maf)


I think it would be better/faster/simpler to change:

    shutil.copy(annot_maf, orig_maf)
    os.remove(annot_maf)

to:

    os.replace(annot_maf, orig_maf)
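A sketch of the suggested change wrapped in a hypothetical `finalize_annotated_maf` helper (not a function in subset_and_merge_crdb_pdx_studies.py). It assumes, as in the diff, that an `annotator_status` of 0 means success. The script's failure branch isn't shown in the diff, so this sketch just leaves the files alone in that case; the file names are also illustrative assumptions.

```python
import os
import tempfile


def finalize_annotated_maf(annot_maf, orig_maf, annotator_status):
    """Move annot_maf over orig_maf, like `mv annot_maf orig_maf`.

    Hypothetical helper; returns True only if the annotated MAF was moved into place.
    """
    if annotator_status != 0:
        # Failure handling is not shown in the diff, so this sketch leaves both files alone.
        return False
    # One call replaces copy-then-remove: orig_maf is overwritten if it exists,
    # and annot_maf no longer exists afterwards.
    os.replace(annot_maf, orig_maf)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as study_dir:
        # Hypothetical file names, for illustration only.
        orig = os.path.join(study_dir, "data_mutations_extended.txt")
        annot = os.path.join(study_dir, "data_mutations_annotated.txt")
        with open(orig, "w") as f:
            f.write("unannotated\n")
        with open(annot, "w") as f:
            f.write("annotated\n")
        finalize_annotated_maf(annot, orig, annotator_status=0)
        print(os.path.exists(annot))  # False
        with open(orig) as f:
            print(f.read().strip())  # annotated
```

`os.replace` overwrites the destination if it exists and, on POSIX, a successful rename is atomic, so there is no window where a half-copied MAF sits in the study directory. It can fail when source and destination are on different filesystems; that presumably can't happen here since both files live in the same study directory, but `shutil.move` would fall back to copy-and-delete if it ever did.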

* remove annotated MAF to prevent duplicate
* Update subset_and_merge_crdb_pdx_studies.py

---------

Co-authored-by: Avery Wang <[email protected]>



remove annotated MAF to prevent duplicate

b0b48cc

averyniceday changed the title ~~remove annotated MAF to prevent duplicate~~ Remove Annotated MAF before Import May 17, 2022

averyniceday requested a review from sheridancbio May 17, 2022 16:18

averyniceday mentioned this pull request May 17, 2022

Debug CRDB PDX Pipeline knowledgesystems/pipelines-scrum#912

Closed

sheridancbio requested changes May 20, 2022


Update subset_and_merge_crdb_pdx_studies.py

79eeaa3

averyniceday merged commit 9774ed9 into knowledgesystems:master Jan 19, 2024

averyniceday deleted the fixpdx branch January 19, 2024 21:07
