Duplicate variants in MSK_Sophia_LungPts_cBio_mutations_extended - 7-27-23.txt #1214

Closed
3 tasks done
n1zea144 opened this issue Nov 27, 2023 · 1 comment
Labels: backend-scrum (items centered around engineering activities), interrupt

Comments

@n1zea144
Collaborator

n1zea144 commented Nov 27, 2023

Done Condition (What do we need? Why do we need it? Keep this as small as possible!)

Understand why duplicates got into the MAF, and update the pipeline to remove duplicates from the delivered MAF.

  • Fix the bug in the duplicate check in the CMO pipelines code
  • Reach out to CVR about duplicate variants being provided for gene aliases
  • Add the remove-duplicate-maf-variants.py script to the AZ generation and Sophia cohort generation scripts
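The contents of remove-duplicate-maf-variants.py are not shown in this issue. As a rough illustration only, here is a minimal, hypothetical Python filter of that kind, assuming a tab-delimited MAF whose header row uses standard MAF column names and where one variant call is identified by sample, position, and alleles. The function name `remove_duplicate_variants` and the `DEFAULT_KEY` columns are assumptions, not the pipeline's actual code. Because `Hugo_Symbol` is left out of the key, rows that repeat the same call under a gene alias also collapse to one row.

```python
from typing import Dict, Iterable, List, Optional, Sequence

# Hypothetical key: columns assumed to identify one variant call in one sample.
# These are standard MAF column names; Hugo_Symbol is deliberately excluded.
DEFAULT_KEY = (
    "Tumor_Sample_Barcode",
    "Chromosome",
    "Start_Position",
    "End_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
)


def remove_duplicate_variants(
    lines: Iterable[str], key_columns: Sequence[str] = DEFAULT_KEY
) -> List[str]:
    """Drop data rows whose key repeats an earlier row, keeping the first one.

    '#' comment lines and the column header pass through unchanged;
    blank lines are dropped.
    """
    output: List[str] = []
    header_index: Optional[Dict[str, int]] = None
    seen = set()
    for line in lines:
        if line.startswith("#"):
            output.append(line)
            continue
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if header_index is None:
            # The first non-comment line is the header: map names to positions.
            header_index = {name: i for i, name in enumerate(fields)}
            missing = [c for c in key_columns if c not in header_index]
            if missing:
                raise ValueError(f"MAF header lacks key columns: {missing}")
            output.append(line)
            continue
        key = tuple(fields[header_index[c]] for c in key_columns)
        if key not in seen:
            seen.add(key)
            output.append(line)
    return output
```

Passing an open file object as `lines` works because iterating a text file yields its lines with newlines intact, so the output can be written back with `writelines`. Keeping the first occurrence preserves row order; whether the primary symbol or an alias row should be the one kept is a choice the real script would have to make.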

Technical Description (How are we going to achieve the above)

Sophia reports 2556 duplicate variants in the mutations extended file; of these, 2213 are identified by Unix sort/uniq:

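# The MAF is tab-delimited, so split sort fields on tabs (bash $'\t'); keys are columns 1, 5, 6, 7
sort -t $'\t' -k 1,1 -k 5,5 -k 6,6 -k 7,7 < ~/tmp/sophia-lung-cohort-maf.txt > ~/tmp/sophia-lung-cohort-maf-sorted.txt
# uniq -d compares whole lines and prints one copy of each group of identical adjacent lines
uniq -d ~/tmp/sophia-lung-cohort-maf-sorted.txt > ~/tmp/sophia-lung-cohort-maf-duplicates.txt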

(sophia-lung-cohort-maf.txt is a renamed copy of MSK_Sophia_LungPts_cBio_mutations_extended - 7-27-23.txt.)

sophia-lung-cohort-maf-duplicates.txt
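`uniq -d` prints one line per group of repeated lines, so the size of its output counts duplicate groups rather than every redundant row. One possible source of the gap between the two totals (not confirmed in this issue) is that "duplicates" can be counted in several ways. The hypothetical helper `duplicate_counts` below computes three such totals for exact whole-line duplicates; which, if any, matches the 2556 figure is not established here. Whole-line matching also misses rows that differ in only one column, such as the gene symbol.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_counts(rows: Iterable[str]) -> Dict[str, int]:
    """Count exact whole-line duplicates among MAF data rows.

    groups:        distinct lines seen more than once (uniq -d prints one line each)
    extra_copies:  rows removed if one copy of every line were kept
    rows_involved: every row that belongs to a repeated group
    """
    # Strip the trailing newline so a final line without one still matches.
    counts = Counter(row.rstrip("\n") for row in rows)
    repeated = [n for n in counts.values() if n > 1]
    return {
        "groups": len(repeated),
        "extra_copies": sum(n - 1 for n in repeated),
        "rows_involved": sum(repeated),
    }


# A line seen three times and another seen twice:
print(duplicate_counts(["a\n", "a\n", "a\n", "b\n", "b\n", "c\n"]))
# {'groups': 2, 'extra_copies': 3, 'rows_involved': 5}
```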

Potential Issues

Dependencies

Technical Requirements

Outside People/Teams

Changes

n1zea144 added the backend-scrum label Nov 27, 2023
@callachennault
Collaborator

callachennault commented Jan 4, 2024

Development

No branches or pull requests

3 participants