Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add mpileup subworkflow for paired, germline and tumoronly varcalling #570

Merged
merged 49 commits into from
Jun 17, 2022

Conversation

WackerO
Copy link
Contributor

@WackerO WackerO commented May 31, 2022

This PR extracts mpileup code for paired, germline and tumoronly variant calling into a separate subworkflow.
Also, as this is related: in this PR, samtools was changed to bgzip mpileup output addressing the space problem described in this issue. The changes are accounted for.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@WackerO WackerO requested a review from maxulysse as a code owner May 31, 2022 13:18
@WackerO WackerO requested a review from FriederikeHanssen June 2, 2022 06:56
@WackerO WackerO changed the title Add mpileup subworkflow for tumoronly varcalling Add mpileup subworkflow for paired varcalling and tumoronly varcalling Jun 2, 2022
@FriederikeHanssen
Copy link
Contributor

hm ok still tests failing :/

@FriederikeHanssen
Copy link
Contributor

GHA issue is gone for the moment, but controlfreec still fails

@maxulysse
Copy link
Member

looking good.
I feel that we're missing the mpileup for a single germline sample.
But I'm also afraid of how complicated it'll be to include that here

@FriederikeHanssen
Copy link
Contributor

looking good. I feel that we're missing the mpileup for a single germline sample. But I'm also afraid of how complicated it'll be to include that here

hm I don't think it would be too bad? it is just the exact same subworkflow import into germline_variantcalling?! all the reference files are already there. Now what would be more difficult to only have MPILEUP_NORMAL run once when somatic is called and paired_variant_only = false, but one thing after the other

@WackerO WackerO changed the title Add mpileup subworkflow for paired varcalling and tumoronly varcalling Add mpileup subworkflow for paired, germline and tumoronly varcalling Jun 7, 2022
conf/modules.config Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
@maxulysse
Copy link
Member

@nf-core-bot fix linting

conf/modules.config Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
@FriederikeHanssen
Copy link
Contributor

For the tests for controlfreec to pass, you will need to adapt some file paths to include the .gz thing now. The tool apparently uses the whole filename as prefix. Not sure we can prevent that (unless updating the controlfreec module and mv the files). Would be fine as is for me for the moment:

├── config.txt
├── sample4_vs_sample3.bed
├── sample4_vs_sample3.circos.txt
├── sample4_vs_sample3.normal.mpileup.gz_control.cpn
├── sample4_vs_sample3.p.value.txt
├── sample4_vs_sample3.tumor.mpileup.gz_BAF.txt
├── sample4_vs_sample3.tumor.mpileup.gz_CNVs
├── sample4_vs_sample3.tumor.mpileup.gz_info.txt
├── sample4_vs_sample3.tumor.mpileup.gz_ratio.BedGraph
├── sample4_vs_sample3.tumor.mpileup.gz_ratio.txt
├── sample4_vs_sample3.tumor.mpileup.gz_sample.cpn
├── sample4_vs_sample3_BAF.png
├── sample4_vs_sample3_ratio.log2.png
└── sample4_vs_sample3_ratio.png

Speaking of tests, mpileup tests for running it alone are missing as far as I see. Could you add them?

@WackerO
Copy link
Contributor Author

WackerO commented Jun 14, 2022

Yep, I noticed that as well. I am currently looking into that; I would like to try and change the prefix in controlfreec so that it does not contain the .gz part.

What do you mean with running mpileup alone? Independently of controlfreec? Do I have to create a file in sarek/tests/ for that?

@WackerO
Copy link
Contributor Author

WackerO commented Jun 14, 2022

Now I am even more confused. In controlfreec, the prefix seems to actually be correct, i.e. something like sample4_vs_sample3. Where is the .gz coming from?

@FriederikeHanssen
Copy link
Contributor

so controlfreec generates a ton of output files with "fixed" endings by the tool. You can only determine the prefix common between all output files sample4_vs_sample3 here. the rest is appended without our influence

@FriederikeHanssen
Copy link
Contributor

Yes you need to edit:
.github/workflow/ci.yml and add a test "mpileup" to the others there

then edit the tests/test_tools.yml similar to the other tools. M<aybe take a look at the examples there and if you have questions or can't get it to work, ping me on slack :)

Copy link
Contributor

@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @WackerO ! From my side only two things missing:

  • Update the changelog
  • Update the subway map. (I can help you there, if you like)

WackerO added 3 commits June 17, 2022 10:23
…ling, fixed space in germline_varcalling, updated subway map (text was slightly changed by inkscape automatically), updated Changelog
Copy link
Contributor

@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I actually saw one or two more things:

The versions for all the subworkflows must be propagated up to the germline/tumor only subworkflow by adding:

ch_versions = ch_versions.mix(RUN_MPILEUP_TUMOR.out.versions)

etc

The line in the changelog should be in order of PR number

@FriederikeHanssen FriederikeHanssen merged commit 67de979 into nf-core:dev Jun 17, 2022
@WackerO WackerO deleted the mpileup2 branch June 17, 2022 14:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants