Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export TreeSummarizedExperiment R object #807

Merged
merged 15 commits into from
Jan 14, 2025

Conversation

d4straub
Copy link
Collaborator

@d4straub d4straub commented Nov 25, 2024

This solves #802 and is based on #805

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@TuomasBorman
Copy link

Cool!

In SummarizedExperiment:

  • row names of abundance matrix must match with row names of taxonomy table
  • column names of abundance matrix must match with row names of sample metadata table

This error occurs if those names do not match, they must be exactly the same (same order also). I can check this week.

If the names do not match because of different order, it is simple to sort the data. If the naming scheme differs between abundance matrix versus taxonomy table and sample metadata, it is also easy to fix but then we just have to ensure that the order is the same despite the names.

@d4straub
Copy link
Collaborator Author

Dear @TuomasBorman , the pipeline runs successfully for me locally with singularity, however CI tests with docker fail with
docker: invalid reference format: repository name (bioconductor-treesummarizedexperiment%3A2.10.0--r43hdfd78af_0) must be lowercase.
Will need to be fixed somehow, most likely there are solutions somewhere in the nf-core slack solutions. Will search for this another time.

@d4straub
Copy link
Collaborator Author

d4straub commented Nov 26, 2024

There seems to be an issue with read_tree, see error with -profile test_pplace:

     Error in read_tree("test_pplace.graft.test_pplace.epa_result.newick") : 
      could not find function "read_tree"

@TuomasBorman
Copy link

There seems to be an issue with read_tree, see error with -profile test_pplace:

     Error in read_tree("test_pplace.graft.test_pplace.epa_result.newick") : 
      could not find function "read_tree"

read_tree() comes from phyloseq package. However, TreeSE does not have dedicated function for loading the tree, instead ape::read.tree() is commonly used.

@d4straub
Copy link
Collaborator Author

@TuomasBorman would you be so kind and check whether the four rds files are fine, i.e. contain what you would expect? Two files produced with -profile test, two with -profile test_pplace.
tse_output.zip

@TuomasBorman
Copy link

test/dada2_TreeSummarizedExperiment.rds

  • Does not have phylogeny; should it have?
  • Taxonomy table has sequences. More convenient place would be to store them to referenceSeq(tse). Worth to note is that also phyloseq has refseq(pseq) slot to store these sequences, but similarly to TreeSE it seems that this is not used in phyloseq.
# If taxonomy table contains sequences, move them to referenceSeq slot
if( !is.null(rowData(tse)[["sequence"]]){
    referenceSeq(tse) <- DNAStringSet(rowData(tse)[["sequence"]])
    rowData(tse)[["sequence"]] <- NULL
}

test/qiime2_TreeSummarizedExperiment.rds

  • Does not have phylogeny; should it have?

test_pplace/pplace_TreeSummarizedExperiment.rds

  • Everything seems to be correct

test_pplace/qiime2_TreeSummarizedExperiment.rds

  • Everything is good

@d4straub
Copy link
Collaborator Author

Great, thanks. All files look as expected. But the phylogeny potentially can still be added as newick tree from QIIME2. I will test that in the next days. Unfortunately I think it will take until next week to finish this PR.

@nf-core-bot
Copy link
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@d4straub
Copy link
Collaborator Author

So, finally some progress. @TuomasBorman Could you have a look at the two objects? They should contain now sequences & pylogenetic tree. Also, is the addition inCITATIONS.md fine?
tse_output.zip

@d4straub d4straub mentioned this pull request Dec 20, 2024
11 tasks
@d4straub d4straub marked this pull request as ready for review December 20, 2024 15:47
@TuomasBorman
Copy link

DADA2 object have reference sequences and both have phylogeny correctly defined. Citation for TreeSE is also correct.

--> looks good

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything blocking

@@ -0,0 +1,82 @@
process TREESUMMARIZEDEXPERIMENT {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks pretty straight forward, any reason why this is not an official nf-core module?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason why this is not an official nf-core module?

First we needed to find a usable implementation, this is now done, now it seems just laziness on my part, but we can call it time efficiency or priorities are on other stuff or just 24h/day instead of 240h/day.

@d4straub
Copy link
Collaborator Author

Thanks a lot!

@d4straub d4straub merged commit 64de8ae into nf-core:dev Jan 14, 2025
33 checks passed
@d4straub d4straub deleted the TreeSummarizedExperiment branch January 14, 2025 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants