Guidance on Simulating Metatranscriptomes for CAMI 2 Challenge Short-Read Metagenomes #194

Open
ohickl opened this issue Sep 13, 2024 · 1 comment

Comments

ohickl commented Sep 13, 2024

Hi CAMISIM team,

First, thank you for developing and maintaining such a valuable tool! I am currently working with CAMI 2 challenge data sets and would like to simulate metatranscriptomes to accompany some of the short-read metagenomes produced for the challenge.

I noticed the recent entry in the wiki about the metatranscriptomics simulator module, which looks like a promising addition. However, I wanted to ask for further guidance on how to best approach the following:

  • Does the current metatranscriptomics simulator support generating transcriptomes specifically for the CAMI 2 short-read metagenomes?
  • Are there any best practices or specific settings you recommend to ensure consistency between the metagenomes and metatranscriptomes?
  • If there are limitations with the current implementation, are there alternative tools or workflows you would suggest to simulate transcriptomes that are aligned with CAMI 2 challenge data?

Thanks for your time and assistance!

Best

Oskar

Collaborator

AlphaSquad commented Sep 13, 2024

Hi Oskar,
thank you for your kind words and your interest in CAMISIM. In theory, CAMISIM should be able to simulate a metatranscriptome specifically for a CAMI 2 short-read metagenome. However, there is no best practice or specific setting to ensure consistency yet, since the metatranscriptome mode is still being tested, so you might encounter some unforeseen problems. If you do, feel free to open another issue to help us with testing and debugging!
You can provide the exact same distribution files as for the metagenome, together with GFF files for the CAMI 2 genomes, to the metatranscriptome mode and should get a metatranscriptome corresponding to the metagenome, at least in terms of genomes and their abundances. GFF files for the database genomes should be available on NCBI; for the newly added genomes you will probably need to create these yourself. Note that the genes themselves will be drawn completely at random (i.e. specific genes you expect to be highly abundant might not be, while others are instead).
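
Before starting a run, it may help to check that every genome in a distribution file has a matching GFF file. The following is only a minimal sketch of such a check, not part of CAMISIM. It assumes the distribution file is tab-separated with a genome ID and an abundance per line, and that GFF files are named `<genome_id>.gff` in a single directory; both the layout and the function names `read_distribution` and `genomes_missing_gff` are assumptions made for illustration.

```python
import csv
from pathlib import Path


def read_distribution(path):
    """Read genome abundances from a tab-separated file.

    Assumed layout (not confirmed by the thread): genome ID in the first
    column, relative abundance in the second, one genome per line.
    """
    abundances = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and comment lines.
            if not row or row[0].startswith("#"):
                continue
            abundances[row[0]] = float(row[1])
    return abundances


def genomes_missing_gff(abundances, gff_dir, suffix=".gff"):
    """Return the sorted genome IDs that have no annotation file.

    Assumes a hypothetical naming scheme of <genome_id><suffix> inside gff_dir.
    """
    gff_dir = Path(gff_dir)
    return sorted(
        genome_id
        for genome_id in abundances
        if not (gff_dir / f"{genome_id}{suffix}").is_file()
    )
```

Calling `genomes_missing_gff(read_distribution("distribution_0.txt"), "gff/")` (both paths are placeholders) would list the genomes that still need an annotation, for example the newly added CAMI 2 genomes without GFF files on NCBI.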
