Question #62
Comments
Hey @vicru93, thanks for using RIBAP! The graph is a so-called UpSet plot. UpSet plots are a way to visualize the differences between many sets. You might be more familiar with Venn diagrams, which work nicely for visualizing the overlaps between three or four sets but cannot handle more. In the plot above, your genomes are the rows, and the blue bars show how many genes were annotated in each. The columns show how many genes are shared between different subsets of your input genomes. For example, all genomes except "A2_bin4" have 691 genes in common according to RIBAP's core gene calculation (first column in the plot). Next there is "A20_bin53", which has 431 unique genes not found in any of the other genomes. And so on... "A2_bin4" has 17 unique genes not found in any other of your input genomes. Does that clarify? May I also ask: are you running RIBAP on metagenome-assembled genomes? I am assuming this from the naming ("...bin"...) ;) If so, that's really cool because this is something I wanted to try anyway. I am happy about any feedback you can provide.
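To make the column logic concrete, here is a minimal Python sketch of how the columns of an UpSet plot can be counted. It assumes each gene is counted only in the exact combination of genomes that contain it (the usual "exclusive" reading, which matches the explanation above). The genome names, gene IDs, and the `upset_columns` helper are made up for illustration and are not part of RIBAP.

```python
from collections import Counter


def upset_columns(gene_sets):
    """Count genes per exact combination of genomes (one UpSet column each)."""
    membership = {}
    for genome, genes in gene_sets.items():
        for gene in genes:
            # Record every genome in which this gene occurs.
            membership.setdefault(gene, set()).add(genome)
    # Each distinct combination of genomes becomes one column of the plot.
    return Counter(frozenset(genomes) for genomes in membership.values())


# Toy input (hypothetical): genome name -> set of gene identifiers.
gene_sets = {
    "genome_A": {"g1", "g2", "g3", "g4"},
    "genome_B": {"g1", "g2", "g5"},
    "genome_C": {"g1", "g6", "g7"},
}

columns = upset_columns(gene_sets)
for combo, count in sorted(columns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(combo), count)
```

A column made of a single genome corresponds to the "unique genes" bars described above, and the column containing every genome corresponds to the genes shared by all inputs.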
Hi @hoelzer: The only thing is that I am still trying to understand the generated output. For example, the HTML report does not link me to the generated alignments, so it is difficult to trace results from this file. Something I also see is the difficulty of the analysis when using MAGs. For example, "A2_bin4" has a completeness of 70% and a contamination below 6% according to CheckM2, but the annotation done by Prokka was poor, which is worrying if you want to follow changes across different genomes. I suppose everything I am describing is to be expected when studying environmental samples :( For now, this tool seems to serve its purpose, and now that you have explained this graph to me, I can see how useful it can be for my research. Thank you. Best W, Victor.
Hey @vicru93, thanks for the details about your project. That's great, and it's exactly what I also had in mind as a future application for RIBAP. But we have never tested the pipeline on MAGs so far. My approach would have also been to produce MAGs and then filter them based on completeness/contamination (CheckM) and throw them into RIBAP to get a core meta-pangenome. Then, monitor how this changes over time...
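The filtering step in that workflow could look roughly like the following sketch. Everything here is an assumption for illustration: the bin names and values are invented, the `filter_mags` helper is hypothetical, and the default thresholds are placeholders rather than a recommendation from RIBAP or CheckM. In practice the values would come from the CheckM/CheckM2 report.

```python
def filter_mags(quality, min_completeness=50.0, max_contamination=10.0):
    """Return the names of MAGs that pass both thresholds, sorted by name.

    quality maps a MAG name to (completeness %, contamination %).
    The default thresholds are illustrative assumptions only.
    """
    return sorted(
        name
        for name, (completeness, contamination) in quality.items()
        if completeness >= min_completeness and contamination <= max_contamination
    )


# Hypothetical quality estimates: name -> (completeness %, contamination %).
quality = {
    "bin_1": (95.2, 1.1),
    "bin_2": (70.0, 5.8),
    "bin_3": (42.5, 3.0),
    "bin_4": (88.0, 14.7),
}

kept = filter_mags(quality)
strict = filter_mags(quality, min_completeness=90.0, max_contamination=5.0)
print(kept)
print(strict)
```

Only the MAGs that survive the filter would then be passed on as input genomes.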
Yes, I agree. The output can be improved; we already have an open issue #60. However, the HTML report should link you to the MSA (and tree) - does this not work for you? Also, in the output files, you should find a table with all RIBAP groups, which you can filter depending on your needs. For example, users might be interested not only in genes found in 100% of the input genomes but also in lower cutoffs, etc...
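As a rough sketch of what filtering by a lower cutoff means: the snippet below keeps every group that is present in at least a given fraction of the input genomes. The dictionary is a hypothetical stand-in and does not reflect the actual layout of the RIBAP output table; the group names, genome names, and the `filter_groups` helper are invented for illustration.

```python
def filter_groups(groups, n_genomes, min_fraction=1.0):
    """Keep groups found in at least min_fraction of the input genomes."""
    return {
        group_id: genomes
        for group_id, genomes in groups.items()
        if len(genomes) / n_genomes >= min_fraction
    }


# Hypothetical stand-in for a group table: group id -> genomes containing it.
groups = {
    "group_1": {"A", "B", "C", "D"},
    "group_2": {"A", "B", "C"},
    "group_3": {"A"},
}

core = filter_groups(groups, n_genomes=4)                          # present in 100%
soft_core = filter_groups(groups, n_genomes=4, min_fraction=0.75)  # present in >= 75%
print(sorted(core))
print(sorted(soft_core))
```

With MAGs, a cutoff below 100% may be the more useful one, since an incomplete bin can be missing genes that are actually present in the organism.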
I see. But, strangely, you get almost no annotated genes even though it's 70% complete according to CheckM. How large is the MAG in nucleotides? Does it maybe belong to a very small bacterium? However, the smallest I know of have ~1 Mbp genomes, and if the MAG is 70% complete, I would expect Prokka to find at least a few hundred genes...
Yeah, wild west.
Great, let me know if you have any further questions.
Closing because this seems to be solved. Please reopen if there is more.
Sorry for this question, but could someone help me understand this graph?
P.S.: IDs with "_REF" are reference genomes
Best W,
Victor.