Question about usage of rpvg command #62
Comments
You would use the spliced graph.
It worked perfectly, thank you.
That's an area where I think there's still some room for methods development. If you marginalize out the haplotypes, you can get transcript-level expression values that fit into standard pipelines. We also found that for reasonably highly expressed genes, you can typically resolve the sample's haplotypes and their expression values as the top 2 most confident haplotypes. I think we have yet to see an analysis that fully utilizes all of the uncertainty over haplotypes that is included in the posterior distribution that rpvg estimates.
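As a rough illustration of what marginalizing out the haplotypes could look like, here is a minimal Python sketch. It assumes that marginalizing means summing one expression value per haplotype-specific transcript over all haplotype-specific transcripts that belong to the same reference transcript. The function name, the `_H1`/`_H2` naming, and the mapping dictionary are hypothetical and not part of rpvg's output format or API, so this may not match exactly what the maintainers recommend.

```python
from collections import defaultdict


def marginalize_haplotypes(hst_expression, hst_to_transcript):
    """Sum expression over haplotype-specific transcripts per transcript.

    hst_expression: dict mapping a haplotype-specific transcript name to an
        expression value (hypothetical input, e.g. parsed from a results table).
    hst_to_transcript: dict mapping each haplotype-specific transcript name
        to the name of the transcript it was derived from (hypothetical).
    """
    totals = defaultdict(float)
    for hst_name, value in hst_expression.items():
        # Every haplotype-specific copy contributes to its parent transcript.
        totals[hst_to_transcript[hst_name]] += value
    return dict(totals)


# Toy values, made up for illustration only.
expression = {"tx1_H1": 3.0, "tx1_H2": 1.0, "tx2_H1": 2.5}
origin = {"tx1_H1": "tx1", "tx1_H2": "tx1", "tx2_H1": "tx2"}
transcript_level = marginalize_haplotypes(expression, origin)
```

In this toy case `transcript_level` holds one value per transcript (`tx1` and `tx2`), which is the shape that standard transcript-level pipelines expect.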
Just following up on this. Do you think it's best to use the two most confident haplotypes (rpvg.txt) or the most confident diplotype (rpvg_joint.txt)? So far we've been using the most confident diplotype. I am re-running the analysis with the two most confident haplotypes, and so far I am getting pretty similar results. Agnieszka
The diplotype is what I meant. Sorry for the ambiguity. It makes sense that you get generally similar results, since the most likely diplotype will typically consist of the two most likely haplotypes.
Thank you for the response. I would also like to know whether you have a page on interpreting the results. I understand most of the output, but I am a little confused about the meaning of the Cluster ID.
We do not have an interpretation page, although that would be a good idea. A cluster is roughly analogous to a gene. In particular, transcripts are joined into a cluster whenever there is a read that is simultaneously aligned to both of them. This typically means the transcripts have an overlapping exon. However, it's possible that transcripts that are considered to be from the same gene do not get placed into the same cluster, which would occur when their overlapping exons are not expressed. We deal with clusters rather than genes internally to avoid the squishy conceptual ambiguities around gene definitions in complex regions. Clusters identify reasonably well the groups of transcripts whose expression needs to be co-inferred.
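To make the clustering rule concrete, here is a small Python sketch that groups transcripts with a union-find structure: any read aligned to several transcripts merges them into one cluster, and merges chain transitively. This is only an illustration of the rule as described above, not rpvg's actual implementation, and the function name and input layout (one set of transcript names per read) are assumptions.

```python
from collections import defaultdict


def cluster_transcripts(read_alignments):
    """Group transcripts that are linked by shared reads.

    read_alignments: iterable where each element is the collection of
        transcript names that one read aligns to (hypothetical input layout).
    Returns a sorted list of clusters, each a sorted list of names.
    """
    parent = {}

    def find(name):
        # Register unseen transcripts as their own root, then walk to the root.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for transcripts in read_alignments:
        names = list(transcripts)
        if not names:
            continue
        find(names[0])  # make sure single-transcript reads are registered
        for other in names[1:]:
            union(names[0], other)

    clusters = defaultdict(set)
    for name in list(parent):
        clusters[find(name)].add(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy reads: tx_A and tx_C never share a read, but both share one with tx_B.
reads = [{"tx_A", "tx_B"}, {"tx_B", "tx_C"}, {"tx_D"}]
clusters = cluster_transcripts(reads)
```

Here `tx_A`, `tx_B`, and `tx_C` end up in one cluster while `tx_D` stays alone. Two transcripts of the same annotated gene would likewise stay in separate clusters if no read aligned to both, which matches the unexpressed overlapping exon case mentioned above.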
Yes, that would be fantastic. For us, graph-based expression quantification is shaping up to be one of the main current applications besides SV genotyping. Is this an area where the vg team sees future development?
Thank you. Would you recommend marginalizing the haplotypes from the diplotype_joint output to get transcript-level expression values and then running a differential expression analysis? In which cases should I use the single-haplotype expression file, and in which the diplotype expression file?
Hello rpvg team, I would appreciate any information about this.
To marginalize over haplotypes, you would essentially compute an average of the sum of
Hello rpvg team,
I have a general question and would appreciate your help.
I am trying to infer haplotype expression from the reads of a human experiment using rpvg.
I have built the multipath alignments using previously constructed human graphs and spliced junction graphs. Now I want to run rpvg with the command:
rpvg -g graph.xg -p paths.gbwt -a alignments.gamp -o rpvg_results -i
I understand that "paths.gbwt" is the pantranscriptome and that "alignments.gamp" is the multipath alignment file that I obtained earlier using vg, but I would like to know what "graph.xg" refers to in the command.
Is it the original human graph, or is it the spliced-junction graph that I built earlier using vg?
I would appreciate your help.
Best