Question about usage of rpvg command #62

CarlosAmadeo7 · 2024-09-07T16:05:15Z

Hello rpvg team,
I have a general question and I would appreciate your help.
I am trying to infer haplotype-specific expression from human RNA-seq reads using rpvg.
I have produced multipath alignments using previously built human graphs and splice-junction graphs. Now I want to run rpvg using the command:

rpvg -g graph.xg -p paths.gbwt -a alignments.gamp -o rpvg_results -i

I understand that "paths.gbwt" is the pantranscriptome and "alignments.gamp" contains the multipath alignments that I obtained earlier with vg, but I would like to know what the "graph.xg" argument refers to.
Is it the original human graph, or is it the splice-junction graph that I built earlier with vg?
I would appreciate your help.
Best

jeizenga · 2024-09-07T20:15:09Z

You would use the spliced graph.

CarlosAmadeo7 · 2024-09-12T22:28:41Z

It worked perfectly, thank you.
I am new to rpvg, so I have a general question: after obtaining the haplotype expression estimates, what follow-up analyses would you recommend?

jeizenga · 2024-09-13T03:38:10Z

That's an area where I think there's still some room for methods development. If you marginalize out the haplotypes, you can get transcript-level expression values that fit into standard pipelines. We also found that for reasonably highly expressed genes, you can typically resolve the sample's haplotypes and their expression values as the top 2 most confident haplotypes. I think we have yet to see an analysis that fully utilizes all of the uncertainty over haplotypes that is included in the posterior distribution that rpvg estimates.

agolicz · 2024-09-18T08:47:19Z

Just following up on this. Do you think it's best to use the two most confident haplotypes (rpvg.txt) or the most confident diplotype (rpvg_joint.txt)? So far we've been using the most confident diplotype. I am re-running the analysis with the two most confident haplotypes, but so far I'm getting pretty similar results.
We've been using rpvg quantification for eQTL analysis. This is an old version, but gives an idea: https://www.biorxiv.org/content/10.1101/2024.03.14.585028v1.abstract

Agnieszka

jeizenga · 2024-09-18T16:01:20Z

The diplotype is what I meant. Sorry for the ambiguity. It makes sense that you get generally similar results, since the most likely diplotype will typically consist of the two most likely haplotypes.
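The point that the most likely diplotype usually consists of the two most likely haplotypes can be checked directly on your own output. Below is a minimal sketch that picks the most probable diplotype per transcript from rpvg_joint.txt and the two most probable haplotypes from rpvg.txt, then reports whether they agree. It assumes the rows have already been parsed into dicts (for example with `csv.DictReader(fh, delimiter="\t")`, assuming tab-separated output with a header line), that rpvg.txt has `Name` and `HaplotypeProbability` columns (assumed names), that rpvg_joint.txt has `Name1`, `Name2` and `HaplotypingProbability`, and it uses a hypothetical `transcript_of()` helper that treats everything before the first underscore as the transcript accession. Check both assumptions against your files.

```python
from collections import defaultdict


def transcript_of(name):
    # HYPOTHETICAL naming assumption: "<accession>_<haplotype suffix>".
    # Adjust this to the naming scheme of your pantranscriptome.
    return name.split("_", 1)[0]


def top_diplotypes(joint_rows):
    """Most probable (Name1, Name2) pair per transcript accession."""
    best = {}
    for row in joint_rows:
        key = transcript_of(row["Name1"])
        p = float(row["HaplotypingProbability"])
        if key not in best or p > best[key][0]:
            best[key] = (p, row["Name1"], row["Name2"])
    return {k: (n1, n2) for k, (_, n1, n2) in best.items()}


def top_two_haplotypes(single_rows):
    """Two most probable haplotype-transcripts per transcript accession."""
    groups = defaultdict(list)
    for row in single_rows:
        # "HaplotypeProbability" is an ASSUMED column name for rpvg.txt.
        groups[transcript_of(row["Name"])].append(
            (float(row["HaplotypeProbability"]), row["Name"])
        )
    return {
        k: tuple(name for _, name in sorted(v, reverse=True)[:2])
        for k, v in groups.items()
    }


def agreement(joint_rows, single_rows):
    """True where the top diplotype equals the top two haplotypes (order ignored)."""
    dip = top_diplotypes(joint_rows)
    top2 = top_two_haplotypes(single_rows)
    return {k: sorted(dip[k]) == sorted(top2[k]) for k in dip if k in top2}
```

A homozygous call (Name1 equal to Name2) can never match two distinct top haplotypes under this check, so such transcripts would need separate handling.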

CarlosAmadeo7 · 2024-09-18T18:23:18Z

Thank you for the response. I would also like to know whether you have a page explaining how to interpret the results. I understand most of the output, but I am a little confused about what the Cluster ID means.

jeizenga · 2024-09-18T19:16:15Z

We do not have an interpretation page, although that would be a good idea.

A cluster is roughly analogous to a gene. In particular, transcripts are joined into a cluster whenever there is a read that is simultaneously aligned to both of them. This typically means the transcripts share an exon. However, it's possible that transcripts that are considered to be from the same gene do not get placed into the same cluster, which would occur when their shared exons are not expressed, so that no read aligns to both.
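The clustering rule described above amounts to taking connected components: transcripts are nodes, and any read aligned to several transcripts links them together. The sketch below illustrates that definition with a union-find structure. It is an illustration of the concept under the assumption that you have, for each read, the set of transcripts it aligns to; it is not rpvg's internal implementation, and the function name `cluster_transcripts` is hypothetical.

```python
def cluster_transcripts(read_alignments):
    """Group transcripts so that two share a cluster whenever some read
    aligns to both, directly or through a chain of such reads."""
    parent = {}

    def find(x):
        # Register x on first sight, then walk to its root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for transcripts in read_alignments:
        transcripts = list(transcripts)
        for t in transcripts:
            find(t)  # ensure transcripts hit by single-transcript reads are kept
        for t in transcripts[1:]:
            union(transcripts[0], t)  # one read links all of its transcripts

    clusters = {}
    for t in parent:
        clusters.setdefault(find(t), set()).add(t)
    return sorted(sorted(c) for c in clusters.values())
```

For example, reads aligned to {A, B}, {B, C}, {D} and {E, F} give the clusters [A, B, C], [D] and [E, F]: A and C end up together only because B links them. Transcripts that no read touches do not appear at all in this sketch.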

We deal with clusters rather than genes internally to avoid the squishy conceptual ambiguities around gene definitions in complex regions. Clusters can reasonably well identify groups of transcripts whose expression needs to be co-inferred.

agolicz · 2024-09-19T18:26:08Z

Yes, that would be fantastic. For us, graph-based expression quantification is shaping up to be one of the main current applications, alongside SV genotyping. Is this an area where the vg team expects future development?

CarlosAmadeo7 · 2024-09-23T18:37:23Z

Thank you. Would you recommend marginalizing over the haplotypes in the joint (diplotype) output to get transcript-level expression values, and then running a differential expression analysis? In which cases should I use the single-haplotype expression file, and in which the diplotype expression file?
Thank you, I appreciate your patience and help.

CarlosAmadeo7 · 2024-10-03T02:11:25Z

Hello rpvg team,
Following up on my previous question:
I was able to map RNA-seq reads and estimate haplotype-specific transcript expression with rpvg. It reported 994002 haplotype-specific transcripts. I am using the splice-junction graph and human pantranscriptome from the paper "A Draft Human Pangenome Reference".
I am trying to compare my results (single haplotypes and the most probable haplotype combination, i.e. the diplotype) with the results from an article that reported 51135 genes and their respective counts.
I am not sure how to do that, because what I have is haplotype-transcript expression, and the number of features is much higher than what they reported.
Previously, you mentioned that marginalizing over the haplotypes can give me transcript-level expression values that fit into standard pipelines. By marginalization, do you mean averaging over all the haplotype combinations in rpvg_joint.txt and then summing all the ReadCounts and TPMs? I would appreciate more information about this, please.
In addition, I noticed that in my results the same haplotype sometimes appears in both columns (Name1 and Name2). What does this mean? In that case, should I sum their ReadCounts and TPMs when marginalizing?

I would appreciate any information about this.
Thank you so much
Best

jeizenga · 2024-11-19T03:40:29Z

To marginalize over haplotypes, you would essentially compute an average of the sum of TPM_1 and TPM_2 weighted by HaplotypingProbability across all transcripts with the same accession ID.
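As a minimal sketch of that recipe: for each transcript accession, take every row of rpvg_joint.txt for that accession, multiply (TPM_1 + TPM_2) by HaplotypingProbability, sum, and divide by the summed probability. If the probabilities for an accession already sum to 1, the division changes nothing and the result is just the probability-weighted sum. The sketch assumes tab-separated output with a header line, assumes the TPM columns are named `TPM1` and `TPM2` (an assumption; check your header and pass the real names via `value_cols`), and uses a hypothetical `transcript_of()` helper that takes everything before the first underscore of Name1 as the accession ID.

```python
import csv
from collections import defaultdict


def transcript_of(name):
    # HYPOTHETICAL naming assumption: "<accession>_<haplotype suffix>".
    return name.split("_", 1)[0]


def read_joint(path):
    """Load rpvg_joint.txt, assuming tab-separated columns with a header row."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def marginalize(joint_rows, value_cols=("TPM1", "TPM2")):
    """Probability-weighted average of (value_1 + value_2) per accession ID.

    value_cols are ASSUMED column names; replace them with the ones in your file.
    """
    weighted = defaultdict(float)
    total_p = defaultdict(float)
    for row in joint_rows:
        key = transcript_of(row["Name1"])
        p = float(row["HaplotypingProbability"])
        pair_sum = float(row[value_cols[0]]) + float(row[value_cols[1]])
        weighted[key] += p * pair_sum
        total_p[key] += p
    # Normalize by total probability so the result is a weighted average.
    return {k: weighted[k] / total_p[k] for k in weighted if total_p[k] > 0}
```

For example, if one accession has two diplotype rows with probabilities 0.8 and 0.2 and TPM pairs (10, 5) and (20, 0), the marginal value is 0.8 × 15 + 0.2 × 20 = 16.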
