Discriminative variants with GTbarcode #92

Open
flde opened this issue Mar 4, 2024 · 4 comments

@flde

flde commented Mar 4, 2024

Dear all,

Many thanks for the great tool and the help on GitHub. I ran Vireo and the results look very promising (single-cell RNA-seq, mode 1a with the 1000 Genomes variant list).

I am now wondering which variants actually support the discrimination between donor0 and donor1. I ran GTbarcode but only got one variant, which does not seem enough. However, I can't find any help or documentation about the GTbarcode function or the Vireo VCF output file, which would be helpful too.

Do you have any recommendations? Or would it be possible to parse the Vireo VCF with other tools to rank the variants?

Many thanks for your help!
Florian

@huangyh09
Collaborator

Hi Florian,

Very good question. In principle, GTbarcode offers a solution for this, although it aims to select the minimum number of variants needed to sufficiently discriminate the donors. Understandably, more variants may be wanted. One simple way is to change --randSeed, so that each run gives you different variants with the same information gain.
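One way to make "information gain" concrete (an assumption for illustration; the thread does not spell out the exact formula used in vireoSNP): treat each donor's genotypes at a selected variant set $S$ as a barcode, group the $K$ donors by identical barcodes into groups of sizes $n_1, \dots, n_m$, and score $S$ by the Shannon entropy of that partition.

```math
H(S) = -\sum_{j=1}^{m} \frac{n_j}{K} \log_2 \frac{n_j}{K}, \qquad 0 \le H(S) \le \log_2 K
```

The maximum $\log_2 K$ is reached exactly when every donor has a distinct barcode. With $K = 2$ donors, any single variant whose genotype differs between donor0 and donor1 splits them into two groups of size 1, so $H = -2 \cdot \tfrac{1}{2} \log_2 \tfrac{1}{2} = 1 = \log_2 2$ and one variant already reaches the maximum. Under this reading, that is consistent with GTbarcode returning a single variant for two donors, and every other discriminating variant ties with it, which is why changing --randSeed can return a different one.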

Alternatively, one can output more (or all) variants for each equivalent group by changing this line (currently it only samples one for output):
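A minimal sketch of that change, assuming each candidate variant has already been scored by the entropy it reaches. The function name `pick_from_equivalent_group` and the `return_all` flag are hypothetical, not vireoSNP's API; the two branches only loosely mirror the `idx` vs `idx_use` distinction Florian describes in his reply.

```python
import numpy as np


def pick_from_equivalent_group(gains, rng, return_all=False, tol=1e-12):
    """Return candidate indices tied at the maximum score (hypothetical helper).

    gains: entropy reached by adding each candidate variant.
    return_all=False: sample one representative of the tied group at random.
    return_all=True: return the whole tied ("equivalent") group.
    """
    gains = np.asarray(gains, dtype=float)
    # All candidates whose score equals the maximum, up to float noise.
    tied = np.flatnonzero(gains >= gains.max() - tol)
    if return_all:
        return tied
    # One random representative; the seed of `rng` decides which one.
    return rng.choice(tied, size=1)


gains = [1.0, 0.0, 1.0]  # variants 0 and 2 are equally informative
print(pick_from_equivalent_group(gains, np.random.default_rng(0)))
print(pick_from_equivalent_group(gains, np.random.default_rng(0), return_all=True))  # [0 2]
```

Comparing against the maximum with a small tolerance, rather than exact equality, guards against floating-point noise when entropies are computed from fractions.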
https://github.com/single-cell-genetics/vireo/blob/master/vireoSNP/utils/variant_select.py#L53

Yuanhua

@flde
Author

flde commented Mar 5, 2024

Dear Yuanhua,

Many thanks for your help! I changed that line to return idx instead of idx_use and adjusted the output file writer. I now get a list of variants, and I think their entropy equals the maximum entropy.

However, could you please help me with a few concepts? When you start the while loop of variant_select, the entropy is computed per variant. You break the loop if the max entropy of one iteration is smaller than the max entropy of the previous iteration. But since the entropy is computed for each variant individually, that loop will always break after one iteration, right?

A bit off topic, but I also used the vartrix format as Vireo input, and that does not yield a GT_donors.vireo.vcf.gz file that I could use for GTbarcode. Is there a script or tool to create such a file manually from the output? That would be great.

Many thanks and best wishes,
Florian

@huangyh09
Collaborator

Hi Florian, nice fix!

Regarding the "while loop": yes, the entropy is computed for each variant, but on top of the already selected variants, as in this line. So the entropy is guaranteed not to decrease from one iteration to the next, and once adding more variants can't increase the entropy compared to the previous iteration, the while loop stops.
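To see why the loop does not stop after one iteration, here is a minimal greedy sketch of the behaviour described above. It is not vireoSNP's `variant_select` code: `barcode_entropy`, `greedy_select`, the 0/1/2 genotype coding, and the donor-partition entropy used as the score are all assumptions for illustration. The key point is that each candidate is scored together with the variants already selected; in the toy example, no single variant exceeds 1 bit for four donors, so a second variant is needed to reach log2(4) = 2 bits.

```python
import math
from collections import Counter

import numpy as np


def barcode_entropy(geno, selected):
    """Entropy (bits) of the donor partition induced by genotype barcodes
    at the selected variants. geno: (n_variants, n_donors), assumed 0/1/2."""
    n_donors = geno.shape[1]
    counts = Counter(tuple(geno[selected, d]) for d in range(n_donors))
    return -sum((c / n_donors) * math.log2(c / n_donors) for c in counts.values())


def greedy_select(geno, seed=0):
    """Greedily add variants until no candidate raises the entropy."""
    rng = np.random.default_rng(seed)
    selected, trace = [], []
    prev_entropy = 0.0  # nothing selected: all donors share one barcode
    while True:
        candidates = [v for v in range(geno.shape[0]) if v not in selected]
        if not candidates:
            break
        # Score each candidate *together with* the variants already selected.
        gains = np.array([barcode_entropy(geno, selected + [v]) for v in candidates])
        best = gains.max()
        if best <= prev_entropy + 1e-12:
            break  # no candidate adds information, so stop
        # Candidates tied at the max form the equivalent group; sample one.
        tied = [v for v, g in zip(candidates, gains) if g >= best - 1e-12]
        selected.append(int(rng.choice(tied)))
        trace.append(float(best))
        prev_entropy = best
    return selected, trace


geno = np.array([
    [0, 0, 2, 2],  # variant 0: splits {d0, d1} from {d2, d3}
    [0, 2, 0, 2],  # variant 1: splits {d0, d2} from {d1, d3}
    [1, 1, 1, 1],  # variant 2: identical in all donors
])
selected, trace = greedy_select(geno, seed=0)
print(sorted(selected), trace)  # [0, 1] [1.0, 2.0]
```

In this sketch, restricting each iteration to the candidates tied at the maximum only narrows the choice within that iteration; the selected set still grows by one variant per iteration until the entropy stops increasing.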

Good to hear that you are using the vartrix format as input. Did you input 3 or 4 files? If you use 4 files (the last one being SNPs.vcf.gz), then it can output GT_donors.vireo.vcf.gz.

Yuanhua

@flde
Author

flde commented Mar 6, 2024

Hello Yuanhua, many thanks!

For completeness, I forked the repository and documented all changes.

To be honest, I still don't understand the while loop. You wrote that each iteration adds more variants, but as I understand it, you actually remove all variants with entropy < max_entropy.

I did not use the SNPs.vcf.gz file. I also think I can't recover GT_donors.vireo.vcf.gz, because the GT_prob and ID_prob information from gt_results can't be recovered from the Vireo output files?

Best wishes, Florian
