The issue
IGoR seems to routinely over-estimate the number of TRAV/TRBV gene deletions (and is therefore presumably also doing less well on other parameters).
Additional context
When running IGoR on some sequences from a recent human alpha/beta TCRseq experiment I noticed that it was predicting far more V gene deletions than were parsimonious, i.e. a TCR that matched the germline sequence perfectly up until n nt away from the 3' end of the V was being predicted to have >n deletions, often more than double.
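To make "parsimonious" concrete, here is a minimal Python sketch of how that count could be computed. It is only an illustration and not part of IGoR: the function name `parsimonious_v_deletions` and its `read_start_in_v` parameter are hypothetical, and it assumes the read aligns to the germline V without indels or sequencing errors and extends past the 3' end of the V segment.

```python
def parsimonious_v_deletions(germline_v: str, read: str, read_start_in_v: int = 0) -> int:
    """Return the fewest 3' V deletions consistent with the read.

    germline_v: full germline V nucleotide sequence (5' to 3').
    read: rearranged TCR sequence whose first base aligns to germline
          position read_start_in_v (assumption: no indels or mismatches
          upstream of the junction, and the read extends beyond the V).
    """
    germline_tail = germline_v[read_start_in_v:].upper()
    read = read.upper()

    # Count how many bases match perfectly before the first mismatch.
    matched = 0
    for germline_base, read_base in zip(germline_tail, read):
        if germline_base != read_base:
            break
        matched += 1

    # Germline bases that the read does not reproduce are counted as deleted.
    return len(germline_tail) - matched


# Toy sequences (not real TRBV germlines): the read diverges 3 nt before the V end.
print(parsimonious_v_deletions("ACGTACGTAC", "ACGTACGGGG"))
```

A scenario with more deletions than this value is still possible (inserted bases can match the germline by chance), which is why it is a lower bound and not the true count.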
To see whether this was just some weird artifact of this handful of TCRs, I used a set of simulated TCRs (produced using ImmuneSIM) that I'd generated previously (available in this repo), giving me clean TCRs with known numbers of deletions. I picked out TCRs from several different V genes, taking 10 random sequences for each number of V deletions (in the range 0-15). I then ran IGoR on those and compared the average predicted number of deletions across the top 100 scenarios against the true value.
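The sampling step could look something like the sketch below. This is not the script used for the analysis: the function name `sample_per_deletion` and the `(sequence, true_v_deletions)` input layout are assumptions made for illustration, and the simulated TCRs would first need to be parsed into that form.

```python
import random
from collections import defaultdict


def sample_per_deletion(tcrs, per_bin=10, max_del=15, seed=0):
    """Pick up to per_bin random sequences for each true V deletion count.

    tcrs: iterable of (sequence, true_v_deletions) pairs (assumed layout).
    Returns a dict mapping deletion count -> list of sampled sequences.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Group sequences by their known number of V deletions.
    bins = defaultdict(list)
    for seq, n_del in tcrs:
        if 0 <= n_del <= max_del:
            bins[n_del].append(seq)

    # Sample without replacement; small bins are returned whole.
    sampled = {}
    for n_del in sorted(bins):
        seqs = bins[n_del]
        sampled[n_del] = rng.sample(seqs, min(per_bin, len(seqs)))
    return sampled
```

Running this once per V gene gives a balanced set across the 0-15 deletion range, so no single deletion length dominates the comparison.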
I've plotted the results of a few of the genes here. In my hands it looks like IGoR almost never predicts high scoring rearrangements with very few deletions. Its predictions are clearly correlated with the real value, just offset by ~5 nt for TRB and ~8 nt for TRA.
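For anyone wanting to repeat the comparison, the averaging and offset calculation can be sketched as below. The names `mean_predicted_deletions` and `mean_offset` are hypothetical, and the input is assumed to already be parsed into `(seq_id, predicted_v_deletions)` pairs with one entry per scenario; reading IGoR's scenario output into that form is not shown here. The mean is unweighted, which is also an assumption.

```python
from collections import defaultdict
from statistics import mean


def mean_predicted_deletions(scenarios):
    """Average the predicted V deletions over all scenarios of each sequence.

    scenarios: iterable of (seq_id, predicted_v_deletions) pairs,
               one per scenario (assumed layout, unweighted mean).
    """
    per_seq = defaultdict(list)
    for seq_id, predicted in scenarios:
        per_seq[seq_id].append(predicted)
    return {seq_id: mean(values) for seq_id, values in per_seq.items()}


def mean_offset(predicted_means, true_deletions):
    """Average of (predicted - true) over sequences present in both dicts."""
    shared = predicted_means.keys() & true_deletions.keys()
    return mean(predicted_means[s] - true_deletions[s] for s in shared)


# Toy numbers only, not results from the analysis above.
toy_scenarios = [("a", 6), ("a", 8), ("b", 10)]
toy_truth = {"a": 2, "b": 5}
print(mean_offset(mean_predicted_deletions(toy_scenarios), toy_truth))
```

A positive value means the predictions sit above the true deletion counts on average; computing it per V gene or per locus would show whether the shift is constant.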
Assuming this repeats (and isn't an artifact of my analysis) it's a bit concerning. I appreciate that for many applications the exact number of V deletions (Vdel) doesn't matter, but presumably it will still affect the Pgen calculation. I've also not tested it on other loci/species, so I don't know how widespread the implications might be.
System information:
IGoR version 1.4.0
Ubuntu 18.04
gcc version 7.5.0
Additional context
For reference, my IGoR code (run on a file of TCRs from a given V gene, covering a range of deletion lengths) looks like this:
igor -set_wd . -batch foo -read_seqs sorted-b-TCRs.txt
igor -set_wd . -batch foo -species human -chain beta -align --all
igor -set_wd . -batch foo -species human -chain beta -evaluate -output --scenarios 100
And here's an example top 20 lines of one of those sorted-b-TCRs.txt files: