I'm trying to run IGoR on my T cell receptor beta chain sequences, and everything works great until my sample size is above 100,000 sequences.
I'm getting the following error when using the -evaluate command:
[IGoR] ERROR: Exception caught while reading J alignments before inference/evaluation. Make sure alignments were carried previously using "-align --J" or "-align --all" with similar path parameters (working directory, batchname, ...)
I have run -align --all, just like I did for all my other samples, and the J_alignments file was generated in the aligns folder and looks fine. I tried splitting the sample into 4 files and processing each one separately, which worked perfectly, so it shouldn't be a problem with the sequences. I only get that error when I use the whole file.
Do you have any advice on how to get around this, or are there limitations on file size?
Thanks,
Kristina
Hello @kgrigaityte ,
For now IGoR loads all alignments into memory and keeps them there. I guess this strategy becomes problematic when running over large alignment files. You should have a second line in the error message giving you the error type. Could you please paste the complete error message (or just edit your post to include it)?
There is a tradeoff between browsing a large alignment file on the fly for every sequence (uses virtually no memory, but requires parsing the complete file for each sequence) and storing every alignment in memory (uses a lot of memory, but parses the alignment file only once).
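To make the tradeoff concrete, here is a minimal Python sketch. It is not IGoR's actual code. It assumes a semicolon-delimited alignment file with `seq_index`, `gene_name` and `score` columns, and the function names are hypothetical. The second function shows a possible middle ground that parses the file once but holds only one sequence's alignments at a time, which only works if all rows for a given sequence are contiguous in the file.

```python
import csv
import io
from collections import defaultdict
from itertools import groupby


def load_all(handle):
    """Parse the file once and keep every alignment in memory."""
    by_seq = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=";"):
        by_seq[int(row["seq_index"])].append(row)
    return by_seq


def stream_by_sequence(handle):
    """Yield (seq_index, alignments) one sequence at a time.

    Assumption: rows sharing a seq_index are contiguous in the file.
    """
    reader = csv.DictReader(handle, delimiter=";")
    for seq_index, rows in groupby(reader, key=lambda r: int(r["seq_index"])):
        yield seq_index, list(rows)


# Tiny illustrative input (made-up values, assumed column layout).
SAMPLE = (
    "seq_index;gene_name;score\n"
    "0;TRBJ1-1;50\n"
    "0;TRBJ1-2;35\n"
    "1;TRBJ2-7;60\n"
)

in_memory = load_all(io.StringIO(SAMPLE))
streamed = list(stream_by_sequence(io.StringIO(SAMPLE)))
```

With `load_all`, memory grows with the total number of alignments. With `stream_by_sequence`, it grows only with the largest number of alignments for a single sequence.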
To reduce memory usage, there are two paths you could explore:
- Filter alignments more drastically when aligning or when reading alignments, by adjusting the alignment score thresholds or relative score thresholds (although now that I think about it, I am not sure I have created a command line option for the latter yet).
- Try shortening your gene names (if you're using the complete IMGT name, the string will take up a lot of memory compared to a shorter name). This may sound silly, but it can be a real problem for large sequence sets.
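Both suggestions can be illustrated with a short Python sketch that works on alignment rows outside of IGoR. Everything here is an assumption for illustration: the `(seq_index, gene_name, score)` row layout, the function names, the example gene names, and the reading of "relative score threshold" as a fraction of the best score for the same sequence (which presumes positive scores). If gene names are shortened, they would presumably need to be shortened the same way in the genomic template files so that the names still match.

```python
from collections import defaultdict


def shorten_gene_name(name):
    """Keep only the second '|'-separated field of an IMGT-style header.

    Names without '|' are returned unchanged.
    """
    fields = name.split("|")
    return fields[1] if len(fields) > 1 else name


def filter_alignments(rows, min_score=0.0, min_ratio=0.0):
    """Drop alignments below an absolute or relative score threshold.

    rows: iterable of (seq_index, gene_name, score) tuples (assumed layout).
    min_ratio is applied against the best score of the same sequence.
    """
    rows = list(rows)
    best = defaultdict(lambda: float("-inf"))
    for seq_index, _, score in rows:
        best[seq_index] = max(best[seq_index], score)

    kept = []
    for seq_index, gene, score in rows:
        if score >= min_score and score >= min_ratio * best[seq_index]:
            kept.append((seq_index, shorten_gene_name(gene), score))
    return kept


# Made-up rows with IMGT-style long names, for illustration only.
rows = [
    (0, "K02545|TRBJ1-1*01|Homo sapiens|F|J-REGION", 50.0),
    (0, "K02545|TRBJ1-2*01|Homo sapiens|F|J-REGION", 20.0),
    (1, "M14159|TRBJ2-7*01|Homo sapiens|F|J-REGION", 30.0),
]
filtered = filter_alignments(rows, min_score=25.0, min_ratio=0.5)
```

Here the second alignment of sequence 0 is dropped because it falls below both thresholds, and the surviving rows carry the short gene names.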
I'm a bit busy at the moment, but I'll try to find a better tradeoff for reading input from large datasets once I get some time.
Hope this helps!