Add support for cell barcodes which are longer than 16 nucleotides. #588
Add support for cell barcodes which are longer than 16 nucleotides. The old code overwrote the first nucleotides of each cell barcode with A's, from position 1 to (length of cell barcode - 16): cell barcodes were converted with "convertNuclStrToInt32" and converted back with "convertNuclInt32toString", and a barcode longer than 16 nucleotides overflows the 32-bit integer. The number of barcodes is still limited to 2^32 after this patch. UMIs are still limited to 16 nucleotides. Longer UMIs could be supported in a similar way.
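The overflow described above can be reproduced with a minimal sketch. It assumes a 2-bit-per-nucleotide encoding (A=0, C=1, G=2, T=3) packed into a 32-bit integer, which leaves room for exactly 16 nucleotides. The helpers `packNucl32` and `unpackNucl32` are hypothetical names written for this illustration; they are not STAR's "convertNuclStrToInt32" and "convertNuclInt32toString", whose exact signatures are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not STAR's code): pack a nucleotide string into
// 32 bits, 2 bits per base. Assumes the input contains only A, C, G, T.
// Every base past the 16th shifts the earliest bases out of the word.
uint32_t packNucl32(const std::string &seq) {
    uint32_t packed = 0;
    for (char c : seq) {
        uint32_t code = 0;
        switch (c) {
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default:  assert(c == 'A'); code = 0; break;
        }
        packed = (packed << 2) | code;  // unsigned shift: high bits are dropped
    }
    return packed;
}

// Hypothetical helper (not STAR's code): decode `length` bases from the
// packed word, last base first. Positions that no longer fit in 32 bits
// decode as 0, i.e. 'A'.
std::string unpackNucl32(uint32_t packed, std::size_t length) {
    static const char alphabet[] = "ACGT";
    std::string seq(length, 'A');
    for (std::size_t i = 0; i < length; ++i) {
        seq[length - 1 - i] = alphabet[packed & 3u];
        packed >>= 2;
    }
    return seq;
}
```

With these helpers, a 16-nucleotide barcode survives the round trip unchanged, while a 20-nucleotide barcode such as `CGTACGTACGTACGTACGTA` comes back as `AAAACGTACGTACGTACGTA`: the first 20 - 16 = 4 nucleotides are replaced by A's.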
Rebased to avoid the whitespace issues which were fixed recently.
Hi Gert, I just released the branch I was working on with new features for STARsolo: It has an option of working without the whitelist, which might be useful for custom barcodes. I will now work on pulling in your requests. Cheers
Hi Alex, What is new in that branch? Is it only that a whitelist is not needed anymore, or are there other changes? It would also be useful for us if #592 is merged (mostly for normal STAR with ATAC data). We had a new run with a custom design with barcodes of 20 bp + 8 bp UMI, but this time other samples (not single-cell) were added which required a longer read 2, so the number of cycles for read 2 was higher than needed for BC + UMI (more bases than 20 + 8 = 28). So it would be nice in the future if there was a way to specify where the barcode and UMI are located. Maybe a syntax similar to Illumina bcl2fastq might be an option, instead of needing all those long options:
Specifying |
Hi Gert, I have pulled this request into the new branch, and verified that it works for 16-base CBs in my test runs. Another major feature in this branch is the ability to count all reads overlapping a gene, i.e. to include the reads coming from pre-mRNA in addition to the standard counting of reads that agree with the spliced transcripts. This is activated with
It's a good idea to have a string encoding the CB and UMI position; it would also be useful for other technologies like inDrop, sci-RNA-seq or Split-seq. I will pull in #592 shortly. Cheers
@alexdobin Thanks for merging. I think your no-whitelist patch is not completely compatible with this branch. It works fine, as far as I can tell, when a whitelist is provided.
I guess the problem comes from this code in
Hi Gert, please check the latest commit on this branch, it should solve the problem. Thanks a lot!
I get a segmentation fault:
In dmesg:
I also got a segmentation fault when I tried to fix this bug myself earlier.
Hi Gert, Thanks a lot for testing it! Cheers