inDrops barcode list #4
Comments
Also, I made the list of 147,456 barcodes by writing all possible combinations of the "half"-barcodes to a new .txt file. By doing this: I got the following error message: Found 147456 barcodes in the whitelist P.S.: I used the term |
Hi! Glad you like the tools. Currently the "correct" command doesn't work with variable length barcodes so your best bet is to skip the correct step. It is something we will work on in the future. All other steps should work just fine. I will close this issue. If something else arises, feel free to open it back up. |
Ok after discussing with Páll, we came up with a slight trick that may work: Append an "A" to the beginning of each barcode until the length of the barcode is the length of the longest barcode, then do barcode error correction. Here is the "whitelist" with the above modification: Try out I'm curious to know if this works! |
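The padding trick described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not the script used to produce the linked whitelist; the function name `pad_barcodes` and the toy barcodes are made up for the example.

```python
def pad_barcodes(barcodes, pad_base="A"):
    """Left-pad every barcode with pad_base up to the length of the longest one."""
    barcodes = list(barcodes)
    if not barcodes:
        return []
    target = max(len(bc) for bc in barcodes)
    # str.rjust prepends the fill character until the string reaches `target`
    return [bc.rjust(target, pad_base) for bc in barcodes]


# Toy input (hypothetical sequences, not real inDrops barcodes)
example = ["ACGT", "GGACGT", "TTTACGT"]
padded = pad_barcodes(example)
print(padded)
```

Note that padding can make two different barcodes identical (for instance `CGT` and `ACGT` both become `AACGT` at length 5), so it is probably worth checking the padded list for duplicates. Whether the barcodes stored in the BUS file line up with the padded entries is a separate question that this sketch does not address.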
Hi Sina! I really appreciate the attention in this thread. So!
My output for this was a dataset with very few barcodes passing correction (~20), and further analysis with BUSpaRse wasn't possible.
Thank you! I really appreciate all the tools developed in your lab! |
Ok, so I went through the kallisto bus code, and currently kallisto bus will interpret the four fastq files associated with v3 inDrops as two separate sample files, as is the case for v1/v2. This will give you a BUS file with a huge number of barcodes and incorrect results. This is not the desired behavior! So currently kallisto bus cannot be used for inDrops v3. We will get working on v3 and will update the binaries when everything is functional. |
Hi Sina, sorry for the late response! I was trying to figure out library issues. So, actually I was using v2 of the inDrops library preparation, with 2 runs per sample. What was interesting, and what I didn't know, is that the full inDrops barcode has a constant adaptor sequence (GAGTGATTGCTTGTGACGCCTT) between the 2 "half"-barcode sequences (see image). The length of the barcodes varies between 38 and 41. I added this adaptor sequence between the two half-barcodes as follows: And when I applied the correction step by:
I got the following error:
Does the whitelist necessarily have to have 19 nucleotides for the correction step? |
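Since the question above turns on barcode length, a quick way to see what a whitelist actually contains is to tabulate its length distribution. The following is a minimal sketch; the half-barcodes are made-up placeholders (only the adaptor sequence comes from the thread), and `length_distribution` and `is_fixed_length` are hypothetical helper names.

```python
from collections import Counter

# Constant adaptor quoted in the thread (22 bp)
ADAPTOR = "GAGTGATTGCTTGTGACGCCTT"


def length_distribution(barcodes):
    """Return a Counter mapping barcode length -> number of barcodes."""
    return Counter(len(bc) for bc in barcodes)


def is_fixed_length(barcodes):
    """True if every barcode in the list has the same length."""
    return len(length_distribution(barcodes)) == 1


# Placeholder first halves of 8 to 11 bp and one 8-bp second half
# (hypothetical sequences chosen only to reproduce the 38-41 range)
first_halves = ["A" * 8, "C" * 9, "G" * 10, "T" * 11]
second_half = "ACGTACGT"
full = [h + ADAPTOR + second_half for h in first_halves]

dist = length_distribution(full)
print(sorted(dist.items()))
print("fixed length:", is_fixed_length(full))
```

With an 8 to 11 bp first half, the 22-bp adaptor, and an 8-bp second half, the totals come out to 38 to 41 bp, matching the range reported above.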
Hi Camila, I would probably run kallisto bus as follows: I think the problem is that maybe the authors used fastq files where the bridging adaptor was already removed (I am probably wrong though) so they would expect the format of barcodes to be max 19 bp. Let me know if this works |
Hi! Actually, these are fastq files from my lab, and I am working on them as they come from the sequencer, so the adaptors must still be there. The thing is that we recently acquired an inDrops encapsulator, we already had the sequencer, and we are using the pipeline developed by the company, which is all in Python, which I am not proficient in (I am an R person). kallisto/bustools caught my attention because it is in R, and I have used kallisto for a long time now to work with bulk RNA-seq files and teach transcriptomics. I only recently started to work with single-cell RNA-seq. I appreciate the reply! Thank you! |
Hi, I just wanted to come back and see if the team is still actively developing an option for inDrops v3. I am stuck at the moment, unfortunately. I have tried the following: Then custom demultiplexing with barcode_splitter: And then I ran kallisto bus: but I end up with very low counts per barcode :(, so I am not sure what is going on (I have tried the conventional inDrops pipeline, and it works fine for my files in terms of UMIs). Would you perhaps be able to share some information or help me out? Thanks! |
I'm having the exact same issue as Camila with our inDrops v2 chemistry after running the pseudoalignment step with kallisto bus using the "inDrops" argument for the technology. Putting R1 first, as described by the tutorial, led to a very low 100,000 reads aligned. So I swapped the order of R1 and R2 in the terminal command, since R2 is the barcode and R1 is the transcript in the case of inDrops. This resulted in a very impressive 100 million aligned reads out of 150 million reads processed (on an i7-4790 with 16 GB RAM in about 5 minutes); further tests proceeded with these 100 million alignments. As I understand it, inDrops v2 barcode regions are structured as follows: [variable length barcode1][22-bp W1 sequence][8-bp barcode2][6-bp UMI]. Barcode 1 is taken from the pool found here and barcode 2 is taken from the pool found here. Attempting to correct with just the barcode2 whitelist leads to: Attempting to correct with the modified inDrops whitelist that Sina linked leads to very few corrections, just like Camila. Which inDrops barcode chemistries are directly supported by kallisto bus as well as the downstream correction steps? |
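The read layout described above ([variable length barcode1][22-bp W1 sequence][8-bp barcode2][6-bp UMI]) could be split with something like the sketch below. It assumes that W1 is the constant adaptor quoted earlier in this thread and that it appears in the read without sequencing errors; `split_indrops_v2_read` is a hypothetical helper, not part of kallisto or bustools, and the example read is made up.

```python
# Assumption: W1 equals the constant adaptor quoted earlier in the thread
W1 = "GAGTGATTGCTTGTGACGCCTT"


def split_indrops_v2_read(read, w1=W1, bc2_len=8, umi_len=6):
    """Split a barcode read into (barcode1, barcode2, umi).

    Returns None if W1 is not found exactly or the read is too short.
    """
    start = read.find(w1)  # exact match only, no mismatch tolerance
    if start == -1:
        return None
    barcode1 = read[:start]
    rest = read[start + len(w1):]
    if len(rest) < bc2_len + umi_len:
        return None
    barcode2 = rest[:bc2_len]
    umi = rest[bc2_len:bc2_len + umi_len]
    return barcode1, barcode2, umi


# Made-up read: 9-bp barcode1 + W1 + 8-bp barcode2 + 6-bp UMI + trailing bases
read = "ACGTACGTA" + W1 + "TTGGCCAA" + "AACCGG" + "TTTT"
parts = split_indrops_v2_read(read)
print(parts)
```

Anchoring on W1 rather than on fixed offsets is what lets the variable-length barcode1 be recovered. A real pipeline would likely need to tolerate mismatches in W1 and to account for the orientation of the barcode lists, neither of which is handled here.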
We have now finalized the workflow for processing inDrops v3, and have wrapped it into kb. |
Could I please ask if you might be able to help me to understand how to process inDrops v3? Specifically, I'm wondering if kb can process inDrops v3 with 4 fastq files. Thank you for any advice! I have 4 fastq files, as described on the indrops page:
From the indrops page, I can see:
Here is another description of the 4 files from the Harvard Bioinformatics Core: When I search for a relevant tutorial in the bus tools repo, this is what I find: This tutorial expects 2 files (R1 R2), not 4 (R1 R2 R3 R4). When I run
It looks to me like the |
@slowkow If I understand the setup correctly, bustools expects 3 files for inDrops version 3. You should have demultiplexed by the library index (i5) with File 0: this is the second index (i7) given by So your "R3" is not needed as it is for which sample was sequenced, not the cell barcodes. |
Can bustools now correct with variable barcode lengths for inDrops v2? |
Just to echo what many people have said before, I am still having issues using the kallisto bus pipeline for inDrops v2 data. I am using kallisto 0.46.2, which specifies an inDropsv2 option. This allowed input files R1 and R2 in that order, with R1 being cDNA and R2 being CB+UMI. However, the barcodes generated by the pipeline are still all only 19 bp long, which doesn't account for the variable length of the barcode. Therefore, I am still getting many-fold more cells than expected. Thank you all for developing this amazing tool! I hope this issue can be resolved soon. |
Hey there, I was just wondering if the problem of kallisto bus producing BUS files with an inflated number of barcodes has been fixed for inDrops v3. Thanks! |
Hi!
I have an inDrops dataset to preprocess and I am trying to run it using the recent bustools for scRNA-seq. Unlike 10x Genomics, the inDrops V3 library provides two lists of barcodes (inDrops barcode lists), where the first one is the beginning of the nucleotide sequence and the second one is the end of the nucleotide sequence. Each list has 384 "half"-barcodes, and the way other preprocessing tools handle this is by generating all possible combinations between the two lists, so we would have 384*384 = 147,456 barcodes, which is exactly the number of barcodes that the inDrops platform claims to have.
I couldn't succeed in using the two .txt lists (instead of a single one like the 10xv2_whitelist.txt from the 10x platform) as input in the step of correcting the barcodes in the BUS file.
bustools correct -w ../gel_barcode1_list.txt gel_barcode2_list.txt -o output.correct.bus output.bus
How can I manage that?
Finally, thank you Pachter Lab for this amazing tool that was recently released! I really appreciate it!
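For reference, the 384 × 384 combination step described in this post can be sketched in Python as below. The function names are hypothetical, the toy half-barcodes are made up, and the concatenation order and orientation that bustools expects are not confirmed here; the file names in the commented usage simply mirror the command above.

```python
from itertools import product


def combine_half_barcodes(first_halves, second_halves, linker=""):
    """Return every first+linker+second combination (Cartesian product)."""
    return [a + linker + b for a, b in product(first_halves, second_halves)]


def read_barcode_list(path):
    """Read one barcode per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def write_whitelist(barcodes, path):
    """Write one barcode per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(barcodes) + "\n")


# Toy demonstration with made-up half-barcodes
combos = combine_half_barcodes(["AAA", "CCC"], ["GG", "TT"])
print(combos)

# Possible usage with the two lists from the post (paths are assumptions):
# first = read_barcode_list("gel_barcode1_list.txt")
# second = read_barcode_list("gel_barcode2_list.txt")
# write_whitelist(combine_half_barcodes(first, second), "combined_whitelist.txt")
```

With two lists of 384 entries each, this yields 384 × 384 = 147,456 combined barcodes, the figure quoted above.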