Alevin getting stuck on barcode processing #333

mariaolaaksonen · 2019-01-08T11:34:13Z

I'm trying to process the 10X "1.3 Million Brain Cells from E18 Mice" dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons) with Alevin, using a compiled build of salmon 0.12.0 and gencode.vM19.pc_transcripts.fa.gz as the reference. The chemistry is 10x-v2. I have split the fastqs into the 133 libraries and am trying to run Alevin on each library's fastqs (~140 R1 fastqs per library). The dataset was processed with the longranger demux program, which outputs a single fastq containing both the UMI+barcode and the read sequence. I have split these fastqs to match the input Alevin expects (i.e. the UMI+barcode in one fastq and the read sequence in the other).

However, Alevin seems to get stuck while processing the barcodes: no error is produced, and it stops making progress with only "processed X Million barcodes" printed on the screen. Are you aware of such a problem with many fastq files, or is there something I'm not taking into account? Is there a limit on how many files can be used as input? I tested Alevin with 60 fastq pairs (120 fastqs in total, R1+R2) and it ran through, but with more than 60 it seems to get stuck on processing the barcodes. If it is not possible to run all of a library's fastqs at once, do you recommend running them in smaller batches and then combining the resulting count matrices?

Command used: salmon alevin -l ISR -1 R1_fastqs -2 R2_fastqs --chromium -i index -p 20 -o alevin_output --tgMap txp2gene_mouse.tsv --dumpCsvCounts --whitelist barcode_whitelist.txt --minScoreFraction 0.7
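The batching workaround asked about above could be sketched as follows. This is only an illustration, not a recommendation from the salmon developers: the file names and the batch size of 60 are assumptions (60 is simply the largest count reported to work here), and the flags are copied from the command above. Combining the per-batch matrices is not shown, and naively summing them would not deduplicate UMIs shared across batches of the same library.

```python
from typing import List, Sequence, Tuple


def batch_pairs(r1: Sequence[str], r2: Sequence[str],
                batch_size: int = 60) -> List[Tuple[List[str], List[str]]]:
    """Split matched R1/R2 FASTQ lists into consecutive batches."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 lists must have the same length")
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [
        (list(r1[i:i + batch_size]), list(r2[i:i + batch_size]))
        for i in range(0, len(r1), batch_size)
    ]


def alevin_command(r1_batch: Sequence[str], r2_batch: Sequence[str],
                   out_dir: str) -> List[str]:
    """Build the argument list for one batch, reusing the flags from the issue."""
    return [
        "salmon", "alevin", "-l", "ISR",
        "-1", *r1_batch,
        "-2", *r2_batch,
        "--chromium", "-i", "index", "-p", "20",
        "-o", out_dir,
        "--tgMap", "txp2gene_mouse.tsv", "--dumpCsvCounts",
        "--whitelist", "barcode_whitelist.txt",
        "--minScoreFraction", "0.7",
    ]


if __name__ == "__main__":
    # Hypothetical file names standing in for one library's ~140 FASTQ pairs.
    r1_files = [f"lib1_{i:03d}_R1.fastq.gz" for i in range(140)]
    r2_files = [f"lib1_{i:03d}_R2.fastq.gz" for i in range(140)]
    for n, (b1, b2) in enumerate(batch_pairs(r1_files, r2_files)):
        cmd = alevin_command(b1, b2, f"alevin_output_batch{n}")
        print(f"batch {n}: {len(b1)} pairs, {len(cmd)} arguments")
```

Each returned list could be passed to `subprocess.run` to launch one Alevin run per batch, with a separate output directory for each.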

The barcode whitelist was obtained from the HDF5 file, which holds the original data in a filtered matrix format (i.e. it has already been run through cellranger).
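Deriving a per-library whitelist from such barcode labels might look like the sketch below. It assumes the labels have the form `<sequence>-<library index>` (for example `AAACCTGAGATAGGAG-1`), which is not stated in this thread, and it leaves out reading the labels from the HDF5 file. The function name `whitelist_for_library` and the example barcodes are hypothetical.

```python
from typing import Iterable, List


def whitelist_for_library(barcodes: Iterable[str], library: int) -> List[str]:
    """Return the bare barcode sequences that belong to one library.

    Assumes each label looks like "<sequence>-<library index>".
    Labels without a "-" separator are skipped.
    """
    selected = []
    for label in barcodes:
        sequence, sep, index = label.rpartition("-")
        if sep and index == str(library):
            selected.append(sequence)
    return selected


if __name__ == "__main__":
    # Made-up labels for illustration only.
    labels = [
        "AAACCTGAGATAGGAG-1",
        "AAACCTGAGCGGCTTC-1",
        "AAACCTGAGATAGGAG-2",
    ]
    # One barcode per line, the layout of a plain-text whitelist file.
    print("\n".join(whitelist_for_library(labels, 1)))
```

Filtering per library matters when each library is run separately, since the same barcode sequence can occur in more than one library.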

k3yavi · 2019-01-08T23:36:30Z

Hi @mariaolaaksonen,
Thanks for raising the issue and for using Alevin with the 1.3M dataset.
Can you check whether your issue shows the same behavior as #329, i.e. Alevin gets stuck after processing a number of barcodes that is a multiple of 4?
We have already fixed the issue, but the fix is not yet in master or in the v0.12.0 release of salmon.

As a quick resolution, we recommend compiling salmon from source using the develop branch. If you can wait for some time, we will release a new version with the hot-fix soon.

mariaolaaksonen · 2019-01-09T12:47:29Z

Thanks for the quick reply! The develop branch works for the dataset as expected, so the problem was probably related to #329.

k3yavi self-assigned this Jan 8, 2019

k3yavi added the "alevin" label (issue is primarily related to alevin) Jan 8, 2019

mariaolaaksonen closed this as completed Jan 9, 2019
