Alevin getting stuck on barcode processing #333

Closed
mariaolaaksonen opened this issue Jan 8, 2019 · 2 comments
Labels: alevin (issue is primarily related to alevin)

@mariaolaaksonen

I'm trying to process the 10x Genomics "1.3 Million Brain Cells from E18 Mice" dataset (https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.3.0/1M_neurons) with Alevin, using salmon v0.12.0 compiled from source and gencode.vM19.pc_transcripts.fa.gz as the reference. The chemistry is 10x-v2.

I have grouped the FASTQs by library (133 libraries) and am running Alevin once per library (~140 R1 FASTQs per library). The dataset was processed with the longranger demux program, which outputs a single FASTQ containing both the UMI+barcode and the read sequence. I split these FASTQs to match Alevin's expected input, i.e. the UMI+barcode in one FASTQ and the read sequence in the other.

However, Alevin seems to get stuck while processing the barcodes. No error is reported; it simply stops making progress, and the last line on screen is "processed X Million barcodes". Are you aware of such a problem with many FASTQ files, or is there something I'm not taking into account? Is there a limit on how many input files can be used? I tested Alevin with 60 FASTQ pairs (120 FASTQs in total, R1+R2) and it ran through, but with more than 60 pairs it seems to get stuck on processing the barcodes. If it is not possible to run all of a library's FASTQs at once, do you recommend running them in smaller batches and then combining the resulting count matrices?
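
The FASTQ splitting step described above could look roughly like the Python sketch below. It assumes the demultiplexed FASTQ is interleaved, i.e. each barcode+UMI record is immediately followed by its matching cDNA read record; that layout is an assumption, not something stated in this thread, so check a few records of your own files first. The function names (`fastq_records`, `open_text`, `deinterleave`) are hypothetical.

```python
import gzip
from itertools import islice
from typing import IO, Iterator, List


def fastq_records(handle: IO[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (header, seq, '+', qual)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of file")
        yield record


def open_text(path: str, mode: str) -> IO[str]:
    """Open a plain or gzip-compressed (.gz) file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def deinterleave(interleaved: str, r1_out: str, r2_out: str) -> int:
    """Split an interleaved FASTQ into R1 (barcode+UMI) and R2 (cDNA) files.

    ASSUMPTION: records alternate R1, R2, R1, R2, ... in the input.
    Returns the number of read pairs written.
    """
    pairs = 0
    with open_text(interleaved, "r") as src, \
         open_text(r1_out, "w") as r1, \
         open_text(r2_out, "w") as r2:
        records = fastq_records(src)
        for first in records:
            # The record right after a barcode+UMI record is taken as its mate.
            second = next(records, None)
            if second is None:
                raise ValueError("odd number of records; input is not interleaved pairs")
            r1.writelines(first)
            r2.writelines(second)
            pairs += 1
    return pairs
```

For example, `deinterleave("lib1_RA.fastq.gz", "lib1_R1.fastq.gz", "lib1_R2.fastq.gz")` (hypothetical file names) writes the two mate files that Alevin's `-1` and `-2` options expect.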

Command used: salmon alevin -l ISR -1 R1_fastqs -2 R2_fastqs --chromium -i index -p 20 -o alevin_output --tgMap txp2gene_mouse.tsv --dumpCsvCounts --whitelist barcode_whitelist.txt --minScoreFraction 0.7
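
With ~140 R1 files per library, building the `-1`/`-2` lists by hand is error-prone. salmon accepts several files after `-1` and `-2`, separated by spaces, and the mates must appear in the same order in both lists. The sketch below assumes a hypothetical naming scheme in which replacing `_R1` with `_R2` in an R1 file name gives its mate; `pair_fastqs` and `alevin_command` are illustrative helpers, and the remaining flags simply mirror the command above.

```python
from pathlib import Path
from typing import List, Tuple


def pair_fastqs(directory: str, r1_tag: str = "_R1",
                r2_tag: str = "_R2") -> Tuple[List[str], List[str]]:
    """Return matched, identically ordered R1 and R2 file lists.

    ASSUMPTION: replacing the last r1_tag in an R1 file name with r2_tag
    gives the name of its mate (e.g. lib1_L001_R1.fastq.gz -> lib1_L001_R2.fastq.gz).
    """
    r1_files = sorted(p for p in Path(directory).iterdir()
                      if p.is_file() and r1_tag in p.name)
    r1_list, r2_list = [], []
    for r1 in r1_files:
        head, _, tail = r1.name.rpartition(r1_tag)
        r2 = r1.with_name(head + r2_tag + tail)
        if not r2.exists():
            raise FileNotFoundError(f"no mate for {r1.name}: expected {r2.name}")
        r1_list.append(str(r1))
        r2_list.append(str(r2))
    return r1_list, r2_list


def alevin_command(r1: List[str], r2: List[str], index: str, out: str,
                   tgmap: str, whitelist: str, threads: int = 20) -> List[str]:
    """Build the alevin argv used in this issue, listing every file after -1 / -2."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 lists must have the same length")
    return ["salmon", "alevin", "-l", "ISR", "-1", *r1, "-2", *r2,
            "--chromium", "-i", index, "-p", str(threads), "-o", out,
            "--tgMap", tgmap, "--dumpCsvCounts", "--whitelist", whitelist,
            "--minScoreFraction", "0.7"]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting and command-line expansion issues with long file lists.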

The barcode whitelist was taken from the HDF5 file that contains the original data as a filtered matrix (i.e. already processed with Cell Ranger).
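
One way to derive such a whitelist is sketched below, under several assumptions not confirmed in this thread: the filtered HDF5 uses the Cell Ranger 2.x-style layout with a per-genome group (the `mm10` default is hypothetical) containing a `barcodes` dataset; barcodes carry a `-N` suffix (e.g. `AAACCTGAGATAGGAG-1`) that Alevin does not expect; and, for an aggregated matrix, that suffix may identify the library (GEM group), so a per-library whitelist could keep only one suffix. `to_whitelist`, `read_h5_barcodes`, and `write_whitelist` are hypothetical helpers.

```python
from typing import Iterable, List, Optional


def to_whitelist(barcodes: Iterable, gem_group: Optional[str] = None) -> List[str]:
    """Turn barcodes like b'AAACCTGAGATAGGAG-1' into bare sequences.

    If gem_group is given (e.g. "1"), keep only barcodes with that suffix.
    Order is preserved and duplicate sequences are dropped.
    """
    seen = set()
    out = []
    for bc in barcodes:
        if isinstance(bc, bytes):  # h5py returns fixed-length byte strings
            bc = bc.decode("ascii")
        seq, _, suffix = bc.partition("-")
        if gem_group is not None and suffix != gem_group:
            continue
        if seq not in seen:
            seen.add(seq)
            out.append(seq)
    return out


def read_h5_barcodes(h5_path: str, genome: str = "mm10") -> List:
    """ASSUMPTION: Cell Ranger 2.x-style layout with a /<genome>/barcodes dataset."""
    import h5py  # third-party; imported here so to_whitelist works without it
    with h5py.File(h5_path, "r") as f:
        return list(f[genome]["barcodes"][:])


def write_whitelist(seqs: List[str], path: str) -> None:
    """Write one barcode per line, the format passed to --whitelist."""
    with open(path, "w") as fh:
        fh.writelines(s + "\n" for s in seqs)
```

A per-library run would then call `write_whitelist(to_whitelist(read_h5_barcodes(h5_file), gem_group="1"), "barcode_whitelist.txt")`, assuming the suffix-to-library mapping described above holds for your matrix.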

k3yavi self-assigned this Jan 8, 2019
k3yavi added the alevin label Jan 8, 2019
@k3yavi
Member

k3yavi commented Jan 8, 2019

Hi @mariaolaaksonen,
Thanks for raising the issue and for using Alevin with the 1.3M dataset.
Can you check whether your issue shows the same behavior as #329, i.e. Alevin gets stuck after processing a number of barcodes that is a multiple of 4?
We have already fixed that issue, but the fix is not yet in master or in the v0.12.0 release of salmon.

As a quick workaround, we recommend compiling salmon from source using the develop branch. If you can wait a while, we will release a new version with the hotfix soon.

@mariaolaaksonen
Author

Thanks for the quick reply! The develop branch works as expected on this dataset, so the problem was probably related to #329.
