kb count ERROR with inDrops3 data #63

Closed
naumenko-sa opened this issue Mar 27, 2020 · 2 comments

@naumenko-sa

Hello!
Thanks for developing and supporting kb_python!

Describe the issue
I'm running this workflow with inDrops3 data. This thread helped me to clarify how to use inDrops3 input. I'm running a test set of 87,347,261 reads.

  1. I installed R4.0
  2. Generated reference files (using 61bp for inDrops3 transcript read length)
library(BUSpaRse)
library(BSgenome.Mmusculus.UCSC.mm10)
library(AnnotationHub)

ah <- AnnotationHub()
query(ah, pattern = c("Ensembl", "97", "Mus musculus", "EnsDb"))
# Get mouse Ensembl 97 annotation
edb <- ah[["AH73905"]]
# Note: L is the length of the transcript read; L = 61 for inDrops3 (61 bp reads)
get_velocity_files(edb, L=61, Genome=BSgenome.Mmusculus.UCSC.mm10, out_path = "./veloindex", isoform_action = "separate")
  3. Indexed the reference with kallisto:
    kallisto index -i mm_cDNA_introns_97.idx cDNA_introns.fa

The next step, kb count, fails.

What is the exact command that was run?

kreference_prefix=/path/veloindex_indrops3

# 100G RAM max
kb count \
-i ${kreference_prefix}/mm_cDNA_introns_97.idx \
-g ${kreference_prefix}/neuron10k_velocity/tr2g.tsv \
-x INDROPSV3 \
-o kallisto_bus_output \
-c1 ${kreference_prefix}/cDNA_tx_to_capture.txt \
-c2 ${kreference_prefix}/introns_tx_to_capture.txt \
--lamanno \
--verbose \
-t 8 \
${1}_2.fq.gz ${1}_4.fq.gz ${1}_1.fq.gz

fastq input:

  • 2.fq.gz - 8 bp: first part of the cell ID
  • 4.fq.gz - 14 bp: 8 bp (second part of the cell ID) + 6 bp (transcript UMI)
  • 1.fq.gz - 61 bp: transcript read
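
The read layout listed above can be spot-checked before running kb count. The following is only a minimal sketch: `first_read_length` is a hypothetical helper (not part of kb_python, kallisto, or bustools), and it looks at the first read only, so it assumes every read in a file has the same length.

```shell
# Hypothetical helper: print the length of the first read in a gzipped FASTQ.
# Line 2 of a FASTQ file holds the sequence of the first record.
first_read_length() {
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Usage sketch with the file naming from this post (sample prefix in $1):
#   first_read_length "${1}_2.fq.gz"   # described above as 8 bp
#   first_read_length "${1}_4.fq.gz"   # described above as 14 bp
#   first_read_length "${1}_1.fq.gz"   # described above as 61 bp
```

If any of the three values differs from the layout above, the files are probably being passed in the wrong order or come from a different library design.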

Command output (with --verbose flag)

/home/sn240/.conda/envs/r/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
[2020-03-26 21:41:36,087]   DEBUG Printing verbose output
[2020-03-26 21:41:36,087]   DEBUG Creating tmp directory
[2020-03-26 21:41:36,103]   DEBUG Namespace(c1='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/cDNA_tx_to_capture.txt', c2='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/introns_tx_to_capture.txt', command='count', fastqs=['test_2.fq.gz', 'test_4.fq.gz', 'test_1.fq.gz'], filter=None, g='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv', h5ad=False, i='/n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/mm_cDNA_introns_97.idx', keep_tmp=False, lamanno=True, list=False, loom=False, m='4G', nucleus=False, o='kallisto_bus_output', overwrite=False, t=8, verbose=True, w=None, x='INDROPSV3')
[2020-03-26 21:41:36,104]    INFO Generating BUS file from
[2020-03-26 21:41:36,104]    INFO         test_2.fq.gz
[2020-03-26 21:41:36,104]    INFO         test_4.fq.gz
[2020-03-26 21:41:36,104]    INFO         test_1.fq.gz
[2020-03-26 21:41:36,106]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/mm_cDNA_introns_97.idx -o kallisto_bus_output -x INDROPSV3 -t 8 test_2.fq.gz test_4.fq.gz test_1.fq.gz
[2020-03-26 21:51:04,483]   DEBUG 
[2020-03-26 21:51:04,483]   DEBUG [index] k-mer length: 31
[2020-03-26 21:51:04,483]   DEBUG [index] number of targets: 838,802
[2020-03-26 21:51:04,483]   DEBUG [index] number of k-mers: 1,112,521,288
[2020-03-26 21:51:04,484]   DEBUG [index] number of equivalence classes: 5,715,566
[2020-03-26 21:51:04,484]   DEBUG [quant] will process sample 1: test_2.fq.gz
[2020-03-26 21:51:04,484]   DEBUG test_4.fq.gz
[2020-03-26 21:51:04,484]   DEBUG test_1.fq.gz
[2020-03-26 21:51:04,484]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2020-03-26 21:51:04,484]   DEBUG [quant] processed 131,020,444 reads, 69,153,075 reads pseudoaligned
[2020-03-26 21:51:04,484]    INFO Sorting BUS file kallisto_bus_output/output.bus to tmp/output.s.bus
[2020-03-26 21:51:04,488]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o tmp/output.s.bus -T tmp -t 8 -m 4G kallisto_bus_output/output.bus
[2020-03-26 21:51:29,622]   DEBUG Read in 69153075 BUS records
[2020-03-26 21:51:29,623]    INFO Whitelist not provided
[2020-03-26 21:51:29,623]    INFO Copying pre-packaged INDROPSV3 whitelist to kallisto_bus_output
[2020-03-26 21:51:29,722]    INFO Inspecting BUS file tmp/output.s.bus
[2020-03-26 21:51:29,723]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools inspect -o kallisto_bus_output/inspect.json -w kallisto_bus_output/inDropsv3_whitelist.txt -e kallisto_bus_output/matrix.ec tmp/output.s.bus
[2020-03-26 21:52:05,442]    INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist kallisto_bus_output/inDropsv3_whitelist.txt
[2020-03-26 21:52:05,445]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools correct -o tmp/output.s.c.bus -w kallisto_bus_output/inDropsv3_whitelist.txt tmp/output.s.bus
[2020-03-26 21:52:13,340]   DEBUG Found 147456 barcodes in the whitelist
[2020-03-26 21:52:13,341]   DEBUG Number of hamming dist 1 barcodes = 6872832
[2020-03-26 21:52:13,341]   DEBUG Processed 25625292 bus records
[2020-03-26 21:52:13,341]   DEBUG In whitelist = 21308242
[2020-03-26 21:52:13,341]   DEBUG Corrected = 1643761
[2020-03-26 21:52:13,341]   DEBUG Uncorrected = 2673289
[2020-03-26 21:52:13,341]    INFO Sorting BUS file tmp/output.s.c.bus to kallisto_bus_output/output.unfiltered.bus
[2020-03-26 21:52:13,342]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o kallisto_bus_output/output.unfiltered.bus -T tmp -t 8 -m 4G tmp/output.s.c.bus
[2020-03-26 21:52:26,062]   DEBUG Read in 22952003 BUS records
[2020-03-26 21:52:26,073]    INFO Capturing records from BUS file kallisto_bus_output/output.unfiltered.bus to tmp/spliced.bus with capture list /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/cDNA_tx_to_capture.txt
[2020-03-26 21:52:26,074]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools capture -o tmp/spliced.bus -c /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/cDNA_tx_to_capture.txt -e kallisto_bus_output/matrix.ec -t kallisto_bus_output/transcripts.txt --transcripts kallisto_bus_output/output.unfiltered.bus
[2020-03-26 21:53:10,562]   DEBUG Parsing transcripts .. done
[2020-03-26 21:53:10,563]   DEBUG Parsing ECs .. done
[2020-03-26 21:53:10,563]   DEBUG Parsing capture list .. done
[2020-03-26 21:53:10,563]   DEBUG Read in 22054602 BUS records, wrote 15358258 BUS records
[2020-03-26 21:53:10,563]    INFO Sorting BUS file tmp/spliced.bus to kallisto_bus_output/spliced.unfiltered.bus
[2020-03-26 21:53:10,566]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools sort -o kallisto_bus_output/spliced.unfiltered.bus -T tmp -t 8 -m 4G tmp/spliced.bus
[2020-03-26 21:53:19,313]   DEBUG Read in 15358258 BUS records
[2020-03-26 21:53:19,313]    INFO Generating count matrix kallisto_bus_output/counts_unfiltered/spliced from BUS file kallisto_bus_output/spliced.unfiltered.bus
[2020-03-26 21:53:19,313]   DEBUG /home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools count -o kallisto_bus_output/counts_unfiltered/spliced -g /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv -e kallisto_bus_output/matrix.ec -t kallisto_bus_output/transcripts.txt --genecounts kallisto_bus_output/spliced.unfiltered.bus
[2020-03-26 21:53:19,340]   DEBUG Usage: bustools count [options] sorted-bus-files
[2020-03-26 21:53:19,340]   DEBUG 
[2020-03-26 21:53:19,340]   DEBUG Options:
[2020-03-26 21:53:19,340]   DEBUG -o, --output          File for corrected bus output
[2020-03-26 21:53:19,340]   DEBUG -g, --genemap         File for mapping transcripts to genes
[2020-03-26 21:53:19,340]   DEBUG -e, --ecmap           File for mapping equivalence classes to transcripts
[2020-03-26 21:53:19,341]   DEBUG -t, --txnames         File with names of transcripts
[2020-03-26 21:53:19,341]   DEBUG --genecounts          Aggregate counts to genes only
[2020-03-26 21:53:19,341]   DEBUG -m, --multimapping    Include bus records that pseudoalign to multiple genes
[2020-03-26 21:53:19,341]   DEBUG 
[2020-03-26 21:53:19,341]   DEBUG Error: File not found /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv
[2020-03-26 21:53:19,348]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/main.py", line 483, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/main.py", line 135, in parse_count
    nucleus=args.nucleus,
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/count.py", line 746, in count_velocity
    bus_result['txnames'],
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/count.py", line 181, in bustools_count
    run_executable(command)
  File "/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/utils.py", line 147, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/sn240/.conda/envs/r/lib/python3.7/site-packages/kb_python/bins/linux/bustools/bustools count -o kallisto_bus_output/counts_unfiltered/spliced -g /n/data1/cores/bcbio/naumenko/velocity_test/veloindex_indrops3/neuron10k_velocity/tr2g.tsv -e kallisto_bus_output/matrix.ec -t kallisto_bus_output/transcripts.txt --genecounts kallisto_bus_output/spliced.unfiltered.bus' returned non-zero exit status 1.
[2020-03-26 21:53:19,353]   DEBUG Removing tmp directory
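
As a side note on reading the log above, the barcode-correction counts reported by bustools correct appear to be self-consistent. A small arithmetic sketch, using only numbers copied from the log (the variable names are my own):

```shell
# Counts copied from the "bustools correct" lines in the log above.
processed=25625292
in_whitelist=21308242
corrected=1643761
uncorrected=2673289

# Records that keep a valid barcode: already whitelisted plus corrected.
kept=$((in_whitelist + corrected))
# All three categories together should account for every processed record.
total=$((kept + uncorrected))

echo "kept=${kept} total=${total} processed=${processed}"
```

The `kept` value matches the 22952003 records read by the following bustools sort step, so barcode correction itself looks fine and the failure is unrelated to it.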

Versions used:

  • kallisto 0.46.0
  • bustools 0.39.4
  • kb_python 0.24.4

I'd appreciate any help to push this analysis forward!

Sergey

@naumenko-sa
Author

Figured it out: the path passed to -g was wrong, so bustools count could not find tr2g.tsv.
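
For anyone hitting the same error: in the log above the run spent roughly ten minutes on pseudoalignment before bustools count reported the missing file, so checking the paths up front can save time. A minimal sketch, assuming a hypothetical helper `check_inputs` (not part of kb_python) that reports every path that is missing or empty:

```shell
# Hypothetical pre-flight helper: print each missing or empty path to stderr
# and return 1 if there is at least one, 0 otherwise.
check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage sketch with the variables from the command in this issue:
#   check_inputs \
#       "${kreference_prefix}/mm_cDNA_introns_97.idx" \
#       "${kreference_prefix}/neuron10k_velocity/tr2g.tsv" \
#       "${kreference_prefix}/cDNA_tx_to_capture.txt" \
#       "${kreference_prefix}/introns_tx_to_capture.txt" \
#       "${1}_2.fq.gz" "${1}_4.fq.gz" "${1}_1.fq.gz" || exit 1
```

Run before kb count, this would have flagged the tr2g.tsv path immediately instead of after the BUS file had been generated, sorted, and corrected.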

@haiderabbas678

mm_cDNA_introns_97.idx
Is this file available online? If not, how do I generate it?
