clusters_tmp.tsv is empty which caused subprocess.CalledProcessError #128

ZebinWen · 2021-12-25T12:48:48Z

Hello,
When I was running souporcell_pipeline.py by setting the parameter -k 15 which could be a potential problem, I got a subporcess.CalledProcessError when it went to the function "doublets(args, ref_mtx, alt_mtx, cluster_file)". By the information of Traceback, I found that after the function "souporcell(args, ref_mtx, alt_mtx, final_vcf)" was run, the "cluster_file" which was named "clusters_tmp.tsv" in the output directory is empty. I think this might caused the function "doublets(args, ref_mtx, alt_mtx, cluster_file)" went wrong which need the "cluster_file" as input. But now I don't know how to deal with it.

running souporcell doublet detection
Traceback (most recent call last) :
File "/opt/souporcell/souporcell_pipeline.py", line 596, in
doublets(args, ref_mtx, alt_mtx, cluster_file)
File "/opt/souporcell/souporcell_pipeline.py", line 541, in doublets
subprocess.check_call([directory+"/troublet/target/release/troublet", "--alts", alt_mtx, "--refs", ref_mtx, "--clusters", cluster_file], stdout = dub, stderr = err)
File "/usr/local/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', '/home/pro5/results/alt.mtx', '--refs', '/home/pro5/results/ref.mtx', '--clusters', '/home/pro5/results/clusters_tmp.tsv']' returned non-zero exit status 101.

By the way, when I checked the doublets.err file, I got something that I don't know what it means.

thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 111406', src/main.rs:328:22
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Thanks a lot!

The text was updated successfully, but these errors were encountered:

Slacanch · 2022-01-07T15:15:35Z

I am having a similar problem when running with K = 10, 11 or 12 on quite a large dataset (35G). I get the same subprocess error and doublets.err error as in the OP.

This seems to be due to the clustering steps failing to identify any clusters, yet failing without an error. this is the output of clusters.err:

total loci used 82114
thread 2 iteration 0 done with -inf, best so far -inf
thread 3 iteration 0 done with -inf, best so far -inf
thread 4 iteration 0 done with -inf, best so far -inf
thread 5 iteration 0 done with -inf, best so far -inf
thread 6 iteration 0 done with -inf, best so far -inf
thread 0 iteration 0 done with -inf, best so far -inf
thread 1 iteration 0 done with -inf, best so far -inf
thread 7 iteration 0 done with -inf, best so far -inf
thread 2 iteration 1 done with -inf, best so far -inf
thread 4 iteration 1 done with -inf, best so far -inf
thread 3 iteration 1 done with -inf, best so far -inf
thread 6 iteration 1 done with -inf, best so far -inf
thread 0 iteration 1 done with -inf, best so far -inf
thread 1 iteration 1 done with -inf, best so far -inf
thread 5 iteration 1 done with -inf, best so far -inf
thread 2 iteration 2 done with -inf, best so far -inf
thread 7 iteration 1 done with -inf, best so far -inf
thread 3 iteration 2 done with -inf, best so far -inf
thread 4 iteration 2 done with -inf, best so far -inf
thread 6 iteration 2 done with -inf, best so far -inf
thread 1 iteration 2 done with -inf, best so far -inf
thread 0 iteration 2 done with -inf, best so far -inf
thread 2 iteration 3 done with -inf, best so far -inf
thread 5 iteration 2 done with -inf, best so far -inf
thread 3 iteration 3 done with -inf, best so far -inf
thread 7 iteration 2 done with -inf, best so far -inf
thread 6 iteration 3 done with -inf, best so far -inf
thread 4 iteration 3 done with -inf, best so far -inf
thread 1 iteration 3 done with -inf, best so far -inf
thread 0 iteration 3 done with -inf, best so far -inf
thread 5 iteration 3 done with -inf, best so far -inf
thread 7 iteration 3 done with -inf, best so far -inf
best total log probability = -inf

I have tried increasing --restarts and changing --min_alt and --min_ref with no effect (although rerunning the whole pipeline might be required for these changes to take effect, right now i'm deleting the parts related to clusters and doublets and running it again on the same output folder).

is any of you aware of this behaviour? i can't begin to diagnose the problem as i don't understand why clustering fails.

thanks a lot

Tito

wheaton5 · 2022-01-07T18:36:01Z

It looks like there was some problem prior to clustering that led to 0 counts? That cant really be it because it says "total loci used 82k" but something similar. Maybe you could send me your alt.mtx and ref.mtx and barcodes.tsv file and then i can debug it.

Slacanch · 2022-01-10T16:36:49Z

Hey @wheaton5,
as an update, i've tried rerunning the pipeline completely with different min_alt, min_ref, and --restarts parameters to no effect.

I have sent you a link to download ref, alt, and barcodes for the data that gives me these issues, i've looked at them briefly but they seem ok.

Thanks for looking into it!

Slacanch · 2022-01-12T10:02:48Z

Hello again,
Something i forgot to mention, the dataset that is creating these clustering problems is not a straight up gene expression run, but rather a 10X genomics cellplexed sample run with cellranger multi, not cellranger count.

what that means in detail i'm not sure, but it might create problems. i'm going to try running cellranger count on only the gene expression portion of the data and then rerunning souporcell, see if that does anything.

Slacanch · 2022-01-13T11:04:44Z

Small update. after testing on the output of cellranger count instead of cellranger multi, souporcell run to completion. I think the problem was that i had provided ALL possible barcodes to SoC, instead of the ones found to be associated with cells. could this create problems for the clustering algorithm?

maybe @ZebinWen can comment on wether this applies to them.

wheaton5 · 2022-01-14T16:16:09Z

@Slacanch thanks for the update. This has been a problem in the past. I should try to fix it. Theoretically it is just adding some noise to the clustering. But something in the code makes it break.

changostraw · 2022-06-08T17:39:31Z

I think I am having the same issue. I get this error when clustering:
running souporcell clustering
/opt/souporcell/souporcell/target/release/souporcell -k 2 -a /project/rrg-gturecki/Sarah/Demuxafy/Soupercell/cecum_poolC/alt.mtx -r /project/rrg-gturecki/Sarah/Demuxafy/Soupercell/cecum_poolC/ref.mtx --restarts 100 -b /project/rrg-gturecki/Sarah/multiplex_test_demuxify_run2_cecum_poolC/outs/filtered_feature_bc_matrix/barcodes.tsv.gz --min_ref 10 --min_alt 10 --threads 2 --known_genotypes /project/rrg-gturecki/Sarah/Demuxafy/Soupercell/cecum_poolC/common_variants_covered.vcf --known_genotypes_sample_names 309_GT12139_SingleCellGenotyping_GT-P01_309 400_GT14352_SingleCellGenotyping_GT-P01_400
running souporcell doublet detection
Traceback (most recent call last):
File "/opt/souporcell/souporcell_pipeline.py", line 598, in
doublets(args, ref_mtx, alt_mtx, cluster_file)
File "/opt/souporcell/souporcell_pipeline.py", line 543, in doublets
subprocess.check_call([directory+"/troublet/target/release/troublet", "--alts", alt_mtx, "--refs", ref_mtx, "--clusters", cluster_file], stdout = dub, stderr = err)
File "/opt/conda/envs/py36/lib/python3.6/subprocess.py", line 311, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', '/project/rrg-gturecki/Sarah/Demuxafy/Soupercell/cecum_poolC/alt.mtx', '--refs', '/project/rrg-gturecki/Sarah/Demuxafy/Soupercell/cecum_poolC/ref.mtx', '--clusters', '/project/rrg-gturecki/Sarah/Demuxafy/Soupercell/cecum_poolC/clusters_tmp.tsv']' returned non-zero exit status 101.

I am only running k = 2 on one subset file for testing, but my data is also multiplexed. However, the input is only from cellranger count (I didn't run cellranger multi on this subset as I am testing the difference between demultiplexing via tags and via genotype). However, perhaps my barcodes file still contains all the barcodes? @Slacanch did you alter the barcodes file at all from the cellranger count output?

Thanks!

wheaton5 · 2022-06-08T20:58:19Z

you should use the barcodes file that is just the cell barcodes.

changostraw · 2022-06-08T20:58:49Z

Here is my clusters.err file. Is it a problem that it says total loci used 1?

total loci used 1
binomial 0 0 1 0 -80.59912 inf
binomial 1 0 1 0 -80.59912 inf
binomial 0 0 2 0 -23.148155 57.450966
binomial 1 0 2 0 -23.148155 57.450966
binomial 0 0 3 1 -23.14816 -0.000005722046
binomial 1 0 3 1 -23.14816 -0.000005722046
binomial 0 0 4 2 -23.14816 0
binomial 1 0 4 2 -23.14816 0
binomial 0 0 5 3 -23.14816 0
binomial 0 0 6 4 -23.14816 0
binomial 1 0 5 3 -23.14816 0
binomial 0 0 7 5 -23.14816 0
binomial 1 0 6 4 -23.14816 0
binomial 0 0 8 6 -23.14816 0
binomial 1 0 7 5 -23.14816 0
binomial 0 0 9 7 -23.14816 0
binomial 1 0 8 6 -23.14816 0
binomial 0 0 10 8 -23.14816 0
thread 0 iteration 0 done with -23.14816, best so far -23.14816
binomial 1 0 9 7 -23.14816 0
binomial 1 0 10 8 -23.14816 0
thread 1 iteration 0 done with -23.14816, best so far -23.14816
binomial 0 1 1 0 -80.59912 inf
binomial 0 1 2 0 -23.148155 57.450966
binomial 0 1 3 1 -23.14816 -0.000005722046
binomial 1 1 1 0 -80.59912 inf
binomial 1 1 2 0 -23.148155 57.450966
binomial 0 1 4 2 -23.14816 0

wheaton5 · 2022-06-08T20:59:31Z

yeah, thats definitely a problem lol.

wheaton5 · 2022-06-08T21:00:16Z

what do your alt.mtx and ref.mtx look like (top few lines)

changostraw · 2022-06-08T21:00:25Z

So it isn't finding any cells? Could that be because of the barcodes file?

changostraw · 2022-06-08T21:00:57Z

alt.mtx:
%%MatrixMarket matrix coordinate real general
% written by sprs
4832 12890 8825
1 51 0
1 9958 0
1 10740 0
2 782 1
3 1316 1
4 1356 0
7 1778 0
7 9498 0
8 3356 1
8 11915 1
8 12111 0
9 468 1
9 3360 1
9 5943 1
9 7077 1
9 10577 1
10 1366 0
10 5081 0
10 6506 0
11 1366 0
11 5081 0
11 6506 0
12 12366 0
13 2410 0
13 3837 0
14 566 0

changostraw · 2022-06-08T21:01:26Z

%%MatrixMarket matrix coordinate real general
% written by sprs
4832 12890 8825
1 51 1
1 9958 1
1 10740 1
2 782 0
3 1316 0
4 1356 1
7 1778 1
7 9498 1
8 3356 0
8 11915 0
8 12111 1
9 468 0
9 3360 0
9 5943 0
9 7077 0
9 10577 0
10 1366 1
10 5081 1
10 6506 1
11 1366 1
11 5081 1
11 6506 1
12 12366 1
13 2410 1
13 3837 1
14 566 1

wheaton5 · 2022-06-08T21:04:26Z

that looks relatively normal, but im going to look at a normal run real quick to make sure. Thats 12890 cells, 4832 loci, 8825 entries. Maybe that number of entries is low for that number of cells and loci

wheaton5 · 2022-06-08T21:06:55Z

What is the umi/cell median?

changostraw · 2022-06-08T21:10:01Z

These are postmortem intestine nuclei so there are low counts
Median UMI counts per cell = 230
Median genes per cell = 215
Mean reads per cell = 18,344

wheaton5 · 2022-06-08T21:11:30Z

You can try rerunning with --min_alt 4 and --min_ref 4. (default is 10 for each). You should get some more loci. But its gonna be tough to cluster with that few UMI

wheaton5 · 2022-06-08T21:14:07Z

here is an example of a run with a lot of umi
%%MatrixMarket matrix coordinate real general
% written by sprs
194079 4925 13543647

to be fair that had like 25k umi median... which is abnormally large. A good number is around 4k for cells, maybe 1k for nuclei.

changostraw · 2022-06-08T21:15:27Z

I see. Would it help if the range was large? The last time I ran a snRNA-seq test on these samples the range was quite large. Because this is human postmortem tissue the RNA in the mucosal cells is very degraded. However, the RNA in the neurons and glial cells and certain immune cells are usually well preserved, so I was expecting these nuclei to have higher UMIs. Would it help to restrict to cells with a higher UMI?

wheaton5 · 2022-06-08T21:15:32Z

here is a more normal run, with about 4k umi per cell

%%MatrixMarket matrix coordinate real general
% written by sprs
186395 14030 4201477

So there is basically almost no data to go off of in your case.

changostraw · 2022-06-08T21:17:46Z

Okay that makes sense. I also have data from postmortem brain tissue which has ~1600 UMI per cell. So I will run that as well and make sure that is the issue.

wheaton5 · 2022-06-08T21:18:22Z

I don't think the low UMI cells should hurt the clustering. The cells with more UMI will contribute that much more to the clustering results and the low UMI cells should get sort of placed according to how the high UMI cells have shaped the cluster centers. But you may run into errors in the code if there are any cells that have no variants at all. I'm not sure

changostraw · 2022-06-08T21:18:30Z

Because we are working with postmortem nuclei we never get the levels that people working with cells get

wheaton5 · 2022-06-08T21:19:37Z

Sure. I think it should work fine with 1600 UMI. But again there might be a bug if any cells contain no variants. fingers crossed

changostraw · 2022-06-08T21:20:50Z

Okay, so to rerun with --min_alt 4 and --min_ref 4. Should I just run the clustering or the whole thing again with those options?

wheaton5 · 2022-06-08T21:25:56Z

You can rerun just the clustering. The pipeline is set up to rerun from a partial previous run. There are .done files for each step. So just delete clustering.done (i think thats right, maybe souporcell.done) and troublet.done and then rerun but with those new options

changostraw · 2022-06-14T14:17:31Z

Okay so I ran it on my brain data with ~1600 UMI/cell and it worked well, although there were still high levels of unassigned ( 46%). I also re-ran it on the cecum data with the min_alt 4 etc and it doesn't crash but all the droplets are unassigned. I am using raw donor GT right now but I plan to impute the genotype data in the future to fill in some gaps, perhaps this would help? I am also going to try and demultiplex using the 10000genomes genotype ref and then assign any clusters after the fact - perhaps that will give me more variants per nuclei? Otherwise I guess I have to go back to the drawing board for the cecum samples. Thanks!

ZebinWen · 2022-08-02T00:06:10Z

Sorry for my late reply and thanks for your update @Slacanch @wheaton5 . There are 2 barcodes.tsv in cellranger's output files, one is in raw data directory, another is in filtered data directory. So according to your addvices, I change my barcodes file from the one in raw data directory into the one in filtered data directory, and it works! 🤙👍🍻
At the same time, I keep the pipeline running uninterrupted. If it breaks, I can resume the pipeline by rerun the code, and it can continue your previous work thanks to the .done files which is so brilliant!

ZebinWen closed this as completed Aug 3, 2022

CarlottaHS mentioned this issue Jul 21, 2023

Doublet detection. Returned non-zero exit status 101 #175

Open

timslittle mentioned this issue Jan 3, 2024

error debug help #124

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

clusters_tmp.tsv is empty which caused subprocess.CalledProcessError #128

clusters_tmp.tsv is empty which caused subprocess.CalledProcessError #128

ZebinWen commented Dec 25, 2021 •

edited

Loading

Slacanch commented Jan 7, 2022 •

edited

Loading

wheaton5 commented Jan 7, 2022

Slacanch commented Jan 10, 2022

Slacanch commented Jan 12, 2022

Slacanch commented Jan 13, 2022

wheaton5 commented Jan 14, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

changostraw commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 14, 2022

ZebinWen commented Aug 2, 2022

clusters_tmp.tsv is empty which caused subprocess.CalledProcessError #128

clusters_tmp.tsv is empty which caused subprocess.CalledProcessError #128

Comments

ZebinWen commented Dec 25, 2021 • edited Loading

Slacanch commented Jan 7, 2022 • edited Loading

wheaton5 commented Jan 7, 2022

Slacanch commented Jan 10, 2022

Slacanch commented Jan 12, 2022

Slacanch commented Jan 13, 2022

wheaton5 commented Jan 14, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

changostraw commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 8, 2022

wheaton5 commented Jun 8, 2022

changostraw commented Jun 14, 2022

ZebinWen commented Aug 2, 2022

ZebinWen commented Dec 25, 2021 •

edited

Loading

Slacanch commented Jan 7, 2022 •

edited

Loading