Segmentation fault when using --soloFeatures Velocyto #1602

Open
GeorgetteTanner opened this issue Jul 7, 2022 · 9 comments
Labels
issue: code Likely to be an issue with STAR code

Comments

@GeorgetteTanner

Hi Alex

Thanks for the great program.

I've been getting a segmentation fault error which happens with both the newest patch (2.7.10a_alpha_220601) as well as previous patches for 2.7.10a_alpha and version 2.7.9a. I've narrowed the problem down to when I include --soloFeatures Velocyto as it works fine when I use "--soloFeatures Gene GeneFull SJ". Just noticed this was also mentioned as a side issue in another thread: #1366

As a workaround, am I right that if I subtract Gene counts from GeneFull counts I would effectively end up with unspliced counts, with Gene counts representing spliced+ambiguous counts? If so, this may be the best approach anyway, as the He et al. 2022 Alevin-fry paper (https://www.nature.com/articles/s41592-022-01408-3) shows that RNA velocity may be improved by combining ambiguous with spliced reads. Or are ambiguous reads not counted in Gene counts?

Thanks
Georgette

@alexdobin alexdobin added the issue: code Likely to be an issue with STAR code label Jul 8, 2022
@alexdobin
Owner

Hi Georgette,

Could you please send me the Log.out file from your run?
I agree that GeneFull minus Gene counts could be a reasonable approximation of unspliced counts.
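
For readers who want to try this workaround, here is a minimal sketch of the subtraction on two MatrixMarket count matrices. It is not STAR's Velocyto calculation, only the approximation discussed above. The function names `read_mtx` and `approx_unspliced` are hypothetical. It assumes both matrices are the raw `matrix.mtx` outputs of the same run, so that feature rows and barcode columns line up, and it assumes that any negative difference should be clipped to zero.

```python
def read_mtx(lines):
    """Parse MatrixMarket coordinate text into (shape, {(row, col): value})."""
    # Drop blank lines and '%' header/comment lines.
    body = [ln for ln in lines if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, _ = (int(x) for x in body[0].split())
    counts = {}
    for ln in body[1:]:
        fields = ln.split()
        # Only the first value column is used.
        counts[(int(fields[0]), int(fields[1]))] = float(fields[2])
    return (n_rows, n_cols), counts


def approx_unspliced(gene_full, gene):
    """Return GeneFull - Gene per (feature, barcode), keeping positive values."""
    out = {}
    for key, full in gene_full.items():
        diff = full - gene.get(key, 0)
        if diff > 0:  # assumption: negative or zero differences are dropped
            out[key] = diff
    return out


if __name__ == "__main__":
    # Toy matrices: 3 features x 2 barcodes (hypothetical data).
    gene_full_txt = """%%MatrixMarket matrix coordinate integer general
3 2 3
1 1 5
2 1 2
3 2 4
"""
    gene_txt = """%%MatrixMarket matrix coordinate integer general
3 2 2
1 1 3
3 2 4
"""
    _, gene_full = read_mtx(gene_full_txt.splitlines())
    _, gene = read_mtx(gene_txt.splitlines())
    print(approx_unspliced(gene_full, gene))
```

With real data, the two inputs would be opened from the Gene and GeneFull output directories and passed to `read_mtx` as file objects. The Gene matrix is then used unchanged as the spliced+ambiguous layer.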

@GeorgetteTanner
Author

Hi Alex

This is the log file for the failed run: Log.out.txt

Thanks
Georgette

@ms-gx

ms-gx commented Jul 26, 2022

I also get a segfault when setting Velocyto.

Command:
$star_path --runThreadN 63 --genomeDir $ref_annotation_path --readFilesIn "$workdir_path"FILTERED_"$id"_Aligned.sortedByCoord.out.bam --readFilesType SAM SE --readFilesCommand samtools view -F 0x100 --soloInputSAMattrBarcodeSeq CR UR --soloInputSAMattrBarcodeQual CY UY --soloType CB_UMI_Simple --outFileNamePrefix "$workdir_path"MAPPING2_"$id"_ --soloUMIlen 12 --soloCBwhitelist <(zcat cellranger/barcodes/3M-february-2018.txt.gz) --outSAMattributes CB UB cN --outSAMtype BAM SortedByCoordinate --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --soloUMIfiltering MultiGeneUMI_CR --soloUMIdedup 1MM_CR --clipAdapterType CellRanger4 --outFilterScoreMin 30 --soloCellFilter EmptyDrops_CR --limitBAMsortRAM $sort_memory --outSAMunmapped Within --soloFeatures Gene GeneFull SJ Velocyto > "$workdir_path"MAPPING2_"$id".mainlog.txt

EDIT:
Same as said above: as soon as I remove Velocyto (so only Gene GeneFull SJ) there is no segfault.

@alexdobin
Owner

Hi Georgette, Michael,

I tried to reproduce these segfaults with both sets of parameters, but they did not occur on my test sets.
Could you please check whether the segfault still happens on a smaller subset of reads (<100k), and if so, send me that subset?
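
For FASTQ input, one way to cut such a subset is to copy the first N records of each file. The sketch below does that. The helper name `head_fastq` is hypothetical, it assumes gzipped FASTQ with four lines per record, and there is no guarantee that the first 100k reads are the ones that trigger the crash.

```python
import gzip


def head_fastq(src_path, dst_path, n_reads=100_000):
    """Copy the first n_reads records (4 lines each) from a gzipped FASTQ.

    Returns the number of records actually written.
    """
    written = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        while written < n_reads:
            record = [src.readline() for _ in range(4)]
            if not record[3]:  # end of file, or a truncated final record
                break
            dst.writelines(record)
            written += 1
    return written
```

For paired files, calling it with the same `n_reads` on both mates keeps the cDNA and barcode reads in sync. If the subset runs cleanly, a different slice of the file may be needed.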

@alexdobin
Owner

I have found another potential problem that may be causing a seg-fault and fixed it:
https://github.com/alexdobin/STAR/releases/tag/2.7.10a_alpha_220818
If you could test this patch, it would be great!

@ms-gx

ms-gx commented Aug 19, 2022

Dear Alex

I tested 2.7.10a_alpha_220818 on the dataset which used to trigger the segfault... and voilà it works now and there is no segfault.

Also, I tested again with the exact same conditions (except STAR version obviously) on 2.7.10a with the same dataset and there I could reproduce the segfault again.

So the problem is gone for me. Thanks much Alex!

@alexdobin
Owner

Hi Michael,
thanks a lot for testing it!

@johnchamberlin

Hello, I am also having a velocyto-induced segfault issue with STAR version 2.7.9a. This is with paired-end alignment and option '--peOverlapNbasesMin 5'

This is 3' 10x Genomics data sequenced at 150x150 bp, which I assume is not a normal use case. Do you know if it works with 5' assay data? See also #1366.

Thanks.

STAR --genomeDir GRCh38_ensg104.filtered --peOverlapNbasesMin 5 --soloType CB_UMI_Simple --soloBarcodeMate 2 --clip5pNbases 0 60 --soloCBwhitelist 3M-february-2018.txt --soloCBstart 1 --soloCBlen 16 --soloUMIstart 17 --soloUMIlen 12 --soloBarcodeReadLength 150 --readFilesIn R2.fastq.gz R1.fastq.gz --soloFeatures Gene GeneFull SJ Velocyto --soloUMIfiltering MultiGeneUMI --soloCBmatchWLtype 1MM_multi_Nbase_pseudocounts --outFileNamePrefix AG1. --outSAMtype BAM SortedByCoordinate --outSAMattributes NH HI nM AS CR UR CB UB GX GN sS sQ sM --runThreadN 16 --readFilesCommand zcat

@alexdobin
Owner

Hi John,

The Velocyto calculation may not work with the 5' protocol, and --peOverlapNbasesMin may not work properly with any solo options.
If you need to merge the reads, I would recommend doing it with another tool before mapping, but keeping the merged cDNA sequence and barcode sequence as separate reads - this way you will be able to use solo 3' options.
