STARsolo: support for multiple (3) barcode locations #838

Open
ghuls opened this issue Feb 14, 2020 · 12 comments

@ghuls

ghuls commented Feb 14, 2020

Hi Alex,

We have a custom in-house design that requires 3 separate barcode locations (10 bp each), each with its own whitelist, separated by 2 adapters.

Read 2:

[BC1]-CAGCTACTGC-[BC2]-CGAGTACCCT-[BC3]-[UMI]

with:

  • BC1, BC2, BC3 = 10bp, each with a different white list
  • UMI = 8 bp

Any chance to get support for something like this in STARsolo?
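For reference, the Read 2 geometry above can be sketched as a small Python parser. This is only an illustration: the segment names (BC1, linker1, ...) and the helper split_read2 are hypothetical, and the sketch assumes the two adapters must match exactly, whereas a real pipeline would probably tolerate sequencing errors in them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical description of the Read 2 layout from this issue:
# (segment name, length, expected sequence or None for variable segments)
LAYOUT: List[Tuple[str, int, Optional[str]]] = [
    ("BC1", 10, None),
    ("linker1", 10, "CAGCTACTGC"),
    ("BC2", 10, None),
    ("linker2", 10, "CGAGTACCCT"),
    ("BC3", 10, None),
    ("UMI", 8, None),
]


def split_read2(seq: str) -> Optional[Dict[str, str]]:
    """Return the barcode and UMI segments of a Read 2 sequence.

    Returns None if the read is too short or an adapter does not match exactly.
    """
    segments: Dict[str, str] = {}
    pos = 0
    for name, length, expected in LAYOUT:
        chunk = seq[pos:pos + length]
        if len(chunk) != length:
            return None  # read shorter than the 58 b layout
        if expected is not None:
            if chunk != expected:
                return None  # adapter mismatch: reject the read
        else:
            segments[name] = chunk
        pos += length
    return segments


read2 = "AAAAAAAAAA" + "CAGCTACTGC" + "CCCCCCCCCC" + "CGAGTACCCT" + "GGGGGGGGGG" + "TTTTTTTT"
print(split_read2(read2))
# {'BC1': 'AAAAAAAAAA', 'BC2': 'CCCCCCCCCC', 'BC3': 'GGGGGGGGGG', 'UMI': 'TTTTTTTT'}
```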

@ghuls changed the title from "STARsolo multiple barcodes" to "STARsolo: support for multiple (3) barcode locations" Feb 14, 2020
@alexdobin
Owner

Hi Gert,

sorry for the delayed reply.
This is already supported with --soloType CB_UMI_Complex
For your geometry, the CB/UMI geometry parameters should be, I think:
--soloCBposition 0_0_0_10 0_21_0_30 0_41_0_50
--soloUMIposition 0_51_0_58
And you would need to provide 3 whitelist files:
--soloCBwhitelist wl1 wl2 wl3

Please let me know if you have any issues with these parameters. If they do not work, I could look at a few thousand reads to tweak them.

Cheers
Alex

@fderop

fderop commented Feb 28, 2020

Hi Alex,

We have tested this and it does indeed work, although we used the settings --soloCBposition 0_0_0_9 0_20_0_29 0_40_0_49.

Florian

@ghuls
Author

ghuls commented Feb 28, 2020

@alexdobin

Would it be possible to write the corrected cell barcode to the SAM attributes too?

We use the following settings:

    STAR \
        --runThreadN 8 \
        --runMode alignReads \
        --outSAMtype BAM SortedByCoordinate \
        --sysShell /bin/bash \
        --genomeDir "${star_reference_dir}" \
        --readFilesIn "${fastq_R1_filename}" "${fastq_R2_filename}" \
        --readFilesCommand 'gzip -c -d' \
        --soloCBwhitelist "${whitelist_part1_filename}" "${whitelist_part2_filename}" "${whitelist_part3_filename}" \
        --soloType CB_UMI_Complex \
        --soloCBposition 0_0_0_9 0_20_0_29 0_40_0_49 \
        --soloUMIlen 2 \
        --soloUMIposition 0_50_0_51 \
        --sjdbGTFfile "${gft_filename}" \
        --soloCellFilter None \
        --soloCBmatchWLtype 1MM \
        --outSAMattributes CB UB CR CY UR UY \
        --outFileNamePrefix "${bam_filename%bam}"

@alexdobin
Owner

Hi Florian,

you are right, the positions are 0-based; I had to check my code to make sure. :(
And you used --soloUMIposition 0_50_0_57, right?
I will make changes in the Manual to clarify it.

Thanks!
Alex
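Since the coordinates are 0-based and, judging by the working values above, inclusive at both ends, the position strings can be derived from the segment lengths. A minimal sketch, assuming every segment is anchored to the read start (anchor 0); the helper solo_positions and the segment names are hypothetical.

```python
from typing import Dict, List, Tuple

# Read 2 segments in order: (name, length in bases)
LAYOUT: List[Tuple[str, int]] = [
    ("BC1", 10), ("linker1", 10), ("BC2", 10),
    ("linker2", 10), ("BC3", 10), ("UMI", 8),
]


def solo_positions(layout: List[Tuple[str, int]]) -> Dict[str, str]:
    """Map each segment to a 'startAnchor_start_endAnchor_end' string.

    Both anchors are 0 (read start); start and end are 0-based and inclusive.
    """
    positions: Dict[str, str] = {}
    start = 0
    for name, length in layout:
        end = start + length - 1  # inclusive end coordinate
        positions[name] = f"0_{start}_0_{end}"
        start += length
    return positions


pos = solo_positions(LAYOUT)
print("--soloCBposition", " ".join(pos[n] for n in ("BC1", "BC2", "BC3")))
print("--soloUMIposition", pos["UMI"])
# --soloCBposition 0_0_0_9 0_20_0_29 0_40_0_49
# --soloUMIposition 0_50_0_57
```

This reproduces the CB positions Florian used and the UMI range Alex asked about.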

@fderop

fderop commented Mar 2, 2020

Hello Alex,

Our library does not have a UMI, so we had to input a dummy UMI setting to make STARsolo work. I believe we used --soloUMIposition 0_0_0_1, where these two bases are not really random. I was getting core dumps if I did not enter a UMI position.

Florian

@alexdobin
Owner

Hi Gert,

CB tags currently do not work for the "complex" CB_UMI barcodes...
Would you want to output the concatenated 30b sequence? I can implement it, though it will not be quick. In the meantime, it might be easiest to preprocess the CB/UMI read into a 30b CB + dummy UMI sequence and use it as a simple CB_UMI barcode (with the whitelist being the Cartesian product of the 3 whitelists), which would allow the output of CB in the BAM file.

Cheers
Alex
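A minimal sketch of that preprocessing, assuming the 58 b Read 2 layout from the first comment and keeping the real 8 b UMI instead of a dummy one; the function names are hypothetical. The resulting 38 b read would then presumably be described to CB_UMI_Simple with --soloCBstart 1 --soloCBlen 30 --soloUMIstart 31 --soloUMIlen 8 (1-based start positions), but those values are untested here.

```python
from typing import List, Tuple

# 0-based, half-open slices of the 58 b Read 2 described in this issue
CB_SLICES: List[Tuple[int, int]] = [(0, 10), (20, 30), (40, 50)]
UMI_SLICE: Tuple[int, int] = (50, 58)


def _extract(s: str) -> str:
    """Concatenate BC1+BC2+BC3, then the UMI, from a sequence or quality string."""
    cb = "".join(s[a:b] for a, b in CB_SLICES)
    return cb + s[UMI_SLICE[0]:UMI_SLICE[1]]


def simplify_record(record: List[str]) -> List[str]:
    """Rewrite one 4-line FASTQ record into a 30 b CB + 8 b UMI record."""
    header, seq, plus, qual = record
    # Sequence and quality are sliced identically so they stay in sync
    return [header, _extract(seq), plus, _extract(qual)]


def simplify_fastq(lines: List[str]) -> List[str]:
    """Apply simplify_record to consecutive groups of 4 FASTQ lines."""
    out: List[str] = []
    for i in range(0, len(lines), 4):
        out.extend(simplify_record(lines[i:i + 4]))
    return out


seq = "AAAAAAAAAA" + "CAGCTACTGC" + "CCCCCCCCCC" + "CGAGTACCCT" + "GGGGGGGGGG" + "TTTTTTTT"
rec = simplify_record(["@read1", seq, "+", "I" * 58])
print(rec[1])  # AAAAAAAAAACCCCCCCCCCGGGGGGGGGGTTTTTTTT
```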

@alexdobin
Owner

alexdobin commented Mar 2, 2020

Hi Florian

on second thought, I do not think replacing the UMI with a constant 2b sequence is going to work, as all of the reads will "collapse" into 1 read, so you will have no more than 1 read per cell. I will need to implement an option to count all reads without collapsing UMIs.

Cheers
Alex
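To illustrate the collapsing, here is a simplified model that deduplicates on (cell barcode, gene, UMI); keying on exactly these fields is an assumption about how the counting works, and the function names are hypothetical. With a constant UMI, each cell/gene pair ends up with a count of 1 regardless of how many reads support it, whereas plain read counting keeps them all.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Read = Tuple[str, str, str]  # (cell barcode, gene, UMI)


def umi_counts(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (cell, gene): reads sharing all three fields collapse."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cb, gene, umi in reads:
        umis[(cb, gene)].add(umi)
    return {key: len(s) for key, s in umis.items()}


def read_counts(reads: Iterable[Read]) -> Dict[Tuple[str, str], int]:
    """Count all reads per (cell, gene) without UMI collapsing."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for cb, gene, _umi in reads:
        counts[(cb, gene)] += 1
    return dict(counts)


# Five reads for GeneA and three for GeneB in one cell, all with the same dummy UMI "AC"
reads = [("CELL1", "GeneA", "AC")] * 5 + [("CELL1", "GeneB", "AC")] * 3
print(umi_counts(reads))   # {('CELL1', 'GeneA'): 1, ('CELL1', 'GeneB'): 1}
print(read_counts(reads))  # {('CELL1', 'GeneA'): 5, ('CELL1', 'GeneB'): 3}
```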

@fderop

fderop commented Mar 2, 2020

Hello Alex,

We have also previously considered pre-processing the barcode read and using simple CB/UMI. Our current design uses 96x96x96 barcode possibilities (some 880k unique barcodes), and a simple CB/UMI approach would work well. However, we might consider scaling up to 384x384x384 in the future. If I understand correctly, working with a barcode whitelist that spans 56m possibilities could be computationally challenging, which is why CB/UMI complex is so attractive.

We are currently not concerned about the collapsing of reads in the expression matrix since we are mostly interested in the .bam file, but the option to run STARsolo without UMI might be helpful to demultiplex single cell sequencing libraries not stemming from scRNA-seq experiments in the future.

Florian
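The barcode counts above are the products of the three whitelist sizes (96^3 = 884,736 and 384^3 = 56,623,104). A minimal sketch of building the combined whitelist lazily with itertools.product, assuming each whitelist is a plain list of 10 b barcodes and that the concatenation order is BC1+BC2+BC3, matching how the barcode read would be rewritten; the function names are hypothetical. At 31 bytes per line (30 b plus a newline), the 384^3 list would be roughly 1.76 GB on disk.

```python
import io
import itertools
from typing import Iterator, Sequence, TextIO


def combined_whitelist(wl1: Sequence[str], wl2: Sequence[str], wl3: Sequence[str]) -> Iterator[str]:
    """Yield every BC1+BC2+BC3 concatenation without holding them all in memory."""
    for bc1, bc2, bc3 in itertools.product(wl1, wl2, wl3):
        yield bc1 + bc2 + bc3


def write_whitelist(out: TextIO, wl1: Sequence[str], wl2: Sequence[str], wl3: Sequence[str]) -> int:
    """Write the combined whitelist one barcode per line; return the number written."""
    n = 0
    for bc in combined_whitelist(wl1, wl2, wl3):
        out.write(bc + "\n")
        n += 1
    return n


print(96 ** 3)   # 884736
print(384 ** 3)  # 56623104

# Small in-memory demonstration; for real data, pass a file opened with open(path, "w")
buf = io.StringIO()
n = write_whitelist(buf, ["AAAAAAAAAA"], ["CCCCCCCCCC"], ["GGGGGGGGGG", "TTTTTTTTTT"])
print(n)  # 2
```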

@alexdobin
Owner

Hi Florian,

actually, the 56m barcode list should not create serious problems, so I would try creating the 30b cell barcode, as it's generally easier to handle.
Then, if you are interested in the CB tag only, you can use --soloType CB_samTagOut option (together with --soloCBmatchWLtype 1MM), which will skip the UMI counting.

Cheers
Alex
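As a rough model of what --soloCBmatchWLtype 1MM is described as doing (accept an exact whitelist hit, otherwise accept a single-mismatch correction only when it is unique), a sketch follows. STAR's actual implementation may apply further rules, such as requiring exact-match support for the corrected barcode, so treat this as an illustration only; the function name match_1mm is hypothetical. With CB_UMI_Complex, the same idea would apply to each 10 b segment against its own whitelist.

```python
from typing import Optional, Set


def match_1mm(barcode: str, whitelist: Set[str]) -> Optional[str]:
    """Return the whitelisted barcode for `barcode`, allowing at most one mismatch.

    Exact matches win; otherwise a 1-mismatch correction is returned only if it is unique.
    """
    if barcode in whitelist:
        return barcode
    hits: Set[str] = set()
    for i, observed in enumerate(barcode):
        for base in "ACGT":
            if base == observed:
                continue
            candidate = barcode[:i] + base + barcode[i + 1:]
            if candidate in whitelist:
                hits.add(candidate)
    # Ambiguous (2+ hits) or no hit: leave the read unassigned
    return hits.pop() if len(hits) == 1 else None


wl = {"AAAA", "CCCC", "AAAT"}
print(match_1mm("CCCA", wl))  # CCCC (unique entry one substitution away)
print(match_1mm("AAAG", wl))  # None (AAAA and AAAT are both one mismatch away)
```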

@ghuls
Author

ghuls commented Apr 17, 2020

@alexdobin In the past, a 30 bp barcode definitely would not work with STAR, as the longest CB supported by STAR is/was 16 bp (due to the use of a 32-bit integer). As a result, barcodes could collapse: when decoded from the 32-bit integer and written out, all nucleotides except the last 16 would be A.

Does the code already use a 64-bit integer for the CB?

See an old pull request: #588
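The 16 bp limit follows from packing each base into 2 bits: a 32-bit integer holds 16 bases, while a 30 b barcode needs 60 bits. A small sketch of that arithmetic, assuming an A/C/G/T = 0/1/2/3 code (a hypothetical choice; STAR's actual encoding and its handling of N may differ), shows how the leading bases are lost and decode as A.

```python
# Hypothetical 2-bit code; STAR's internal base ordering may differ.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def encode(seq: str, bits: int) -> int:
    """Pack bases at 2 bits each, keeping only the lowest `bits` bits (like a fixed-width integer)."""
    mask = (1 << bits) - 1
    value = 0
    for base in seq:
        value = ((value << 2) | ENCODE[base]) & mask  # older bases overflow and are lost
    return value


def decode(value: int, length: int) -> str:
    """Unpack `length` bases; positions that no longer fit decode as 0, i.e. 'A'."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 3])
        value >>= 2
    return "".join(reversed(bases))


cb = "ACGTACGTACGTACGTACGTACGTACGTAC"  # 30 b barcode
print(decode(encode(cb, 32), 30))  # AAAAAAAAAAAAAAGTACGTACGTACGTAC
print(decode(encode(cb, 64), 30) == cb)  # True: 60 bits fit in 64
```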

@fderop

fderop commented Apr 20, 2020

I can confirm that a 30 bp cell barcode works with CB_UMI_Simple.

@alexdobin
Owner

Hi Gert,

I merged your pull request in 2.7.1a, so it should work now.
Thanks for the confirmation, Florian!

Cheers
Alex
