Function of STAR parameter outSAMtype BAM Unsorted in Arriba run? Which GTF is suitable? How to remove gene fusion from the benign samples? #257

Open
tanayb001 opened this issue Oct 26, 2024 · 2 comments

Comments

@tanayb001

Hi,
Thank you for developing such an amazing tool to identify gene fusions from RNA-seq data. I have total RNA-seq data from 30 cancer samples and 10 benign samples. My focus is to identify DEGs and gene fusions from these samples.
I have a few questions:

Q1: In the documentation it is mentioned that in the Arriba direct run, STAR will run using the following two parameters:
--outStd BAM_Unsorted --outSAMtype BAM Unsorted
My question is: what will happen if I run STAR separately with the --outSAMtype BAM SortedByCoordinate and --quantMode TranscriptomeSAM parameters instead?

And if I run STAR with the --outStd BAM_Unsorted --outSAMtype BAM Unsorted parameters, will there be any issues in my downstream DEG analysis?

Q2: I am using the UCSC hg38 genome to align my data. In that case, which GTF file should I use: the refGene GTF (UCSC) or the GENCODE GTF? I ask because the Arriba documentation mentions the GENCODE GTF file.

Q3: What should my pipeline look like if I want to identify only tumor-specific fusions (i.e., using the benign samples as controls to remove common fusions)?

Thank you.

Regards,
Tanay

@suhrig
Owner

suhrig commented Oct 26, 2024

Q1: It makes no difference whether you use --outStd BAM_Unsorted --outSAMtype BAM Unsorted or --outSAMtype BAM SortedByCoordinate. Arriba doesn't care if the alignments are sorted or not. The workflow in the documentation doesn't sort the alignments, because sorting is not needed for fusion calling and would thus be a waste of CPU. (To be precise, there may be very minor differences in the fusion calls when you use sorting, but they are not any more or less meaningful than when not sorting. So do whatever is best in your workflow.)

When you use TranscriptomeSAM, make sure you pass the regular BAM file (containing genomic coordinates) to Arriba. Do not pass the file *.toTranscriptome.out.bam to Arriba! Arriba needs genomic coordinates.
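
To make the file choice concrete, here is a minimal sketch that picks the genomic BAM out of a STAR output directory and refuses the transcriptome one. It assumes STAR's default output names (`Aligned.out.bam`, `Aligned.sortedByCoord.out.bam`, `Aligned.toTranscriptome.out.bam`). The helper names `is_transcriptome_bam` and `pick_genomic_bam` are made up for illustration and are not part of STAR or Arriba.

```python
from pathlib import Path

# Default STAR names for BAM files that carry genomic coordinates.
GENOMIC_BAM_NAMES = ("Aligned.sortedByCoord.out.bam", "Aligned.out.bam")
# Suffix of the file written by --quantMode TranscriptomeSAM.
TRANSCRIPTOME_SUFFIX = "toTranscriptome.out.bam"


def is_transcriptome_bam(path):
    """True if the file name looks like STAR's transcriptome-coordinate BAM."""
    return Path(path).name.endswith(TRANSCRIPTOME_SUFFIX)


def pick_genomic_bam(star_dir, prefix=""):
    """Return the genomic-coordinate BAM found in a STAR output directory.

    Hypothetical helper: `prefix` is the file-name part of the value given
    to --outFileNamePrefix, if any. The sorted BAM is preferred when both
    exist, but either one is suitable for fusion calling.
    """
    star_dir = Path(star_dir)
    for name in GENOMIC_BAM_NAMES:
        candidate = star_dir / (prefix + name)
        if candidate.exists() and not is_transcriptome_bam(candidate):
            return candidate
    raise FileNotFoundError(f"no genomic-coordinate BAM found in {star_dir}")
```

The returned path is the one to hand to Arriba as its input alignments; the `*.toTranscriptome.out.bam` file stays reserved for transcript quantification.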

Q2: Feel free to use a UCSC GTF file if this is what you use usually. GENCODE works a bit better for fusion calling in my experience, because it has more detailed annotation. But if you normally use UCSC, then it would be complicated to compare Arriba's output based on GENCODE to your other results based on UCSC.

There is one very important exception: If your cancer samples are from hematologic malignancies, then you should use GENCODE, because UCSC does not annotate the T-cell receptor loci. Arriba can only call fusions for annotated genes.

Q3: Run Arriba separately on both normal and malignant samples. Then, take all fusions found in the main output file (-o fusions.tsv) and in the discarded output file (-O discarded_fusions.tsv) of the control samples. Subtract these fusions from the malignant samples. The subtraction should be done using breakpoint coordinates (as listed in the columns breakpoint1/2). Do not subtract fusions by gene name.
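
The subtraction could be sketched roughly as follows. This is only an illustration, not an Arriba feature: it assumes the output files are tab-separated with a header line containing the `breakpoint1` and `breakpoint2` columns, it matches breakpoints exactly, and it treats a breakpoint pair as the same event regardless of which side is listed first. The function names are made up for this example.

```python
import csv


def breakpoint_key(row):
    """Order-independent pair of breakpoints, e.g. ('chr1:100', 'chr2:200').

    Assumption: a fusion reported as A/B in one sample and B/A in another
    should count as the same event.
    """
    return tuple(sorted((row["breakpoint1"], row["breakpoint2"])))


def read_fusions(path):
    """Read a tab-separated fusion table into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def control_breakpoints(control_paths):
    """Collect breakpoint pairs from all control files.

    Pass both fusions.tsv and discarded_fusions.tsv of every control sample.
    """
    seen = set()
    for path in control_paths:
        for row in read_fusions(path):
            seen.add(breakpoint_key(row))
    return seen


def subtract_controls(tumor_path, control_paths):
    """Return tumor fusions whose breakpoint pair is absent from all controls."""
    seen = control_breakpoints(control_paths)
    return [row for row in read_fusions(tumor_path)
            if breakpoint_key(row) not in seen]
```

For example, `subtract_controls("tumor1/fusions.tsv", ["benign1/fusions.tsv", "benign1/discarded_fusions.tsv"])` would keep only the rows of the tumor file whose breakpoint pair never appears in that benign sample (the paths here are placeholders).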

Let me know if anything is unclear.

@tanayb001
Author

Thank you for the response. My samples are not from any hematologic malignancies. So I should use the refGene file, or, if I really want to use the GENCODE GTF, I should use the Ensembl genome FASTA file instead of the UCSC hg38 FASTA.
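
Whether a particular GTF and FASTA pair fit together can be checked by comparing their contig names. The sketch below is a generic check, not something taken from the Arriba documentation. It assumes an uncompressed FASTA and GTF, and the function names are made up for illustration.

```python
def fasta_contigs(path):
    """Contig names from FASTA header lines (text after '>' up to whitespace)."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.startswith(">") and line[1:].strip():
                names.add(line[1:].split()[0])
    return names


def gtf_contigs(path):
    """Contig names from the first column of a GTF, skipping '#' comments."""
    names = set()
    with open(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def naming_report(fasta_path, gtf_path):
    """Summarize which contig names the two files share or lack."""
    fasta = fasta_contigs(fasta_path)
    gtf = gtf_contigs(gtf_path)
    return {
        "shared": fasta & gtf,
        "gtf_only": gtf - fasta,
        "fasta_only": fasta - gtf,
    }
```

If `shared` comes back empty, the two files most likely use different naming schemes (for instance `chr1` versus `1`) and should not be combined without checking how the tools in the pipeline handle that.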

Thanks again!
