using own fa file and gtf file error: gzip: stdin: not in gzip format #250
Comments
What command did you use to compress the file? You must use
Your FastA file appears to start with a line break. Remove the line break and make sure that every line containing a chromosome name begins with a '>'.
Sorry, I failed to copy the >. I think my file already has that format, but it still shows the same error: EXITING because of INPUT ERROR: the file format of the genomeFastaFile: my_genomeviral.fa is not fasta: the first character is ' Sep 05 00:16:05 ...... FATAL ERROR, exiting
I just found that the > cannot be copied.
Your FastA file starts with a line break. You must remove the line break.
Yes, I have used this before, but it still shows the same error. head Homo_sapiens.GRCh38.dna.primary_assembly.fa
Hm, it probably has something to do with how you added the file to the download_references.sh script. Can you attach your modified script?
ASSEMBLIES[my_genome]="
https://m-ee4cea.a1bfb5.bd7c.data.globus.org/scratch/chadbren_root/chadbren99/ppxinyi/referenceGRC38/
Homo_sapiens.GRCh38.dna.primary_assembly.fa
<https://m-d953b2.9601a.bd7c.data.globus.org/umms-chadbren-dataden/apurvadb/10236-CB/Homo_sapiens.GRCh38.dna.primary_assembly.fa>
ANNOTATIONS[my_annotation]="
https://m-ee4cea.a1bfb5.bd7c.data.globus.org/scratch/chadbren_root/chadbren99/ppxinyi/referenceGRC38/Homo_sapiens.GRCh38.105.transcript.gtf
COMBINATIONS["my_genome+my_annotation"]="my_genome+my_annotation"
I just added these links to download_references.sh.
I also found that if I use the ASSEMBLIES entry you provide together with my own
ANNOTATIONS file, I get this error: Sep 05 01:30:42 ...... FATAL
ERROR, exiting
.log file:
Downloading assembly:
http://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
Appending RefSeq viral genomes
Downloading annotation:
https://m-ee4cea.a1bfb5.bd7c.data.globus.org/scratch/chadbren_root/chadbren99/ppxinyi/referenceGRC38/Homo_sapiens.GRCh38.105.transcript.gtf
STAR --runMode genomeGenerate --genomeDir
STAR_index_my_genomeviral_my_annotation --genomeFastaFiles
my_genomeviral.fa --sjdbGTFfile my_annotation.gtf --runThreadN 8
--sjdbOverhang 250
STAR version: 2.7.11b compiled: 2024-08-28T19:55:43-04:00
gl-login2.arc-ts.umich.edu:
/gpfs/accounts/chadbren_root/chadbren99/ppxinyi/STAR-2.7.11b/source
Sep 05 01:29:52 ..... started STAR run
Sep 05 01:29:52 ... starting to generate Genome files
Sep 05 01:30:42 ..... processing annotations GTF
Sorry for the slow response. I am currently busy. I will have more time next week. In the meantime, have you tried building the STAR index yourself? What the script does is actually not so complicated. All it does is concatenate the RefSeq viruses to the human genome and then build a STAR index from it. Maybe it's easiest if you build the index manually?
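A rough sketch of that manual route, in case it helps. The STAR options are copied from the log posted in this thread. The function names and the viral FASTA file name (refseq_viral.fa) are placeholders, not names taken from download_references.sh. Dropping blank lines is an extra precaution against the "first character is not '>'" error, not something the script is known to do.

```shell
#!/usr/bin/env bash
# Sketch only: function names and refseq_viral.fa are hypothetical.

# Concatenate FASTA files into one output file, dropping blank lines
# so that the combined file starts with '>'.
combine_fasta() {
    local out="$1"; shift
    cat "$@" | grep -v '^[[:space:]]*$' > "$out"
}

# Print the genomeGenerate command seen in the log above.
# Remove the leading "echo" to actually run STAR.
star_index_cmd() {
    local fasta="$1" gtf="$2" outdir="$3"
    echo STAR --runMode genomeGenerate --genomeDir "$outdir" \
        --genomeFastaFiles "$fasta" --sjdbGTFfile "$gtf" \
        --runThreadN 8 --sjdbOverhang 250
}

# Example usage (file names are assumptions):
#   combine_fasta my_genomeviral.fa Homo_sapiens.GRCh38.dna.primary_assembly.fa refseq_viral.fa
#   star_index_cmd my_genomeviral.fa my_annotation.gtf STAR_index_my_genomeviral_my_annotation
```

Both inputs must be uncompressed FASTA files before they are concatenated.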
When I zip Homo_sapiens.GRCh38.dna.primary_assembly.fa to a .gz file and add its link to download.sh, I get the error: gzip: stdin: not in gzip format.
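One way to see whether a file is really gzip-compressed, regardless of its extension, is to look at its first two bytes: gzip data starts with the bytes 1f 8b, whereas an archive made with zip starts with "PK". A minimal sketch follows; the function name is_gzip is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: is_gzip is a hypothetical helper name.

# Return 0 if the file starts with the gzip magic bytes 1f 8b, non-zero otherwise.
is_gzip() {
    local magic
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Example usage (file name is an assumption):
#   is_gzip Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz && echo "gzip" || echo "not gzip"
```

If the check fails on the local file, the file was probably not compressed with gzip. If it passes locally but the error persists, the download link may be returning something other than the file itself.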
If I use the .fa file directly, I get another error:
EXITING because of INPUT ERROR: the file format of the genomeFastaFile: my_genomeviral.fa is not fasta: the first character is '
' (10), not '>'.
Solution: check formatting of the fasta file. Make sure the file is uncompressed (unzipped).
I have checked the first character:
head Homo_sapiens.GRCh38.dna.primary_assembly.fa
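Since the error names my_genomeviral.fa (the combined file) rather than the downloaded assembly, it may be worth checking that file's first byte directly. The sketch below prints the byte as a decimal code, the same form STAR uses in its message: 62 is '>' and 10 is a line break. The function name first_byte_code is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: first_byte_code is a hypothetical helper name.

# Print the decimal code of the first byte of a file (62 = '>', 10 = line break).
first_byte_code() {
    head -c 1 "$1" | od -An -tu1 | tr -d ' \n'
}

# Example usage:
#   first_byte_code my_genomeviral.fa
```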
Do you have any suggestions for using my own FASTA and GTF files?