FASTA host file does not work in IGV #74

Closed
hoelzer opened this issue Dec 21, 2023 · 15 comments

@hoelzer
Member

hoelzer commented Dec 21, 2023

results/intermediate/
|-- host.fa.fai
|-- host.fa.gz
`-- map-to-remove

has these files. But when I gunzip host.fa.gz, I cannot load it into IGV.

[image: screenshot of the IGV output]

@MarieLataretu
Collaborator

MarieLataretu commented Jan 3, 2024

Hi @hoelzer, so you tried to load the uncompressed genome (host.fa)?

When I load the gzipped file, IGV says [MessageUtils] IGV cannot readed gzipped fasta files.. But after gunzipping, it works 🤔

I checked an old file that was created before this commit 5d87ac4

@MarieLataretu
Collaborator

So the problem is not the (unzipped) FASTA file, but the compressed index (--gzi-idx here).

Do you think it's okay to work with an uncompressed index, @hoelzer, @matthuska?
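
A quick way to tell which kind of index a file really holds, as a rough sketch only. A faidx .fai index is plain text with five tab-separated columns (name, length, offset, bases per line, bytes per line), while a .gzi index is binary. Judging from the samtools error text quoted further down in this thread ("Could not build fai index db.fa.gz.fai or compressed index db.fa.fai"), --gzi-idx db.fa.fai appears to put the .gzi data into the file named .fai, which would explain why IGV rejects it. The helper name looks_like_fai and the sample file names are made up for illustration.

```shell
# Hypothetical helper: succeeds if FILE looks like a plain-text .fai index
# (non-empty, five tab-separated columns, columns 2-5 are integers).
looks_like_fai() {
  [ -s "$1" ] || return 1
  awk -F '\t' '
    NF != 5 { bad = 1; exit }
    { for (i = 2; i <= 5; i++) if ($i !~ /^[0-9]+$/) { bad = 1; exit } }
    END { exit bad + 0 }
  ' "$1"
}

# Tiny demo with made-up files.
tmp=$(mktemp -d)
printf 'chr1\t1000\t6\t60\t61\n' > "$tmp/good.fai"
printf 'not an index\n' > "$tmp/bad.fai"
looks_like_fai "$tmp/good.fai" && echo "good.fai: looks like a plain-text fai"
looks_like_fai "$tmp/bad.fai" || echo "bad.fai: not a plain-text fai"
```

Running looks_like_fai on results/intermediate/host.fa.fai should then show whether the file is something IGV can be expected to read.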

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

Yes, I unzipped the host.fa file but then was not able to load it into IGV, and got the weird output I screenshotted above.

Is it working for you, @MarieLataretu? Then maybe this was some RKI/HPC/weird problem...?

Maybe it's fine to work with an uncompressed index that a user can also directly load into IGV or other browsers? +1 from me if that solves the problem. But maybe we should first see if we can reproduce it (e.g. by running on another machine).

@matthuska
Collaborator

@hoelzer try renaming the index from .fai to .gzi and leave the other files as they were (do not decompress the host fasta file). Does that work? If not, we can use a non-compressed index, I don't think it's very big but we can check.

@MarieLataretu
Collaborator

For me, it worked with an uncompressed index (and the same uncompressed FASTA)!
(And I was able to reproduce it on a Linux laptop.)

I overwrote the index file with samtools faidx host.fa --fai-idx FILE, so changing --gzi-idx to --fai-idx should solve the problem.

Users would still have to uncompress the FASTA (save space) - the index file would be uncompressed and shouldn't be so big 🤔

@MarieLataretu
Collaborator

@hoelzer try renaming the index from .fai to .gzi and leave the other files as they were (do not decompress the host fasta file). Does that work? If not, we can use a non-compressed index, I don't think it's very big but we can check.

IGV cannot handle compressed reference genomes!

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

I overwrote the index file with samtools faidx host.fa --fai-idx FILE, so changing --gzi-idx to --fai-idx should solve the problem.

Users would still have to uncompress the FASTA (save space) - the index file would be uncompressed and shouldn't be so big 🤔

+1, that sounds like a good solution. Let me see if I can reproduce the problem and then fix it like that... but I only have my Mac M2 here and I would not trust any errors that come from running on that machine ;)

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

Hm... I am running the dev branch

[75/dbea91] process > prepare_contamination:concat_contamination                [100%] 1 of 1, failed: 1 ✘
[-        ] process > clean:minimap2                                            -
[-        ] process > clean:sort_bam                                            -
[-        ] process > clean:index_bam                                           -
[-        ] process > clean:idxstats_from_bam                                   -
[-        ] process > clean:flagstats_from_bam                                  -
[-        ] process > clean:split_bam                                           -
[-        ] process > clean:index_bam2                                          -
[-        ] process > clean:fastq_from_bam                                      -
[-        ] process > qc:nanoplot                                               -
[-        ] process > qc:format_nanoplot_report                                 -
[-        ] process > qc:multiqc                                                -
WARN: Task runtime metrics are not reported when using macOS without a container engine
ERROR ~ Error executing process > 'prepare_contamination:concat_contamination'

Caused by:
  Process `prepare_contamination:concat_contamination` terminated with an error exit status (1)

Command executed:

  # Combine input files, rename duplicate sequences (by id) if found, and compress
  seqkit seq sc2.fa.gz | seqkit rename | bgzip -@ 1 -c > db.fa.gz
  samtools faidx db.fa.gz --gzi-idx db.fa.fai

Command exit status:
  1

Command output:
  (empty)

Command error:
  [faidx] Could not build fai index db.fa.gz.fai or compressed index db.fa.fai

Maaaybe a Mac M2 problem... I tried with the docker and mamba profiles, same error.

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

ok... that's probably bc

zcat host-temp.fa.gz

does not work on mac : ) So the downloaded file is then empty... I will look into that and then the index thing
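
A small guard could make this kind of failure show up where it happens instead of later in samtools faidx. This is only a sketch: the function name require_nonempty_gz is made up and not part of the pipeline. It uses gzip -dc, which reads the named file on both Linux and macOS.

```shell
# Hypothetical guard: fail early if a .gz file is missing, is not valid
# gzip, or decompresses to zero bytes.
require_nonempty_gz() {
  [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
  gzip -t "$1" 2>/dev/null || { echo "not valid gzip: $1" >&2; return 1; }
  # One decompressed byte is enough to know the content is non-empty.
  # tr strips the padding that some wc implementations add.
  _n=$(gzip -dc "$1" | head -c 1 | wc -c | tr -d ' ')
  [ "$_n" -gt 0 ] || { echo "decompresses to nothing: $1" >&2; return 1; }
}

# Tiny demo with made-up files.
tmp=$(mktemp -d)
printf '>seq1\nACGT\n' | gzip -c > "$tmp/ok.fa.gz"
printf '' | gzip -c > "$tmp/empty.fa.gz"
require_nonempty_gz "$tmp/ok.fa.gz" && echo "ok.fa.gz has content"
require_nonempty_gz "$tmp/empty.fa.gz" || echo "empty.fa.gz rejected"
```

Calling such a check right after the download step would stop the run before an empty db.fa.gz reaches the indexing command.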

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

Ok, I fixed that zcat problem here: #76

forget it

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

Then I ran

nextflow run git/clean/clean.nf -profile local,mamba --host sc2 --input_type nano --input git/clean/test/nanopore.fastq.gz --cleanup_work_dir

and tried to open the resulting host FASTA in IGV. To do that, I gunzipped db.fa.gz first and left db.fa.fai untouched.

This time, it worked!

But note that I ran on the master branch... I am not sure what's in dev that might not be in master currently.

Suggestion: can we first figure out #76, then I can try it again on master (or dev?), and then we can decide whether the change @MarieLataretu suggested is really necessary?

@MarieLataretu please see my final comment below. I was not sure where I should suggest that change bc I am confused with the different branches : )

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

Ok, so I think the only thing I need to get it running on Mac is a change of this line:

  zcat host-temp.fa.gz | bgzip -@ ${task.cpus} -c > ${host}.fa.gz

to

  zcat < host-temp.fa.gz | bgzip -@ ${task.cpus} -c > ${host}.fa.gz

This needs to be done in dev, right? I canceled my PR bc I made the changes in master, which is behind dev anyway afaik. Maybe you can simply add the < at some point so it also runs on macOS. Thx!
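
For context, a sketch of why the redirect helps. As far as I know, the stock zcat on macOS looks for a file with a .Z suffix when it is given a file name, whereas reading from stdin avoids any file-name handling. gzip -dc is another spelling that should behave the same on both systems. The file content below is a placeholder.

```shell
# Made-up test file standing in for the downloaded host FASTA.
tmp=$(mktemp -d)
printf '>seq1\nACGT\n' | gzip -c > "$tmp/host-temp.fa.gz"

# Variant from the comment above: read from stdin, no suffix handling.
a=$(zcat < "$tmp/host-temp.fa.gz")

# Alternative: call gzip directly with the file name.
b=$(gzip -dc "$tmp/host-temp.fa.gz")

[ "$a" = "$b" ] && echo "both variants give the same output"
```

Either form can be piped into bgzip in the same way as the original line.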

@matthuska
Collaborator

This needs to be done in dev, right?

Yes. I can do that.

@hoelzer
Member Author

hoelzer commented Jan 3, 2024

Hey, I reopened the issue because I ran dev again after the zcat fix for my Mac. The pipeline works but, again, I cannot directly load host.fa (after gunzip) in IGV.

But when I overwrite the index file via

samtools faidx host.fa --fai-idx host.fa.fai

it works (as suggested by @MarieLataretu ).

so changing --gzi-idx to --fai-idx should solve the problem.

Users would still have to uncompress the FASTA (save space) - the index file would be uncompressed and shouldn't be so big 🤔

I think that's a good solution so +1 from me to change --gzi-idx to --fai-idx. And then the user has to gunzip the FASTA for IGV and it should work.
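
Until that change lands, the manual workaround from this thread could be wrapped up roughly like this. prepare_for_igv is a made-up helper name, not something the pipeline provides. The samtools call is the one shown above, and it is skipped if samtools is not installed.

```shell
# Hypothetical helper: write an uncompressed FASTA plus a plain .fai index
# next to the compressed file, so both can be loaded into IGV.
# Usage: prepare_for_igv results/intermediate/host.fa.gz
prepare_for_igv() {
  gz=$1
  fa=${gz%.gz}
  # Keep the compressed file and write the uncompressed copy beside it.
  gzip -dc "$gz" > "$fa" || return 1
  if command -v samtools >/dev/null 2>&1; then
    # Same command as above: overwrite the index with an uncompressed one.
    samtools faidx "$fa" --fai-idx "$fa.fai"
  else
    echo "samtools not found; index $fa manually" >&2
  fi
}
```

After running it, host.fa and host.fa.fai would be the two files to point IGV at.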

@matthuska
Collaborator

Resolved by @MarieLataretu 's #79 .
