Cherry option with custom MAG collection #51

Open
RolandWirth opened this issue Jan 8, 2025 · 14 comments

Comments

@RolandWirth

RolandWirth commented Jan 8, 2025

Hi KennthShang!

I wanted to ask about the --task cherry option. I would like to predict the hosts of my detected virus sequences using my own MAG collection. I used the following command for that:
phabox2 --task cherry --dbdir /path/to/databases/PhaBOX2_db/phabox_db_v2/ --outpth /path/to/Out_Cherry/ --contigs /path/to/Concatenated_phage_contigs.fasta --bfolder /path/to/MAGs/ --magonly Y --threads 40

However, I got the following error message:

PhaBOX2 is running with: 40 threads!
Running program: CHERRY (Host prediction)
[1/4] reusing existing filtered contigs...
[2/4] finding CRISPRs from MAGs...
Traceback (most recent call last):
  File "/home/use/.conda/envs/phabox2/bin/phabox2", line 10, in <module>
    sys.exit(main())
  File "/home/use/.conda/envs/phabox2/lib/python3.10/site-packages/phabox2/phabox2.py", line 418, in main
    cherry.run(inputs)
  File "/home/use/.conda/envs/phabox2/lib/python3.10/site-packages/phabox2/cherry.py", line 133, in run
    result = future.result()
  File "/home/use/.conda/envs/phabox2/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/home/use/.conda/envs/phabox2/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/home/use/.conda/envs/phabox2/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/use/.conda/envs/phabox2/lib/python3.10/site-packages/phabox2/scripts/ulity.py", line 1008, in run_crt
    subprocess.run(cmd, check=True, capture_output=True, text=True)
  File "/home/use/.conda/envs/phabox2/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/home/use/.conda/envs/phabox2/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/use/.conda/envs/phabox2/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'java'
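
The last line of this traceback names the executable that `subprocess.run` tried to start, not an input path. A minimal sketch, assuming a POSIX system and a deliberately nonexistent program name (`no-such-program-xyz` and `launch_error` are placeholders, not part of PhaBOX), reproduces the same kind of message:

```python
import subprocess


def launch_error(cmd):
    """Return the error text if cmd's executable cannot be started, else None."""
    try:
        # Same style of call as in the traceback, but without check=True
        subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as err:
        # Raised before the program starts: the executable is not on PATH
        return str(err)
    return None


if __name__ == "__main__":
    print(launch_error(["no-such-program-xyz", "--version"]))
```

The quoted name in the message is the program that could not be found, which here would point at `java` rather than at the `--bfolder` path.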

The end_to_end task failed in the same way when I used the --bfolder and --magonly Y options. However, when I used the provided database for host prediction, the workflow ran properly.

Configuration:
python=3.10
PhaBOX=2.1.10 (latest)

Best,
Roland

@KennthShang
Owner

KennthShang commented Jan 8, 2025

Hi there,

From the error you provided, it seems the program cannot reach your --bfolder:

FileNotFoundError: [Errno 2] No such file or directory: 'java'

Maybe you can check the path or provide more detailed information so that I can help.

Best,
Jiayu

@RolandWirth
Author

RolandWirth commented Jan 8, 2025

I checked the path, but it seems fine. I used the absolute path of the folder. The files inside the /MAGs/ folder can be listed with ls.
Here is what I see:
ls /path/to/MAGs/

metabat2_1_102_sub.fasta  metabat2_3_135.fasta      metabat2_4_378.fasta      metabat2_6_20_sub.fasta   semibin2_1_361_sub.fasta  semibin2_5_149_sub.fasta   vamb_2_1498.fasta
metabat2_1_105.fasta      metabat2_3_136_sub.fasta  metabat2_4_381_sub.fasta  metabat2_6_210_sub.fasta  semibin2_1_379_sub.fasta  semibin2_5_1555.fasta      vamb_2_1689.fasta

...
The .fasta files contain sequences like this:
head /path/to/metabat2_1_102_sub.fasta

>c_000000022452
ATACCGTCATTGATGCGCTCCTTTCGACGGTAAAGAAAGATTATACTTTTGAAGATGAAGAAATACAGGCGATTGAAGAA
CACAAGCGC...
>c_000000022307
GTATCAGCAATGATATTGCCATGCAGGTCGTCGGCGTAGGCTTCATCATTAGTGTTCTTGAAGATGCCTGTGAAACGGCA
CTGAACAGTTCC...
...

Best,
R.

@KennthShang
Owner

KennthShang commented Jan 8, 2025

Ah, I think I found the problem. It seems your system does not have "Java" installed.

It is a basic module/environment that is installed by default on almost all systems. So I did not list it in the guideline.

You can find a proper package here that can be installed on your system.
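
A quick way to confirm whether the external programs seen in this thread are visible from the active environment is to look them up on `PATH`. This is a minimal sketch, not part of PhaBOX; the tool list is an assumption based only on the commands that appear in this issue (`java`, `blastn`, `makeblastdb`), and `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(tools=("java", "blastn", "makeblastdb")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Run it inside the activated conda environment, since `PATH` differs between environments.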

Best,
Jiayu

@RolandWirth
Author

RolandWirth commented Jan 8, 2025

Thank you, KennthShang for the fast feedback :)

Indeed, my conda environment did not have "Java" installed.
I installed it with the following command inside the phabox2 environment:
mamba install -c conda-forge openjdk=23
Now the program can find my MAG collection, but I got another error:

PhaBOX2 is running with: 40 threads!
Running program: CHERRY (Host prediction)
[1/4] filtering the length of contigs...
[2/4] finding CRISPRs from MAGs...
[3/4] predicting MAG CRISPRs...
Command 'blastn -task blastn -query /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//filtered_contigs.fa -db /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_db/magCRISPRs -out /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_out.tab -outfmt "6 qseqid sseqid evalue pident length slen" -evalue 1 -max_target_seqs 25 -perc_identity 90 -num_threads 40' failed with exit code 3
BLAST Database error: Database memory map file error

Best,
R.

@KennthShang
Owner

Hmm, this looks like an issue caused by BLAST+, and this is the first time I have seen it.

Maybe you can show a screenshot of your ...ARB_Cherry/midfolder/ and ...ARB_Cherry/midfolder/crispr_db?

Let's first check whether the CRISPRs.fa and the NCBI database files were generated correctly.

Best,
Jiayu

@RolandWirth
Author

RolandWirth commented Jan 8, 2025

The ...ARB_Cherry/midfolder/ contains the following:

crispr_db
crispr_fa
crispr_tmp
crispr_out.tab (0 KB)
CRISPRs.fa (294 KB)

CRISPRs.fa contains the following sequences:
head /path/to/CRISPRs.fa

>semibin2_2_13_sub_CRISPR_0
CAACTGGGATTGACTTAATTGAAAAGGCTATAG
>semibin2_2_13_sub_CRISPR_1
GAGCCTTACTGGTGTTACCTTACTGGTGTTGC
>semibin2_2_13_sub_CRISPR_2
CCTATCCTATTTAAGATTGGATACTCATCTGT
>semibin2_2_13_sub_CRISPR_3
TAGTAGTTTTTATGTCTGCGTGGCCCAACAGC
...

The ...ARB_Cherry/midfolder/crispr_db contains:

magCRISPR.ndb (412 KB)
magCRISPR.nhr (304 KB)
magCRISPR.nin (58 KB)
magCRISPR.njs (1 KB)
magCRISPR.nog (20 KB)
magCRISPR.nos (161 KB)
magCRISPR.not (58 KB)
magCRISPR.nsq (44 KB)
magCRISPR.ntf (16 KB)
magCRISPR.nto (20 KB)

So, it seems the CRISPRs.fa and the NCBI database files are generated correctly.
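
The same inspection can be scripted. This is a minimal sketch under the assumption that the files of a nucleotide BLAST database share the database prefix followed by an extension starting with `.n`, as in the listing above; the function names and the placeholder prefix are hypothetical.

```python
from pathlib import Path


def blast_db_file_sizes(db_prefix):
    """Map each file named <prefix>.n* to its size in bytes."""
    prefix = Path(db_prefix)
    if not prefix.parent.is_dir():
        return {}
    matches = sorted(prefix.parent.glob(prefix.name + ".n*"))
    return {path.name: path.stat().st_size for path in matches}


def empty_files(sizes):
    """Return the names of files whose size is zero bytes."""
    return [name for name, size in sizes.items() if size == 0]


if __name__ == "__main__":
    # Placeholder prefix: replace with the value passed to blastn via -db
    sizes = blast_db_file_sizes("midfolder/crispr_db/magCRISPRs")
    for name, size in sizes.items():
        print(f"{name}\t{size} bytes")
    print("Empty files:", empty_files(sizes))
```

Passing the exact prefix given to `-db` also shows whether any files exist under that name at all.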

@KennthShang
Owner

Seems so,

Then you can try to run the blast command directly and see what would happen:

blastn -task blastn -query /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//filtered_contigs.fa -db /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_db/magCRISPRs -out /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_out.tab -outfmt "6 qseqid sseqid evalue pident length slen" -evalue 1 -max_target_seqs 25 -perc_identity 90 -num_threads 40

@RolandWirth
Author

I used the given command and got the same error message as previously:
BLAST Database error: Database memory map file error

@KennthShang
Owner

Then, let's try to rebuild the blast database and see whether it can work

 makeblastdb -in /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder/CRISPRs.fa -dbtype nucl -parse_seqids -out /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_db/magCRISPRs

If it still does not work, then maybe try:

 makeblastdb -in /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder/CRISPRs.fa -dbtype nucl -out /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_db/magCRISPRs

And if it still fails (I hope you do not need this), you may need to reinstall BLAST+. The version should be blast=2.16.0, as given in the wiki.
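
The two makeblastdb variants above differ only in the `-parse_seqids` flag. A minimal sketch of scripting them follows; the flags are copied from the commands in this thread, while the function names and the shortened paths are placeholders. It is an illustration, not how PhaBOX itself invokes BLAST+.

```python
import shlex
import subprocess

OUTFMT = "6 qseqid sseqid evalue pident length slen"


def makeblastdb_cmd(fasta, db_prefix, parse_seqids=True):
    """Build the makeblastdb argument list, with or without -parse_seqids."""
    cmd = ["makeblastdb", "-in", fasta, "-dbtype", "nucl"]
    if parse_seqids:
        cmd.append("-parse_seqids")
    return cmd + ["-out", db_prefix]


def blastn_cmd(query, db_prefix, out, threads=40):
    """Build the blastn argument list used in this thread."""
    return ["blastn", "-task", "blastn", "-query", query, "-db", db_prefix,
            "-out", out, "-outfmt", OUTFMT, "-evalue", "1",
            "-max_target_seqs", "25", "-perc_identity", "90",
            "-num_threads", str(threads)]


def rebuild_and_search(fasta, query, db_prefix, out, threads=40):
    """Try a rebuild with -parse_seqids first, then without; True on success."""
    for parse_seqids in (True, False):
        subprocess.run(makeblastdb_cmd(fasta, db_prefix, parse_seqids), check=True)
        result = subprocess.run(blastn_cmd(query, db_prefix, out, threads),
                                capture_output=True, text=True)
        if result.returncode == 0:
            return True
    return False


if __name__ == "__main__":
    # Placeholder paths; only print the commands rather than running them
    print(shlex.join(makeblastdb_cmd("midfolder/CRISPRs.fa",
                                     "midfolder/crispr_db/magCRISPRs")))
    print(shlex.join(blastn_cmd("filtered_contigs.fa",
                                "midfolder/crispr_db/magCRISPRs",
                                "midfolder/crispr_out.tab")))
```

Building the arguments as a list avoids shell quoting problems with the multi-word `-outfmt` value.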

@RolandWirth
Author

RolandWirth commented Jan 8, 2025

Thank you, KennthShang!

It worked!
I tried to rebuild the blast database with:

makeblastdb -in /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder/CRISPRs.fa -dbtype nucl -parse_seqids -out /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_db/magCRISPRs

Afterwards, I ran the blast command directly:

blastn -task blastn -query /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//filtered_contigs.fa -db /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_db/magCRISPRs -out /srv/bi03/wirth/Per_sample_WWTP_AMR_study/Prokaryota_phage_ARG_analysis/Phage_vMAGs_summary/ARB_Cherry//midfolder//crispr_out.tab -outfmt "6 qseqid sseqid evalue pident length slen" -evalue 1 -max_target_seqs 25 -perc_identity 90 -num_threads 40

Now the crispr_out.tab contains the results! :)
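
To inspect the resulting table, the six columns requested with `-outfmt` in the command above can be read as tab-separated values. This is a minimal sketch; the example row is made up for illustration and is not output from this run, and `read_hits` is a hypothetical helper name.

```python
import csv
import io

# Column order as requested in the blastn -outfmt string
COLUMNS = ["qseqid", "sseqid", "evalue", "pident", "length", "slen"]


def read_hits(handle):
    """Parse BLAST tabular rows with the six columns above into dicts."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["slen"] = int(hit["slen"])
        hits.append(hit)
    return hits


# Made-up example row, not real output from this thread
example = io.StringIO("contig_1\tbinA_CRISPR_0\t1e-10\t100.000\t32\t32\n")
hits = read_hits(example)
print(hits[0]["qseqid"], hits[0]["sseqid"], hits[0]["length"], hits[0]["slen"])
```

For a real file, open it with `open("crispr_out.tab", newline="")` and pass the handle to `read_hits`.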

@KennthShang
Owner

Good to know!

But it is still strange that the program failed, since blastn runs fine on its own.

I hope PhaBOX will not crash again on your system.

Best,
Jiayu

@RolandWirth
Author

Indeed, it is a mystery :)
I checked the PhaBOX integrity. PhaBOX worked perfectly if I did not want to use my own MAG collection.

Best,
Roland

@KennthShang
Owner

Hmm, I mean that when I use the --magonly mode, it works well on my systems and passes all the tests, so I am not sure why.

@RolandWirth
Author

RolandWirth commented Jan 20, 2025

The workaround above leaves open the question of what the next step is to get the final results. I think the answer is complicated, so it is worth looking for other solutions. One is to reinstall the whole environment manually. The developers provide instructions titled "Installing Phabox2 in primitive ways," and following them solved the problem with the CHERRY analysis on a custom MAG collection for me. In addition, Java needs to be installed inside the environment.
