FileNotFoundError during trycycler cluster #11

Closed
aruginkgo opened this issue Jan 20, 2021 · 2 comments

aruginkgo commented Jan 20, 2021

For some reason I get FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpz1qeqdno/A_assemblies/canu_0_pos.fasta' when running trycycler cluster, during the distance matrix step. It seems the temp directory is being created but not the A_assemblies directory inside it.

I think I am using the latest version of Trycycler (that is, I ran python3 setup.py install in a directory called Trycycler-0.4.2, but the version.py in it still says 0.4.1).

I was able to Trycycle a different set of assemblies, so it might be something on my end. Unfortunately I can't share the sequences, but I can try to put together a reproducible example.

Building distance matrix (2021-01-20 19:08:34)
    Mash is used to build a distance matrix of all contigs in the assemblies.

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/trycycler", line 11, in <module>
    load_entry_point('Trycycler==0.4.1', 'console_scripts', 'trycycler')()
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/__main__.py", line 40, in main
    cluster(args)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 41, in cluster
    matrix = distance_matrix(seqs, seq_names, args.distance)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 232, in distance_matrix
    mash_matrix = get_mash_dist_matrix(seq_names, seqs, distance, indent=False)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/mash.py", line 28, in get_mash_dist_matrix
    pos_sketches, neg_sketches = make_mash_sketches(seq_names, seqs, temp_dir)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/mash.py", line 63, in make_mash_sketches
    write_seq_to_fasta(seq_pos, seq_name, fasta_pos)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/misc.py", line 155, in write_seq_to_fasta
    with open(filename, 'wt') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp6f7zakqj/A_assemblies/canu_0_pos.fasta'

aruginkgo commented Jan 20, 2021

For what it's worth, I added

import os
from pathlib import Path
os.makedirs(temp_dir / Path(seq_name).parent, exist_ok=True)

in make_mash_sketches, just after fasta_pos and fasta_neg are defined. It then got through the distance matrix step but crashed again at the clustering step with a similar issue:

cluster/cluster_001/1_contigs:
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/trycycler", line 11, in <module>
    load_entry_point('Trycycler==0.4.1', 'console_scripts', 'trycycler')()
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/__main__.py", line 40, in main
    cluster(args)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 42, in cluster
    cluster_numbers = complete_linkage(seqs, seq_names, depths, matrix, args.distance, args.out_dir)
  File "/home/ubuntu/.local/lib/python3.7/site-packages/Trycycler-0.4.1-py3.7.egg/trycycler/cluster.py", line 325, in complete_linkage
    with open(seq_fasta, 'wt') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'cluster/cluster_001/1_contigs/A_assemblies/canu_0.fasta'

The 1_contigs directory was created but left empty.
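The first workaround above boils down to creating the missing parent directory before opening the file. Here is a minimal standalone sketch of that idea; `write_fasta_nested` is a hypothetical helper, not Trycycler code, and the output path shown assumes Linux or macOS.

```python
import os
import tempfile
from pathlib import Path


def write_fasta_nested(out_dir, seq_name, seq):
    """Hypothetical helper (not Trycycler code): write seq to
    <out_dir>/<seq_name>.fasta, first creating any subdirectory
    implied by a slash in seq_name."""
    fasta = Path(out_dir) / f'{seq_name}.fasta'
    # 'A_assemblies/canu_0_pos' places the file inside an 'A_assemblies'
    # directory, so that directory must exist before open() is called.
    os.makedirs(fasta.parent, exist_ok=True)
    with open(fasta, 'wt') as f:
        f.write(f'>{seq_name}\n{seq}\n')
    return fasta


with tempfile.TemporaryDirectory() as temp_dir:
    path = write_fasta_nested(temp_dir, 'A_assemblies/canu_0_pos', 'ACGT')
    print(path.relative_to(temp_dir))  # A_assemblies/canu_0_pos.fasta
```

This keeps the slash-derived directory layout rather than getting rid of the slash, so every later step that builds a path from the contig name needs the same treatment.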

Edit: with this hacky "fix" (big air quotes)

os.makedirs(cluster_dir / pathlib.Path(name).parent, exist_ok=True)
seq_fasta = cluster_dir / f'{pathlib.Path(name).stem}.fasta'

in cluster.py at line ~324, inside the loop, it no longer crashes and creates the final cluster_001/1_contigs/*_0.fasta files, but I'm not sure why it's looking for an A_assemblies directory in the first place.

trycycler reconcile worked after that as well.
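The cluster.py workaround takes a different route: `pathlib.Path(name).stem` drops everything up to the last slash, so no subdirectory is needed. The small sketch below mirrors that expression using made-up contig names. It uses `PurePosixPath` so the result doesn't depend on the OS. It shows two side effects worth keeping in mind. First, the `A_` prefix and anything after a final dot are dropped. Second, different names can end up with the same filename.

```python
from pathlib import PurePosixPath


def stem_filename(name):
    """Mirror of the hack: keep only the last path component, minus its suffix."""
    return f'{PurePosixPath(name).stem}.fasta'


print(stem_filename('A_assemblies/canu_0'))  # canu_0.fasta
print(stem_filename('A_tig.1'))              # A_tig.fasta (the '.1' is treated as a suffix)

# Two different contigs that share a final component map to the same file,
# so the second write would overwrite the first.
print(stem_filename('A_run1/canu_0') == stem_filename('B_run2/canu_0'))  # True
```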

Owner

rrwick commented Feb 23, 2021

Thanks for spotting this bug! If I understand correctly, one of your input assemblies has a contig named assemblies/canu_0. The slash is causing the problem, because the Trycycler cluster command saves contigs to temporary files using their contig names as filenames. So it was trying to save /tmp/tmp6f7zakqj/A_assemblies/canu_0_pos.fasta, which the filesystem treats as a file named canu_0_pos.fasta inside an A_assemblies/ subdirectory, and that subdirectory didn't exist.
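The failure is easy to reproduce outside Trycycler. The sketch below is illustrative only: `try_write` is a hypothetical stand-in for the temp-file write in the traceback, and the error text shown is what Linux reports for ENOENT.

```python
import tempfile
from pathlib import Path


def try_write(temp_dir, seq_name):
    """Illustrative only: save a contig as <temp_dir>/<seq_name>_pos.fasta."""
    filename = Path(temp_dir) / f'{seq_name}_pos.fasta'
    try:
        with open(filename, 'wt') as f:
            f.write(f'>{seq_name}\nACGT\n')
        return 'ok'
    except FileNotFoundError as e:
        # The slash makes open() look for an 'A_assemblies' directory that was never created.
        return f'FileNotFoundError: {e.strerror}'


with tempfile.TemporaryDirectory() as temp_dir:
    print(try_write(temp_dir, 'A_canu_0'))             # ok
    print(try_write(temp_dir, 'A_assemblies/canu_0'))  # FileNotFoundError: No such file or directory
```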

I've taken the easy way out of this one and just made Trycycler check for slashes in contig names and quit with an error if they are there. That was easier than ensuring slash-containing contig names don't cause a crash 😄
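Trycycler's actual check isn't shown in this thread. The sketch below is a hypothetical version of that kind of validation: the function name `check_contig_names` and the message wording are made up. It inspects every name up front and quits through `sys.exit` before any temporary files are written.

```python
import sys


def check_contig_names(contig_names):
    """Hypothetical check: quit with an error if any contig name contains a slash."""
    bad = [name for name in contig_names if '/' in name]
    if bad:
        sys.exit(f'Error: contig names cannot contain slashes: {", ".join(bad)}')


check_contig_names(['canu_0', 'tig00000001'])  # passes silently
try:
    check_contig_names(['assemblies/canu_0'])
except SystemExit as e:
    print(e)  # Error: contig names cannot contain slashes: assemblies/canu_0
```

Failing early like this trades a little convenience (the user has to rename contigs) for not having to make every file-writing step slash-safe.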

Also, thanks for pointing out the version number discrepancy! I've made a new version with the fix (v0.4.3), and now GitHub and the code agree on the version number.

rrwick closed this as completed Feb 23, 2021