Lskatz patch 1 #357

Merged
merged 27 commits into from
Jun 21, 2022

lskatz
Contributor

@lskatz lskatz commented Apr 27, 2022

  • This comment contains a description of what is in the pull request.

EToKi is the engine underneath EnteroBase, including its MLST caller.
I have not included the Kraken database, but I have included all other recommended software from their installation documentation.

  • Build your own docker image using a Dockerfile
    • Directory structure should be name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
    • Includes the recommended LABELS
  • (Optional) Dockerfile is built with best practices and has been approved by a linter (such as https://hadolint.github.io/hadolint/)
  • Edit main README.md
  • Edit Program_Licenses.md
  • Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
  • Write a GitHub actions workflow
    • Should be located in .github/workflows/ and named test-.yml (i.e. .github/workflows/test-spades.yml)
    • Any files required for building are located in the same directory as the Dockerfile (i.e. spades/3.12.0/my_spades_tests.sh)
    • Have successfully run the workflow "Test image" in your forked repository
  • Build your own docker image using a Dockerfile
    • Directory structure should be name of the tool in lower case with special characters removed with a subdirectory of the version number (i.e. spades/3.12.0/Dockerfile)
    • Includes the recommended LABELS
  • (Optional) Dockerfile is built with best practices and has been approved by a linter (such as https://hadolint.github.io/hadolint/)
  • Edit main README.md
  • Ensure tool is listed in Program_Licenses.md
  • Create a simple container-specific README.md in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
  • Update GitHub actions workflow if needed
  • Any files required for building are located in the same directory as the Dockerfile (i.e. spades/3.12.0/my_spades_tests.sh)
  • Have successfully run the workflow "Test image" in your forked repository
  • Build your own docker image using a Dockerfile
    • Includes the recommended LABELS
  • (Optional) Dockerfile is built with best practices and has been approved by a linter (such as https://hadolint.github.io/hadolint/)
  • Ensure tool is listed in Program_Licenses.md
  • Ensure a simple container-specific README.md exists in the same directory as the Dockerfile (i.e. spades/3.12.0/README.md)
  • Update GitHub actions workflow if needed
  • Any files required for building are located in the same directory as the Dockerfile (i.e. spades/3.12.0/my_spades_tests.sh)
  • Have successfully run the workflow "Test image" in your forked repository
  • Update relevant GitHub actions workflow files
  • Any files required for building are located in the same directory as the Dockerfile (i.e. spades/3.12.0/my_spades_tests.sh)
  • Have successfully run the workflow "Test image" in your forked repository

etoki/1.2/Dockerfile Outdated Show resolved Hide resolved
@lskatz
Contributor Author

lskatz commented Apr 27, 2022

I mostly passed the linter except that I used cd instead of WORKDIR and have USER root and didn't delete apt lists.

@lskatz
Contributor Author

lskatz commented Apr 27, 2022

Darnit, hang on, I need to edit it so that it installs usearch

@lskatz lskatz marked this pull request as draft April 27, 2022 19:31
@lskatz lskatz marked this pull request as ready for review May 3, 2022 20:30
@erinyoung
Contributor

I think you're almost out of the woods

I'm impressed that github actions only takes 10 minutes to build and configure this.

I think it would be nice (but not necessary) to replace all the which statements with their actual path when possible.

Also, usearch doesn't seem to be found. Do you think this is an issue?

I copied this from a github action report.

#87 [app 47/48] RUN EToKi.py configure --usearch /usr/local/bin/usearch
#87 0.461 2022-05-03 20:24:05.416984	bbduk ("/opt/bbmap/bbduk.sh") is present. 
#87 0.464 2022-05-03 20:24:05.420896	bbmerge ("/opt/bbmap/bbmerge.sh") is present. 
#87 0.471 2022-05-03 20:24:05.427186	blastn ("/ncbi-blast-2.9.0+/bin/blastn") is present. 
#87 0.509 2022-05-03 20:24:05.465070	bowtie2 ("/opt/bowtie2/bowtie2") is present. 
#87 0.552 2022-05-03 20:24:05.508614	bowtie2build ("/opt/bowtie2/bowtie2-build") is present. 
#87 0.555 2022-05-03 20:24:05.511634	bwa ("/EToKi-1.2/externals/bwa") is present. 
#87 0.559 2022-05-03 20:24:05.515544	diamond ("/usr/local/bin/diamond") is present. 
#87 0.561 2022-05-03 20:24:05.517007	fasttree ("/EToKi-1.2/externals/FastTreeMP-DB") is present. 
#87 0.620 2022-05-03 20:24:05.576285	flye ("/flye/bin/flye") is present. 
#87 1.056 2022-05-03 20:24:06.012683	gatk ("/gatk/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef/GenomeAnalysisTK.jar") is present. 
#87 1.077 2022-05-03 20:24:06.033661	kraken2 ("/kraken2/kraken2") is present. 
#87 1.078 2022-05-03 20:24:06.034177	ERROR - lastal ("/EToKi-1.2/externals/lastal") is not present. 
#87 1.078 2022-05-03 20:24:06.034543	ERROR - lastdb ("/EToKi-1.2/externals/lastdb") is not present. 
#87 1.084 2022-05-03 20:24:06.039964	makeblastdb ("/ncbi-blast-2.9.0+/bin/makeblastdb") is present. 
#87 1.139 2022-05-03 20:24:06.095422	megahit ("/megahit/megahit_v1.1.4_LINUX_CPUONLY_x86_64-bin/megahit") is present. 
#87 1.141 2022-05-03 20:24:06.097058	minimap2 ("/EToKi-1.2/externals/minimap2") is present. 
#87 1.142 2022-05-03 20:24:06.098522	mmseqs ("/mmseqs/bin/mmseqs") is present. 
#87 1.143 2022-05-03 20:24:06.099640	pilercr ("/EToKi-1.2/externals/pilercr") is present. 
#87 1.667 2022-05-03 20:24:06.622881	pilon ("/pilon/pilon-1.22.jar") is present. 
#87 1.668 2022-05-03 20:24:06.624263	rapidnj ("/EToKi-1.2/externals/rapidnj") is present. 
#87 1.670 2022-05-03 20:24:06.626398	raxml ("/standard-RAxML/raxmlHPC") is present. 
#87 1.671 2022-05-03 20:24:06.627700	raxml_ng ("/raxml_ng/raxml-ng") is present. 
#87 1.675 2022-05-03 20:24:06.631552	repair ("/opt/bbmap/repair.sh") is present. 
#87 1.676 2022-05-03 20:24:06.632823	samtools ("/lyve-SET/scripts/samtools") is present. 
#87 1.751 2022-05-03 20:24:06.707700	spades ("/spades/bin/spades.py") is present. 
#87 1.753 2022-05-03 20:24:06.709045	trf ("/EToKi-1.2/externals/trf409.linux64") is present. 
#87 1.754 2022-05-03 20:24:06.710205	ERROR - usearch ("/usr/local/bin/usearch") is not present. 
#87 1.754 2022-05-03 20:24:06.710254	WARNING - kraken_database is not present. 
#87 1.754 You can still use EToKi except the parameter "--kraken" in EToKi assemble will not work.
#87 1.754 Alternatively you can download minikraken2 database using --download_krakenDB or pass an pre-installed database into EToKi using --link_krakenDB.
#87 1.756 2022-05-03 20:24:06.712868	Configuration complete.
#87 DONE 1.8s
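
A minimal sketch, assuming the log format shown above stays stable, of how the tools reported as "not present" could be pulled out of a saved build log. The function name `missing_tools` and the regular expression are illustrative and not part of EToKi; the sample lines are copied from the report above.

```python
import re

# Matches the tool-status lines printed by the configure step, for example:
#   ... ERROR - usearch ("/usr/local/bin/usearch") is not present.
#   ... bwa ("/EToKi-1.2/externals/bwa") is present.
TOOL_LINE = re.compile(
    r'(?P<tool>\w+) \("(?P<path>[^"]+)"\) is (?P<neg>not )?present\.'
)


def missing_tools(log_text):
    """Return {tool: configured_path} for every tool reported as not present."""
    missing = {}
    for line in log_text.splitlines():
        match = TOOL_LINE.search(line)
        if match and match.group("neg"):
            missing[match.group("tool")] = match.group("path")
    return missing


# Four lines copied from the build log above.
sample = """\
#87 1.077 2022-05-03 20:24:06.033661 kraken2 ("/kraken2/kraken2") is present.
#87 1.078 2022-05-03 20:24:06.034177 ERROR - lastal ("/EToKi-1.2/externals/lastal") is not present.
#87 1.078 2022-05-03 20:24:06.034543 ERROR - lastdb ("/EToKi-1.2/externals/lastdb") is not present.
#87 1.754 2022-05-03 20:24:06.710205 ERROR - usearch ("/usr/local/bin/usearch") is not present.
"""

for tool, path in missing_tools(sample).items():
    print(f"{tool}\t{path}")
```

Run against the full log, a check like this would flag lastal and lastdb in addition to usearch.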

@lskatz
Contributor Author

lskatz commented May 4, 2022

There will be some decreased functionality but hopefully it does not affect the MLST functionality. I'll get back to you when possible -- compiling it now with singularity build etoki.sif docker-daemon://etoki:1.2. I'll also fix the issues with which.
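
One way to find the paths that would replace the `which` calls is to resolve each tool once inside the built image and copy the results into the Dockerfile. The sketch below is only an illustration: `resolve_tools` is a hypothetical helper, and the tool names in the example are taken from the configure log rather than from the Dockerfile itself.

```python
import shutil


def resolve_tools(names, search_path=None):
    """Map each tool name to the path found on PATH, or None if it is missing.

    search_path, if given, is used instead of the PATH environment variable.
    """
    return {name: shutil.which(name, path=search_path) for name in names}


if __name__ == "__main__":
    # Example tool names from the configure log; adjust as needed.
    for tool, location in resolve_tools(["blastn", "diamond", "usearch"]).items():
        print(f"{tool}\t{location or 'NOT FOUND'}")
```

Hard-coding the printed paths makes the build fail loudly if a tool moves, instead of silently passing an empty string to the configure step.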

@lskatz lskatz marked this pull request as draft May 4, 2022 01:44
@lskatz lskatz marked this pull request as ready for review May 4, 2022 17:19
@lskatz
Contributor Author

lskatz commented May 4, 2022

I unfortunately had to edit the line where usearch is called so that it fits vsearch parameters. One major change is that vsearch does not accept amino acid query/ref and so it will search in the nucleotide space instead. The initial results seem to work on the sample 7-gene MLST and so I think it's at least okay to have in the container.

But the good part is that it compiles! And the test works!

Contributor

@erinyoung erinyoung left a comment

I think it looks great!

In the etoki/1.2/README.md, could you indicate that you had to change the original tool to run with vsearch?

@erinyoung erinyoung added the enhancement New feature or request label May 6, 2022
@kapsakcj kapsakcj self-requested a review May 11, 2022 13:56
@kapsakcj
Collaborator

I'd like to take a closer look at this & test when I have some free time later this week. Looks like a beast of a docker image! Thanks for the PR, Lee

etoki/1.2/Dockerfile Outdated Show resolved Hide resolved
@lskatz
Contributor Author

lskatz commented May 23, 2022

I think I covered the suggestions, which were very helpful. Thanks!

@lskatz lskatz requested a review from erinyoung May 23, 2022 23:06
@kapsakcj
Collaborator

Was able to build successfully, gee what a beast of a docker image.

I did notice a quirk during the testing bits near the end. I see that you purposefully installed spades.py 3.15.4 via the staphb docker image, but when the etoki image is built, SPAdes 3.14.1 is installed and takes priority in the PATH.

$ docker_run -ti lskatz/etoki:1.2 /bin/bash -c "spades.py --version; which spades.py"
SPAdes genome assembler v3.14.1
/spades/bin/spades.py

Any idea how that happened? Did another program install an older version of spades?

I bring it up because of this traceback during the build of the etoki image. It was supposedly fixed in SPAdes 3.15.4.

Step 112/112 : RUN bash example.bash
 ---> Running in fa152f9a8de9
2022-05-25 02:39:43.892227      Load in 2 read files from 1 libraries
2022-05-25 02:39:47.581476      Obtained 5773534 bases in 32867 reads after Trimming in Lib 0
--pe examples/prep_out_L1_R1.fastq.gz,examples/prep_out_L1_R2.fastq.gz --se examples/prep_out_L1_SE.fastq.gz
2022-05-25 02:39:48.581561      Load in 3 read files from 2 libraries
2022-05-25 02:39:48.680125      Estimated read length: 175.73541761752625
Traceback (most recent call last):
  File "/spades/bin/spades.py", line 643, in <module>
    main(sys.argv)
  File "/spades/bin/spades.py", line 583, in main
    print_params(log, log_filename, command_line, args, cfg)
  File "/spades/bin/spades.py", line 323, in print_params
    print_used_values(cfg, log)
  File "/spades/bin/spades.py", line 113, in print_used_values
    dataset_data = pyyaml.load(open(cfg["dataset"].yaml_filename))
  File "/spades/share/spades/pyyaml3/__init__.py", line 72, in load
    return loader.get_single_data()
  File "/spades/share/spades/pyyaml3/constructor.py", line 37, in get_single_data
    return self.construct_document(node)
  File "/spades/share/spades/pyyaml3/constructor.py", line 46, in construct_document
    for dummy in generator:
  File "/spades/share/spades/pyyaml3/constructor.py", line 398, in construct_yaml_map
    value = self.construct_mapping(node)
  File "/spades/share/spades/pyyaml3/constructor.py", line 204, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "/spades/share/spades/pyyaml3/constructor.py", line 126, in construct_mapping
    if not isinstance(key, collections.Hashable):
AttributeError: module 'collections' has no attribute 'Hashable'
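
For context, the `collections.Hashable` alias used on the failing line was removed in Python 3.10; the abstract base class itself lives in `collections.abc`. The snippet below is a small illustration of that failure mode and the portable spelling. It is not the patch that SPAdes applied, and `is_hashable_key` is a made-up name.

```python
import collections
import collections.abc


def is_hashable_key(key):
    """Portable form of the isinstance check that fails in the traceback."""
    # collections.abc.Hashable is available on every supported Python 3
    # release, whereas the old collections.Hashable alias is gone in 3.10+.
    return isinstance(key, collections.abc.Hashable)


# On Python 3.10 and later the old alias no longer exists.
print("old alias available:", hasattr(collections, "Hashable"))
print("str is hashable:", is_hashable_key("contigs"))
print("list is hashable:", is_hashable_key(["not", "hashable"]))
```

This is why an older bundled pyyaml3 breaks as soon as the image's Python interpreter is 3.10 or newer.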

@kapsakcj
Collaborator

Ohhhh, Etoki is what is installing these dependencies https://github.com/zheminzhou/EToKi#installation

All 3rd party programs except for usearch can be automatically installed using configure command:

python EToKi.py configure --install --download_krakenDB

Can lines 136-156 in the dockerfile be removed? Will Etoki run fine as long as all of these tools are in the PATH?

@kapsakcj
Collaborator

Gah, I should have continued reading the Etoki README:

You can also use pre-installed 3rd party programs in EToKi, by passing their absolute paths into the program using --path. This argument can be specified multiple times in the same command:

Never mind. I'm still not sure how SPAdes 3.14.1 got in there.

@kapsakcj
Collaborator

ok I'm going to stop looking at code late at night. Running myself in circles here.

SPAdes 3.14.1 was definitely brought in from the staphb/shovill:1.1.0 docker image. I assumed it was the latest version, but it isn't.

I'd recommend upgrading to SPAdes 3.15.4 via the bespoke docker image staphb/spades:3.15.4 to avoid the traceback error I mentioned previously.
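
A version guard in the image's test step could catch this kind of shadowing early. The sketch below assumes the banner format printed by `spades.py --version` earlier in this thread; `parse_spades_version` and `is_at_least` are hypothetical helpers, and the minimum of 3.15.4 is simply the release mentioned above.

```python
import re


def parse_spades_version(banner):
    """Extract (major, minor, patch) from a banner such as
    'SPAdes genome assembler v3.14.1'; return None if no version is found."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", banner)
    return tuple(int(part) for part in match.groups()) if match else None


def is_at_least(banner, minimum):
    """True if the version in the banner is >= minimum (a 3-tuple of ints)."""
    version = parse_spades_version(banner)
    return version is not None and version >= minimum


print(is_at_least("SPAdes genome assembler v3.14.1", (3, 15, 4)))  # False
print(is_at_least("SPAdes genome assembler v3.15.4", (3, 15, 4)))  # True
```

Comparing integer tuples avoids the string-comparison trap where "3.9.0" would sort after "3.15.4".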

Contributor Author

@lskatz lskatz left a comment

Request has been addressed

etoki/1.2.1/Dockerfile Outdated Show resolved Hide resolved
etoki/1.2.1/Dockerfile Outdated Show resolved Hide resolved
Contributor

@erinyoung erinyoung left a comment

It's looking so good!

My only suggestion is to use wget with the version that you're maintaining on github instead of git clone and checkout.

@lskatz
Contributor Author

lskatz commented Jun 9, 2022

Looks like tests passed, and I answered your comments. Thanks for catching all those mistakes or things that raise eyebrows so that it's polished.

etoki/1.2.1/Dockerfile Outdated Show resolved Hide resolved
Contributor

@erinyoung erinyoung left a comment

It's just one small thing in your dockerfile.

Also, we don't need the test-etoki.yml anymore, so it'd be awesome if you could remove that.

@lskatz
Contributor Author

lskatz commented Jun 18, 2022

Thanks for those comments. I think I addressed those two comments. Let me know if there is anything else I can do!

Contributor

@erinyoung erinyoung left a comment

Phew!

@erinyoung
Contributor

I'm going to merge this and deploy the image to dockerhub. It should be there soon

@erinyoung erinyoung merged commit 1330f74 into StaPH-B:master Jun 21, 2022