create README details tools, versions, and parameters #14

Closed
aclum opened this issue Jan 11, 2023 · 19 comments
@aclum
Contributor

aclum commented Jan 11, 2023

It has been determined that the sequencing workflows should output a human-readable description of the workflow. ReadbasedAnalysis is higher priority because we'll need to run it on datasets for GSP. FYI @ssarrafan

@ssarrafan

@aclum @poeli @hubin-keio is this work planned for this week or next sprint?

@poeli
Collaborator

poeli commented Jan 12, 2023

@aclum @ssarrafan I am confused about "human readable description". Examples and/or scenarios will help me understand what to implement exactly. I chatted with @hubin-keio today and will have more discussions.

@aclum
Contributor Author

aclum commented Jan 17, 2023

example IMG annotation methods:
cat *imgap.info

IMGAP Version: 5.1.13
Structural Annotation Programs Used: GeneMark.hmm-2 v1.25_lic; INFERNAL 1.1.3 (Nov 2019); Prodigal v2.6.3
Structural Annotation DBs Used: Rfam 13.0
Functional Annotation Programs Used: HMMER 3.1b2; lastal 1256
Functional Annotation DBs Used: COG 2003; Cath-Funfam v4.2.0; IMG-NR 20211118; Pfam v34.0; SMART 01_06_2016; SuperFamily v1.75; TIGRFAM v15.0

The make_info_file task in https://github.com/microbiomedata/mg_annotation/blob/a8c172beeb4ce93e8f8373c11e348181ade47e79/annotation_full.wdl is how I've implemented generating this file for the annotation workflow.
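For anyone consuming such a file downstream, here is a minimal Python sketch of parsing it. It assumes every line has the form "Field: entry; entry", as in the example above. `parse_info` is a hypothetical helper for illustration and is not part of the annotation workflow.

```python
def parse_info(text):
    """Parse 'Field: a; b; c' lines into {field: [a, b, c]}."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = [v.strip() for v in value.split(";") if v.strip()]
    return fields


# Sample lines copied from the imgap.info example above.
SAMPLE_INFO = """IMGAP Version: 5.1.13
Structural Annotation Programs Used: GeneMark.hmm-2 v1.25_lic; INFERNAL 1.1.3 (Nov 2019); Prodigal v2.6.3
Structural Annotation DBs Used: Rfam 13.0
"""

parsed = parse_info(SAMPLE_INFO)
for field, entries in parsed.items():
    print(field, "->", entries)
```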

example metatranscriptome assembly methods:
"The readset was assembled with megahit version v1.2.9(1). This was run using the following command line options: megahit -t 16
--k-list 23,43,63,83,103,123 -m 100000000000 -o out.megahit --12 reads.input.fastq.gz.

The input read set was mapped to the final assembly and coverage information generated
with bbmap version 38.86(2). This was run using the following command line
options: bbmap.sh build=1 overwrite=true fastareadlen=500 -Xmx100g threads=16 nodisk=true
interleaved=true ambiguous=random rgid=filename in=reads.fastq.gz ref=reference.fasta
out=pairedMapped.bam.

(1) MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly
via succinct de Bruijn graph. Bioinformatics, 2015.
(2) B. Bushnell: BBTools software package, http://bbtools.jgi.doe.gov/
"
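A methods paragraph in this style could be assembled from a list of steps. The Python sketch below is only an illustration of the idea. The `Step` class and `methods_text` function are hypothetical names, and the values are taken from the quoted example rather than from any workflow code.

```python
from dataclasses import dataclass


@dataclass
class Step:
    sentence: str   # lead-in, e.g. "The readset was assembled"
    tool: str
    version: str
    command: str    # full command line as it was run
    citation: str


def methods_text(steps):
    """Return numbered methods paragraphs followed by their citations."""
    body, refs = [], []
    for n, s in enumerate(steps, start=1):
        body.append(
            f"{s.sentence} with {s.tool} version {s.version}({n}). "
            f"This was run using the following command line options: {s.command}."
        )
        refs.append(f"({n}) {s.citation}")
    return "\n\n".join(body) + "\n\n" + "\n".join(refs)


steps = [
    Step(
        "The readset was assembled", "megahit", "v1.2.9",
        "megahit -t 16 --k-list 23,43,63,83,103,123 -m 100000000000 "
        "-o out.megahit --12 reads.input.fastq.gz",
        "MEGAHIT: An ultra-fast single-node solution for large and complex "
        "metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 2015.",
    ),
    Step(
        "The input read set was mapped to the final assembly and coverage "
        "information generated", "bbmap", "38.86",
        "bbmap.sh build=1 overwrite=true fastareadlen=500 -Xmx100g threads=16 "
        "nodisk=true interleaved=true ambiguous=random rgid=filename "
        "in=reads.fastq.gz ref=reference.fasta out=pairedMapped.bam",
        "B. Bushnell: BBTools software package, http://bbtools.jgi.doe.gov/",
    ),
]

methods = methods_text(steps)
print(methods)
```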

@ssarrafan

@poeli and @aclum any update on this? Can this issue be closed? Is it actively being worked on?

@aclum
Contributor Author

aclum commented Jan 28, 2023

This request is for a readme for reads-based analysis; what I saw being worked on was the reads QC process. We need both, but the reads-based analysis is higher priority because that is the main workflow we want to run on Bioscales and GROW for GSP. @hubin-keio

@aclum added the GSP2023 label (Add to any issue related to GSP 2023 goals) Jan 28, 2023
@poeli
Collaborator

poeli commented Jan 30, 2023

@ssarrafan @aclum I committed the updated version c91abd1 to the development branch. The changes include:

  • A readme for the versions of profilers and databases is included in the output and saved to the file [outdir]/profiler.info.

Additional output example:

{ "ReadbasedAnalysis.info_file": "test/output/profiler.info",
  "ReadbasedAnalysis.info": "Taxonomy profiling tools and databases used:\nKraken2 v2.1.2 (database version: k2_standard_08gb_20221209)\nCentrifuge v1.0.4 (database version: RS_bahv_compressed_201612)"
}
  • A new file, db_ver.info, needs to be added to each database directory.
  • New profiler tool SingleM has been added to the WDL.
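To illustrate how a per-database db_ver.info file can feed the profiler.info text, here is a minimal Python sketch. It assumes db_ver.info holds a single line with the database version string, which this thread does not confirm. `read_db_version` and `profiler_info` are hypothetical names, and the temporary directory stands in for a real database directory.

```python
import tempfile
from pathlib import Path


def read_db_version(db_dir):
    """Return the contents of db_ver.info, or 'unknown' if it is absent."""
    ver_file = Path(db_dir) / "db_ver.info"
    return ver_file.read_text().strip() if ver_file.exists() else "unknown"


def profiler_info(tools):
    """tools maps a tool name to (tool_version, database_directory)."""
    lines = ["Taxonomy profiling tools and databases used:"]
    for name, (version, db_dir) in tools.items():
        db_version = read_db_version(db_dir)
        lines.append(f"{name} v{version} (database version: {db_version})")
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a real database directory (assumed layout).
    kraken_db = Path(tmp) / "kraken2"
    kraken_db.mkdir()
    (kraken_db / "db_ver.info").write_text("k2_standard_08gb_20221209\n")

    report = profiler_info({"Kraken2": ("2.1.2", kraken_db)})
    print(report)
```

Falling back to "unknown" keeps the report readable when a database directory has not yet received its db_ver.info file.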

@aclum
Contributor Author

aclum commented Jan 30, 2023

Great thanks

@poeli
Collaborator

poeli commented Jan 30, 2023

@aclum @ssarrafan I don't have permission to write to the centrifuge database directory. Please help move centrifuge's db_ver.info:
mv /global/cfs/projectdirs/m3408/aim2/database/db_ver.info /global/cfs/projectdirs/m3408/aim2/database/centrifuge

@aclum
Contributor Author

aclum commented Jan 30, 2023

@poeli is this still a problem? I see the file there.

@ssarrafan

Moving to current sprint. Please remove from sprint if you're not actively working on this.

@ssarrafan

Closing this per @aclum

@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in GSP 2023 Board Feb 9, 2023
@aclum aclum reopened this Oct 16, 2023
@aclum
Contributor Author

aclum commented Oct 16, 2023

@poeli @hubin-keio Gottcha2 still does not list a version. Please update this.
For example:
nmdc_wfrbt-11-vq06gn88.1_profiler.info
Taxonomy profiling tools and databases used:
Kraken2 v2.1.2 (database version: Refseq: bacteria, archaea, viral, human 2020/01)
Centrifuge v1.0.4 (database version: Refseq: bacteria, archaea (compressed) 2018/04)
Gottcha2 v (database version: RefSeq-r90 Bacteria Archaea Viruses (complete genomes))
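A check like the following could catch an empty version field before a release. This is a minimal Python sketch that assumes each tool line follows the "Tool vVERSION (database version: ...)" pattern shown above; `tools_missing_version` is a hypothetical helper, not part of the workflow.

```python
import re

# "<Tool> v<version> (database version: <db>)"; the version may be empty.
LINE_RE = re.compile(
    r"^(?P<tool>\S+) v(?P<version>\S*) \(database version: (?P<db>.*)\)$"
)


def tools_missing_version(text):
    """Return the names of tools whose version field is empty."""
    missing = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and not match.group("version"):
            missing.append(match.group("tool"))
    return missing


# Contents copied from the profiler.info example above.
SAMPLE_PROFILER_INFO = """Taxonomy profiling tools and databases used:
Kraken2 v2.1.2 (database version: Refseq: bacteria, archaea, viral, human 2020/01)
Centrifuge v1.0.4 (database version: Refseq: bacteria, archaea (compressed) 2018/04)
Gottcha2 v (database version: RefSeq-r90 Bacteria Archaea Viruses (complete genomes))
"""

missing = tools_missing_version(SAMPLE_PROFILER_INFO)
print(missing)
```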

@poeli
Collaborator

poeli commented Oct 18, 2023

@aclum The issue has been resolved (output buffer didn't flush). I will proceed to rebuild the container and initiate the testing process.
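The comment does not say where the buffer was left unflushed, so the following is only a generic Python illustration of that class of bug, not the actual fix. A small write stays in the writer's buffer until it is flushed, so a reader that opens the file too early sees nothing. The file name and the "x.y.z" value are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "version.txt")  # placeholder file name

    writer = open(path, "w")
    writer.write("x.y.z")  # placeholder version string, still buffered

    # Reading before the flush: the data has not reached the file yet.
    with open(path) as reader:
        before_flush = reader.read()

    writer.flush()

    # Reading after the flush: the version string is now visible.
    with open(path) as reader:
        after_flush = reader.read()

    writer.close()

print(repr(before_flush), repr(after_flush))
```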

@poeli
Collaborator

poeli commented Oct 25, 2023

@aclum A new Docker container has been built and tested. I don't have access to the NMDC Docker Hub, so I pushed it to my own Docker Hub account. Please let me know if you have any questions or additional issues.

@aclum
Contributor Author

aclum commented Oct 27, 2023

@poeli I pulled your image and pushed it back to the nmdc repo
https://hub.docker.com/layers/microbiomedata/nmdc_taxa_profilers/1.0.5/images/sha256-808a5194b42503d50b93c06a6f4dd5ab83fdc85453872abae91e653a8a2c26c6?context=explore so you can reference it in the workflow.

@poeli
Collaborator

poeli commented Nov 1, 2023

Updated to the master branch

@aclum
Contributor Author

aclum commented Nov 3, 2023

@Michal-Babins @mbthornton-lbl We'll need a new release of the https://github.com/microbiomedata/ReadbasedAnalysis repo, and the nmdc automation repo needs to be updated to use this new version. With this fix, the expected behavior is that the *_profiler.info file has a version number populated for Gottcha2.

@Michal-Babins

Assets were updated to include the updated version. v1.0.5 is the current release, and it is now correctly reflected by the ReadbasedAnalysis.wdl and bundle.zip. Moving forward, we need to make sure that any changes made to any branch, when merged to master, are reflected as a major, minor, or patch version update.

@Michal-Babins

@poeli, does this version of gottcha2 never write out ${prefix}.full.tsv, or does it skip the file only if nothing is found? I am seeing some workflows fail because cromwell is unable to find ${prefix}.full.tsv
