create README details tools, versions, and parameters #14

aclum · 2023-01-11T19:28:46Z

It has been determined that the sequencing workflows should output a human-readable description of the workflow. ReadbasedAnalysis is higher priority b/c we'll need to run it on datasets for GSP. FYI @ssarrafan

ssarrafan · 2023-01-11T20:39:14Z

@aclum @poeli @hubin-keio is this work planned for this week or next sprint?

poeli · 2023-01-12T21:52:56Z

@aclum @ssarrafan I am confused about "human readable description". Examples and/or scenarios will help me understand what to implement exactly. I chatted with @hubin-keio today and will have more discussions.

aclum · 2023-01-17T22:50:15Z

example IMG annotation methods:
cat *imgap.info

IMGAP Version: 5.1.13
Structural Annotation Programs Used: GeneMark.hmm-2 v1.25_lic; INFERNAL 1.1.3 (Nov 2019); Prodigal v2.6.3
Structural Annotation DBs Used: Rfam 13.0
Functional Annotation Programs Used: HMMER 3.1b2; lastal 1256
Functional Annotation DBs Used: COG 2003; Cath-Funfam v4.2.0; IMG-NR 20211118; Pfam v34.0; SMART 01_06_2016; SuperFamily v1.75; TIGRFAM v15.0

The make_info_file task in https://github.com/microbiomedata/mg_annotation/blob/a8c172beeb4ce93e8f8373c11e348181ade47e79/annotation_full.wdl is how I've implemented generating this file for the annotation workflow.
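For a tool-agnostic picture of what the make_info_file task produces, here is a minimal Python sketch that renders the same five-line layout from lists of "name version" strings. `render_imgap_info` is a hypothetical name; the real task lives in annotation_full.wdl and may build the file differently.

```python
def render_imgap_info(version, structural_programs, structural_dbs,
                      functional_programs, functional_dbs):
    """Render an IMGAP-style .info text block (hypothetical helper).

    Each argument after `version` is a list of "name version" strings,
    joined with "; " to match the example file above.
    """
    rows = [
        ("IMGAP Version", version),
        ("Structural Annotation Programs Used", "; ".join(structural_programs)),
        ("Structural Annotation DBs Used", "; ".join(structural_dbs)),
        ("Functional Annotation Programs Used", "; ".join(functional_programs)),
        ("Functional Annotation DBs Used", "; ".join(functional_dbs)),
    ]
    # One "Label: value" line per row, with a trailing newline like a text file.
    return "\n".join(f"{label}: {value}" for label, value in rows) + "\n"


if __name__ == "__main__":
    print(render_imgap_info(
        "5.1.13",
        ["GeneMark.hmm-2 v1.25_lic", "INFERNAL 1.1.3 (Nov 2019)", "Prodigal v2.6.3"],
        ["Rfam 13.0"],
        ["HMMER 3.1b2", "lastal 1256"],
        ["COG 2003", "Cath-Funfam v4.2.0", "IMG-NR 20211118", "Pfam v34.0",
         "SMART 01_06_2016", "SuperFamily v1.75", "TIGRFAM v15.0"],
    ), end="")
```

Keeping the versions as data (rather than hard-coding the text) means a tool upgrade only touches one list.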

example metatranscriptome assembly methods:
"The readset was assembled with megahit version v1.2.9(1). This was run using the following command line options: megahit -t 16 --k-list 23,43,63,83,103,123 -m 100000000000 -o out.megahit --12 reads.input.fastq.gz.

The input read set was mapped to the final assembly and coverage information generated with bbmap version 38.86(2). This was run using the following command line options: bbmap.sh build=1 overwrite=true fastareadlen=500 -Xmx100g threads=16 nodisk=true interleaved=true ambiguous=random rgid=filename in=reads.fastq.gz ref=reference.fasta out=pairedMapped.bam.

(1) MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 2015.
(2) B. Bushnell: BBTools software package, http://bbtools.jgi.doe.gov/"
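A minimal Python sketch of how a methods sentence in this style could be generated from the exact argument list a task ran, so the text never drifts from the real command. `describe_step` and its parameters are hypothetical; `shlex.join` (Python 3.8+) quotes any argument containing spaces or shell metacharacters so the logged command stays copy-pasteable.

```python
import shlex


def describe_step(summary, tool, version, ref_num, argv):
    """Return a methods sentence pair in the style of the example above.

    `summary` is the start of the first sentence (e.g. "The readset was
    assembled"); `argv` is the exact argument list that was executed.
    """
    return (f"{summary} with {tool} version {version}({ref_num}). "
            "This was run using the following command line options: "
            f"{shlex.join(argv)}.")


if __name__ == "__main__":
    print(describe_step(
        "The readset was assembled", "megahit", "v1.2.9", 1,
        ["megahit", "-t", "16", "--k-list", "23,43,63,83,103,123",
         "-o", "out.megahit", "--12", "reads.input.fastq.gz"],
    ))
```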

ssarrafan · 2023-01-27T20:29:20Z

@poeli and @aclum any update on this? Can this issue be closed? Is it actively being worked on?

aclum · 2023-01-28T00:12:23Z

This request is for a README for the reads-based analysis; what I saw being worked on was the reads QC process. We need both, but the reads-based analysis is higher priority because it is the main workflow we want to run on Bioscales and GROW for GSP. @hubin-keio

poeli · 2023-01-30T06:13:52Z

@ssarrafan @aclum I committed the updated version c91abd1 to the development branch. The changes include:

A readme for the versions of profilers and databases is included in the output and saved to the file [outdir]/profiler.info.

Additional output example:

{ "ReadbasedAnalysis.info_file": "test/output/profiler.info",
  "ReadbasedAnalysis.info": "Taxonomy profiling tools and databases used:\nKraken2 v2.1.2 (database version: k2_standard_08gb_20221209)\nCentrifuge v1.0.4 (database version: RS_bahv_compressed_201612)"
}

A new file, db_ver.info, needs to be added to each database directory.
A new profiler tool, SingleM, has been added to the WDL.
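A minimal Python sketch of how the `ReadbasedAnalysis.info` string above could be assembled: read each database's db_ver.info and format one line per profiler. The function names and the assumption that db_ver.info holds a single descriptive line are mine; the workflow does this inside its own WDL/container and may differ.

```python
from pathlib import Path

HEADER = "Taxonomy profiling tools and databases used:"


def read_db_version(db_dir):
    """Read the database version string from <db_dir>/db_ver.info.

    Assumes the file holds a single descriptive line; surrounding
    whitespace (including the trailing newline) is stripped.
    """
    return (Path(db_dir) / "db_ver.info").read_text().strip()


def format_profiler_info(entries):
    """Build profiler.info text from (tool, tool_version, db_version) tuples."""
    lines = [HEADER]
    for tool, tool_version, db_version in entries:
        lines.append(f"{tool} v{tool_version} (database version: {db_version})")
    return "\n".join(lines)
```

In use, each `db_version` would come from `read_db_version(...)` pointed at that profiler's database directory (paths are deployment-specific and not shown here).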

aclum · 2023-01-30T18:18:56Z

Great thanks

poeli · 2023-01-30T18:45:30Z

@aclum @ssarrafan I don't have permission to write to the Centrifuge database directory. Please move Centrifuge's db_ver.info into it:
mv /global/cfs/projectdirs/m3408/aim2/database/db_ver.info /global/cfs/projectdirs/m3408/aim2/database/centrifuge

aclum · 2023-01-30T19:21:23Z

@poeli is this still a problem? I see the file there.

ssarrafan · 2023-01-30T22:01:45Z

Moving to current sprint. Please remove from sprint if you're not actively working on this.

ssarrafan · 2023-02-09T22:54:35Z

Closing this per @aclum

aclum · 2023-10-16T21:29:14Z

@poeli @hubin-keio Gottcha2 still does not list a version. Please update this.
e.g.:
nmdc_wfrbt-11-vq06gn88.1_profiler.info
Taxonomy profiling tools and databases used:
Kraken2 v2.1.2 (database version: Refseq: bacteria, archaea, viral, human 2020/01)
Centrifuge v1.0.4 (database version: Refseq: bacteria, archaea (compressed) 2018/04)
Gottcha2 v (database version: RefSeq-r90 Bacteria Archaea Viruses (complete genomes))
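A regression like the empty Gottcha2 version could be caught automatically by scanning profiler.info after a test run. This is a minimal Python sketch; the regex assumes the exact `Tool vVERSION (database version: ...)` layout shown above, and `tools_missing_version` is a hypothetical helper.

```python
import re

# One profiler line, e.g. "Kraken2 v2.1.2 (database version: ...)".
# The version group uses \S* so an empty version ("Gottcha2 v (") still matches.
PROFILER_LINE = re.compile(
    r"^(?P<tool>\S+) v(?P<version>\S*) \(database version: (?P<db>.*)\)$"
)


def tools_missing_version(info_text):
    """Return tool names whose version field is empty in profiler.info text."""
    missing = []
    for line in info_text.splitlines():
        m = PROFILER_LINE.match(line)  # the header line does not match and is skipped
        if m and not m.group("version"):
            missing.append(m.group("tool"))
    return missing
```

Run against the output above, this would flag only `Gottcha2`.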

poeli · 2023-10-18T16:31:57Z

@aclum The issue has been resolved (output buffer didn't flush). I will proceed to rebuild the container and initiate the testing process.
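The actual fix isn't shown in this thread, so the following is only an illustration of the general failure mode in Python: a small write sits in the process's buffer, and anything reading the file (or capturing the output) in the meantime sees nothing until `flush()` or `close()`. The version string is a placeholder, not a real Gottcha2 version.

```python
import os
import tempfile


def read_back(path):
    """Open a fresh handle and return the file's current on-disk contents."""
    with open(path) as reader:
        return reader.read()


def demo_unflushed_write(text="example-version\n"):
    """Write `text` without flushing, then compare what a second reader sees
    before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as writer:
            writer.write(text)      # small write: held in Python's buffer
            before = read_back(path)
            writer.flush()          # push buffered data to the OS
            after = read_back(path)
        return before, after
    finally:
        os.remove(path)
```

For output written with `print`, passing `flush=True` (or exiting cleanly so buffers are closed) avoids the same problem.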

poeli · 2023-10-25T19:11:14Z

@aclum The new Docker container has been built and tested. I don't have access to the NMDC Docker Hub account, so I pushed the image to my own Docker Hub account. Please let me know if you have any questions or run into additional issues.

aclum · 2023-10-27T23:19:13Z

@poeli I pulled your image and pushed it back to the nmdc repo
https://hub.docker.com/layers/microbiomedata/nmdc_taxa_profilers/1.0.5/images/sha256-808a5194b42503d50b93c06a6f4dd5ab83fdc85453872abae91e653a8a2c26c6?context=explore so you can reference it in the workflow.

poeli · 2023-11-01T22:37:46Z

Updated to the master branch

aclum · 2023-11-03T21:44:55Z

@Michal-Babins @mbthornton-lbl We'll need a new release of the https://github.com/microbiomedata/ReadbasedAnalysis repo, and the nmdc automation repo needs to be updated to use this new version. With this fix, the expected behavior is that the *_profiler.info file has the Gottcha2 version number populated.

Michal-Babins · 2023-11-08T20:43:14Z

Assets were updated to include the updated version. v1.0.5 is the current release, and it is now correctly reflected in ReadbasedAnalysis.wdl and bundle.zip. Moving forward, we need to make sure that any change merged to master, from any branch, comes with a corresponding major, minor, or patch version bump.
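A minimal Python sketch of the major/minor/patch rule, following generic Semantic Versioning conventions (a higher component bump resets the lower ones). It assumes plain `X.Y.Z` tags with an optional leading `v`; how this repo actually cuts releases is not specified here, and `bump_version` is a hypothetical helper.

```python
def bump_version(version, part):
    """Return the next version for part in {"major", "minor", "patch"}.

    Accepts an optional leading "v" (e.g. "v1.0.5"), which is preserved.
    Raises ValueError for an unknown part or a non X.Y.Z version.
    """
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(x) for x in version[len(prefix):].split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"{prefix}{major}.{minor}.{patch}"
```

For example, a bug fix like the Gottcha2 version one would move `v1.0.5` to `v1.0.6`.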

Michal-Babins · 2023-11-10T16:07:39Z

@poeli, does this version of Gottcha2 not write out ${prefix}.full.tsv at all, or only when nothing is found? I am seeing some workflows fail because Cromwell is unable to find ${prefix}.full.tsv.
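Until it's confirmed whether Gottcha2 skips the file when nothing is found, one defensive option (an assumption, not something decided in this thread) is to create an empty placeholder after the tool runs so the declared output always exists; another, if the WDL version in use supports it, is declaring that output as optional (`File?`). A minimal Python sketch of the placeholder approach; `ensure_output` and its `header` argument are hypothetical, and downstream steps would need to tolerate an empty file.

```python
from pathlib import Path


def ensure_output(path, header=""):
    """Create a placeholder at `path` if the tool did not write it.

    Returns True when a placeholder was created, False if the file already
    existed (an existing file is never modified).
    """
    p = Path(path)
    if p.exists():
        return False
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(header)
    return True
```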

aclum assigned hubin-keio Jan 11, 2023

hubin-keio assigned poeli Jan 11, 2023

ssarrafan added this to 2023 Squad Sprint 2: January 16 - January 27, 2023 Jan 12, 2023

ssarrafan moved this to Todo in 2023 Squad Sprint 2: January 16 - January 27, 2023 Jan 13, 2023

poeli moved this from Todo to In Progress in 2023 Squad Sprint 2: January 16 - January 27, 2023 Jan 18, 2023

poeli moved this from In Progress to Pending Review in 2023 Squad Sprint 2: January 16 - January 27, 2023 Jan 26, 2023

aclum added the GSP2023 Add to any issue related to GSP 2023 goals label Jan 28, 2023

aclum added this to GSP 2023 Board Jan 28, 2023

ssarrafan added this to 2023 Squad Sprint 3: January 30 - February 10, 2023 Jan 30, 2023

ssarrafan removed this from 2023 Squad Sprint 2: January 16 - January 27, 2023 Jan 30, 2023

ssarrafan moved this to 🏗 In progress in GSP 2023 Board Jan 30, 2023

ssarrafan moved this to In Progress in 2023 Squad Sprint 3: January 30 - February 10, 2023 Jan 30, 2023

ssarrafan closed this as completed Feb 9, 2023

github-project-automation bot moved this from 🏗 In progress to ✅ Done in GSP 2023 Board Feb 9, 2023

github-project-automation bot moved this from In Progress to Done in 2023 Squad Sprint 3: January 30 - February 10, 2023 Feb 9, 2023

aclum reopened this Oct 16, 2023

aclum assigned Michal-Babins and mbthornton-lbl and unassigned poeli and hubin-keio Nov 3, 2023

aclum removed the GSP2023 Add to any issue related to GSP 2023 goals label Nov 3, 2023

aclum added this to 2023 Squad Sprint 23: November 6 - November 17, 2023 Nov 3, 2023

ssarrafan moved this to Todo in 2023 Squad Sprint 23: November 6 - November 17, 2023 Nov 3, 2023

mbthornton-lbl moved this from Todo to In Progress in 2023 Squad Sprint 23: November 6 - November 17, 2023 Nov 8, 2023

Michal-Babins closed this as completed Nov 8, 2023

github-project-automation bot moved this from In Progress to Done in 2023 Squad Sprint 23: November 6 - November 17, 2023 Nov 8, 2023
