Update README.md with reqs and recs #29

Merged 1 commit on May 4, 2022

kapsakcj (Contributor) commented May 3, 2022

Just some helpful tips for folks looking to get started with using this wonderful resource.

I carried/borrowed/copied these instructions from the datasets-sars-cov-2 README, which I think was from another repo originally 😆

I'm not sure if there are any non-standard Perl modules that are required for these scripts, but it would be good to list those (I didn't have to install anything special).

kapsakcj (Contributor, Author) commented May 3, 2022

Rendered Markdown can be viewed on my fork: https://github.com/kapsakcj/Kalamari#synopsis

lskatz merged commit 9da34e6 into lskatz:master on May 4, 2022
lskatz (Owner) commented May 4, 2022

Thank you! This is very useful!

lskatz pushed a commit that referenced this pull request Nov 5, 2024
lskatz added a commit that referenced this pull request Nov 5, 2024
* GitHub actions (#13)

* unit-testing actions

* unit-testing actions

* unit-testing actions

* unit-testing actions

* installing edirect

* installing edirect

* installing edirect

* installing edirect

* installing edirect

* rm travis

* edirect through apt

* edirect through apt

* Add files via upload

* adding taxonomy_v3.5.1

* More formats (#17)

* new files for individual genes and coordinates

* m

* new flag to include optional files with --and

* Listeria unit testing (#18)

* Listeria unit testing draft

* m

* debug

* debug

* debug

* update kalamari script; add --and flags

* kraken1 db

* m

* m

* m

* m

* editing PATH

* editing PATH

* fixing src path

* m

* fixing installation dir

* jellyfish1

* jellyfish1

* m

* just two genomes

* tree kraken

* added threads 2

* added threads 2

* build kraken -x

* work on disk in kraken

* debug

* trying out kraken2

* m

* removed rebuild and work-on-disk

* kraken report

* kraken report

* more inspection of kraken output

* more inspection of kraken output

* done with unit testing for now

Co-authored-by: Lee Katz - Aspen <[email protected]>

* new parent id

* a get taxonomy script for a reduced set of dmp files

* reduced taxonomy

* testing v3.9.2

* added parentid to plasmids

* Updating some Yersinia taxid (#16)

* Add files via upload

* adding taxonomy_v3.5.1

* adding v3.9.3 taxonomy

* m

* adding in Scott's Yersinia genomes

* cleanup

* updated to correct src tax dir

* Update unit-testing.yml

* Create CITATION.cff (#20)

* Create CITATION.cff

* Update CITATION.cff

* Kraken1 unit test (#21)

* with fixed taxonomy, unit test kraken1

* shortened the minimizer length to 9

* kraken1 query

* m

* adding a query is $query statement

Co-authored-by: Lee Katz - Aspen <[email protected]>

* Database doc update (#22)

* with fixed taxonomy, unit test kraken1

* shortened the minimizer length to 9

* kraken1 query

* m

* adding a query is $query statement

* Update DATABASES.md

* added blast and ANI instructions

* updated docs to reflect more comprehensive DATABASES.md

* m

Co-authored-by: Lee Katz - Aspen <[email protected]>

* mash database

* Define contributions (#23)

* validate taxonomy script

* unit testing for taxonomy

* unit testing for taxonomy

* moved XXXXXX entries to a todo file

* validating names.dmp and added new entries to make taxonomy more complete

* Contributing.md doc

* link to contributing.md

* more description under contributions

Co-authored-by: Lee Katz - Aspen <[email protected]>

* mmseqs2 just for fun

* m

* Sepia

* fixed bacillus genus back to bacteria in the plasmids (#24)

Co-authored-by: Lee Katz - Aspen <[email protected]>

* Build sepia (#25)

* fixed bacillus genus back to bacteria in the plasmids

* sepia building v1

* m

* sepia documentation and reference generation script

* m

Co-authored-by: Lee Katz - Aspen <[email protected]>

* fixed a bug where the same fasta file would be downloaded twice and given the parent taxid in addition to its own

* validate a kraken database better

* MIDAS

* m

* m

* Update README.md with reqs and recs (#29)

* Update chromosomes.tsv

* using GITHUB_PATH to solve CI problems

* m

* m

* limit tests to target branches

* jellyfish now in path

* m

* remove -x statement

* allow this workflow to work on master

* trying out taxonomy validator workflow

* remove kraken1 from testing on this branch

* fix path to taxonomy

* Fix ci (#31)

* using GITHUB_PATH to solve CI problems

* m

* m

* limit tests to target branches

* jellyfish now in path

* m

* remove -x statement

* allow this workflow to work on master

* trying out taxonomy validator workflow

* remove kraken1 from testing on this branch

* fix path to taxonomy

* check file sizes after pulling down accessions

* more debugging in the ci just in case

* change cryptosporidium parent taxids to cryptosporidium the genus

* merged new kalamari download script

* upped the version

* getExactTaxonomy.pl: better error messages

* downloadKalamari.pl: add in retmax 1

* only accept one sequence per insdc accession

* script to download kalamari from source

* numcpus option added; new bash script to download and format

* bash downloadKalamari.sh

* update to ubuntu 20

* 2 cpus in test

* add spreadsheet as a strategy variable

* m

* m

* split jobs between runners

* fix math

* adding more retries

* switch to 1 cpu for testing

* bump tag to v5.3.0

* std output for downloadKalamari.sh

* removed bioperl

* bump version; add more standard conda db location

* trying to speed up downloads

* vast speed increase with batch downloads; cleaned up chromosomes.tsv

* moved version information to the script from Makefile.PL; removed --and; won't make kraken db in shell script

* m

* remove edirect setup unit test

* update unit tests

* just two chunks of tests

* batch more

* fix file sizes check

* just make the damn thing work

* bash file uses local repo files instead of curl; default buffer size 100

* More proper build (#42)

* Building taxonomy (#38)

* building taxonomy files but this script will be deprecated right away

* deprecated

* script to build taxonomy with src files

* m

* move old taxonomy to deprecated

* remove old 'versioned' files outside of git versioning

* filter taxonomy script

* complete the taxonomy

* updated scripts for compiling databases

* dev branch testing

* fix lmono test a bit

* .

* Fix the taxonomy tests (#39)

* building taxonomy files but this script will be deprecated right away

* deprecated

* script to build taxonomy with src files

* m

* move old taxonomy to deprecated

* remove old 'versioned' files outside of git versioning

* filter taxonomy script

* complete the taxonomy

* updated scripts for compiling databases

* dev branch testing

* fix lmono test a bit

* .

* fix paths

* updated PATH

* updated PATH

* troubleshooting

* fix PATH again

* fix ls path

* remove that step

* updated tests to reflect build-taxonomy (#40)

* fix path to taxonomy files

* download and build taxonomy

* merge Listeria into Yersinia matrix

* m

* updated output directory as matrix.GENUS

* kraken1 tests patches

* m

* Fixed two more tests (#41)

* update yml

* query fallback

* debugging msg

* fix path to taxonomydb

* print first two lines of fasta files

* helpful cut statement

* remove head statement in last step

* bump version

* fix a downloading bug where sed stalls

* update for compressed kalamari library and more efficient kraken builds

* update download script

* Validate taxonomy (#43)

* validateTaxonomy update for just taxdirs; add 1 for filtered taxonomy; added DEBUG option for downloadKalamari.sh

* updated unit tests

* updated unit tests

* remove taxonomy stuff from downloadKalamari.sh

* fix validateTaxonomy syscall

* check on filtered tax in unit test

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* init paper

* some revisions; taxonomy; downloading

* swap example

* references

* stole Joe's draft-pdf.yml

* update to version 4 of artifacts

* plasmids description

* ignore rendered manuscripts

* some minor fixes; author affiliations; code examples

* added Shatavia; updated example

* m

* revisions from Jess

* refs

* fix list that became italics

* updated Andrew's affiliation

* plasmid defined species

* gave a name to the JOSS rendering

* try experimental docx file creation

* try 2 with container

* correct artifact Action

* m

* upload artifact v4

* branch agnostic

* try multiple formats; multiple uploads

* fix some citations

* fixed Dr. Lauer's info

* remove format arg

* shatavia's orcid

* added Rebecca's and Jess's orcids

* updated DOIs

* fixed comment line

* added Entrez Edirect URL

* more Entrez citation with help from CoPilot

* Andrew's orcid

* misc

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* updated revisions from coauthors

* entered Taylor's revisions

* move Katie to acknowledgements due to her request

* update genome list; stable efetching (#49)

* Add genomes (#45)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* Esearch input (#47)

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* make symlink to avoid naming mistakes

* check whether taxonkit is loaded

* use efetch -input

* fix tr bug

* Esearch input flag (#48)

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* make symlink to avoid naming mistakes

* check whether taxonkit is loaded

* use efetch -input

* fix tr bug

* get latest edirect

* update installation instructions

* update installation instructions: fix PATH

* bring in other tests

* update installation method for search with unit-testing

* update installation method for search with kraken2

* debug the ls statement

* debug the ls statement

* debug the ls statement

* debug building taxonomy

* exclusive unit testing for taxonomy for right now

* install taxonkit

* changes from cdc clearance process

* disable buggy docx creation

* fix blast+ formatting typo

* Change to MIT license

* Update README.md: remove CC license sticker

* update entrez ref

* MRA

* MRA

* misc

* 500 words or less

* nix example

* abstract

* abbreviate genera

* another paper revision

* added asm pandoc template

* provenance

* Leptospira interrogans => CP020414

* some progress

* downloadKalamari.sh: nuccleotideAcc bug fixed

* v5.7.2

* another round of provenance

* cleared out the unknowns list

* fixed chromosomes with sources

* chromosomes

* try to run CI

* fix wildcard

* better named sources for each assembly

* polish this directory

* assembly-complete.gz

---------

Co-authored-by: Scott Nguyen <[email protected]>
Co-authored-by: Scott Nguyen <[email protected]>
Co-authored-by: Curtis Kapsak <[email protected]>
lskatz added a commit that referenced this pull request Dec 31, 2024
* GitHub actions (#13)

* unit-testing actions

* unit-testing actions

* unit-testing actions

* unit-testing actions

* installing edirect

* installing edirect

* installing edirect

* installing edirect

* installing edirect

* rm travis

* edirect through apt

* edirect through apt

* Add files via upload

* adding taxonomy_v3.5.1

* More formats (#17)

* new files for individual genes and coordinates

* m

* new flag to include optional files with --and

* Listeria unit testing (#18)

* Listeria unit testing draft

* m

* debug

* debug

* debug

* update kalamari script; add --and flags

* kraken1 db

* m

* m

* m

* m

* editing PATH

* editing PATH

* fixing src path

* m

* fixing installation dir

* jellyfish1

* jellyfish1

* m

* just two genomes

* tree kraken

* added threads 2

* added threads 2

* build kraken -x

* work on disk in kraken

* debug

* trying out kraken2

* m

* removed rebuild and work-on-disk

* kraken report

* kraken report

* more inspection of kraken output

* more inspection of kraken output

* done with unit testing for now

Co-authored-by: Lee Katz - Aspen <[email protected]>

* new parent id

* a get taxonomy script for a reduced set of dmp files

* reduced taxonomy

* testing v3.9.2

* added parentid to plasmids

* Updating some Yersinia taxid (#16)

* Add files via upload

* adding taxonomy_v3.5.1

* adding v3.9.3 taxonomy

* m

* adding in Scott's Yersinia genomes

* cleanup

* updated to correct src tax dir

* Update unit-testing.yml

* Create CITATION.cff (#20)

* Create CITATION.cff

* Update CITATION.cff

* Kraken1 unit test (#21)

* with fixed taxonomy, unit test kraken1

* shortened the minimizer length to 9

* kraken1 query

* m

* adding a query is $query statement

Co-authored-by: Lee Katz - Aspen <[email protected]>

* Database doc update (#22)

* with fixed taxonomy, unit test kraken1

* shortened the minimizer length to 9

* kraken1 query

* m

* adding a query is $query statement

* Update DATABASES.md

* added blast and ANI instructions

* updated docs to reflect more comprehensive DATABASES.md

* m

Co-authored-by: Lee Katz - Aspen <[email protected]>

* mash database

* Define contributions (#23)

* validate taxonomy script

* unit testing for taxonomy

* unit testing for taxonomy

* moved XXXXXX entries to a todo file

* validating names.dmp and added new entries to make taxonomy more complete

* Contributing.md doc

* link to contributing.md

* more description under contributions

Co-authored-by: Lee Katz - Aspen <[email protected]>

* mmseqs2 just for fun

* m

* Sepia

* fixed bacillus genus back to bacteria in the plasmids (#24)

Co-authored-by: Lee Katz - Aspen <[email protected]>

* Build sepia (#25)

* fixed bacillus genus back to bacteria in the plasmids

* sepia building v1

* m

* sepia documentation and reference generation script

* m

Co-authored-by: Lee Katz - Aspen <[email protected]>

* fixed a bug where the same fasta file would be downloaded twice and given the parent taxid in addition to its own

* validate a kraken database better

* MIDAS

* m

* m

* Update README.md with reqs and recs (#29)

* Update chromosomes.tsv

* using GITHUB_PATH to solve CI problems

* m

* m

* limit tests to target branches

* jellyfish now in path

* m

* remove -x statement

* allow this workflow to work on master

* trying out taxonomy validator workflow

* remove kraken1 from testing on this branch

* fix path to taxonomy

* Fix ci (#31)

* using GITHUB_PATH to solve CI problems

* m

* m

* limit tests to target branches

* jellyfish now in path

* m

* remove -x statement

* allow this workflow to work on master

* trying out taxonomy validator workflow

* remove kraken1 from testing on this branch

* fix path to taxonomy

* check file sizes after pulling down accessions

* more debugging in the ci just in case

* change cryptosporidium parent taxids to cryptosporidium the genus

* merged new kalamari download script

* upped the version

* getExactTaxonomy.pl: better error messages

* downloadKalamari.pl: add in retmax 1

* only accept one sequence per insdc accession

* script to download kalamari from source

* numcpus option added; new bash script to download and format

* bash downloadKalamari.sh

* update to ubuntu 20

* 2 cpus in test

* add spreadsheet as a strategy variable

* m

* m

* split jobs between runners

* fix math

* adding more retries

* switch to 1 cpu for testing

* bump tag to v5.3.0

* std output for downloadKalamari.sh

* removed bioperl

* bump version; add more standard conda db location

* trying to speed up downloads

* vast speed increase with batch downloads; cleaned up chromosomes.tsv

* moved version information to the script from Makefile.PL; removed --and; won't make kraken db in shell script

* m

* remove edirect setup unit test

* update unit tests

* just two chunks of tests

* batch more

* fix file sizes check

* just make the damn thing work

* bash file uses local repo files instead of curl; default buffer size 100

* More proper build (#42)

* Building taxonomy (#38)

* building taxonomy files but this script will be deprecated right away

* deprecated

* script to build taxonomy with src files

* m

* move old taxonomy to deprecated

* remove old 'versioned' files outside of git versioning

* filter taxonomy script

* complete the taxonomy

* updated scripts for compiling databases

* dev branch testing

* fix lmono test a bit

* .

* Fix the taxonomy tests (#39)

* building taxonomy files but this script will be deprecated right away

* deprecated

* script to build taxonomy with src files

* m

* move old taxonomy to deprecated

* remove old 'versioned' files outside of git versioning

* filter taxonomy script

* complete the taxonomy

* updated scripts for compiling databases

* dev branch testing

* fix lmono test a bit

* .

* fix paths

* updated PATH

* updated PATH

* troubleshooting

* fix PATH again

* fix ls path

* remove that step

* updated tests to reflect build-taxonomy (#40)

* fix path to taxonomy files

* download and build taxonomy

* merge Listeria into Yersinia matrix

* m

* updated output directory as matrix.GENUS

* kraken1 tests patches

* m

* Fixed two more tests (#41)

* update yml

* query fallback

* debugging msg

* fix path to taxonomydb

* print first two lines of fasta files

* helpful cut statement

* remove head statement in last step

* bump version

* fix a downloading bug where sed stalls

* update for compressed kalamari library and more efficient kraken builds

* update download script

* Validate taxonomy (#43)

* validateTaxonomy update for just taxdirs; add 1 for filtered taxonomy; added DEBUG option for downloadKalamari.sh

* updated unit tests

* updated unit tests

* remove taxonomy stuff from downloadKalamari.sh

* fix validateTaxonomy syscall

* check on filtered tax in unit test

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* init paper

* init paper

* some revisions; taxonomy; downloading

* some revisions; taxonomy; downloading

* swap example

* swap example

* references

* references

* stole Joe's draft-pdf.yml

* stole Joe's draft-pdf.yml

* update to version 4 of artifacts

* update to version 4 of artifacts

* plasmids description

* plasmids description

* ignore rendered manuscripts

* ignore rendered manuscripts

* some minor fixes; author affiliations; code examples

* some minor fixes; author affiliations; code examples

* added Shatavia; updated example

* added Shatavia; updated example

* m

* m

* revisions from Jess

* revisions from Jess

* refs

* refs

* fix list that became italics

* fix list that became italics

* updated Andrew's affiliation

* updated Andrew's affiliation

* plasmid defined species

* plasmid defined species

* gave a name to the JOSS rendering

* gave a name to the JOSS rendering

* try experimental docx file creation

* try experimental docx file creation

* try 2 with container

* try 2 with container

* correct artifact Action

* correct artifact Action

* m

* m

* upload artifact v4

* upload artifact v4

* branch agnostic

* branch agnostic

* try multiple formats; multiple uploads

* try multiple formats; multiple uploads

* fix some citations

* fix some citations

* fixed Dr. Lauer's info

* fixed Dr. Lauer's info

* remove format arg

* remove format arg

* shatavia's orcid

* shatavia's orcid

* added Rebecca's and Jess's orcids

* added Rebecca's and Jess's orcids

* updated DOIs

* updated DOIs

* fixed comment line

* fixed comment line

* added Entrez Edirect URL

* added Entrez Edirect URL

* more Entrez citation with help from CoPilot

* more Entrez citation with help from CoPilot

* Andrew's orcid

* Andrew's orcid

* misc

* misc

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* updated revisions from coauthors

* updated revisions from coauthors

* entered Taylor's revisions

* entered Taylor's revisions

* move Katie to acknowledgements due to her request

* move Katie to acknowledgements due to her request

* update genome list; stable efetching (#49)

* Add genomes (#45)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* Esearch input (#47)

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* make symlink to avoid naming mistakes

* check whether taxonkit is loaded

* use efetch -input

* fix tr bug

* Esearch input flag (#48)

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* make symlink to avoid naming mistakes

* check whether taxonkit is loaded

* use efetch -input

* fix tr bug

* get latest edirect

* update installation instructions

* update installation instructions: fix PATH

* bring in other tests

* update installation method for search with unit-testing

* update installation method for search with kraken2

* debug the ls statement

* debug the ls statement

* debug the ls statement

* debug building taxonomy

* exclusive unit testing for taxonomy for right now

* install taxonkit

* changes from cdc clearance process

* changes from cdc clearance process

* disable buggy docx creation

* disable buggy docx creation

* fix blast+ formatting typo

* fix blast+ formatting typo

* Change to MIT license

* Update README.md: remove CC license sticker

* update entrez ref

* update entrez ref

* MRA

* MRA

* MRA

* MRA

* misc

* misc

* 500 words or less

* 500 words or less

* nix example

* nix example

* abstract

* abstract

* abbreviate genera

* abbreviate genera

* another paper revision

* another paper revision

* added asm pandoc template

* added asm pandoc template

* provenance

* Leptospira interrogans => CP020414

* some progress

* downloadKalamari.sh: nuccleotideAcc bug fixed

* v5.7.2

* another round of provenance

* cleared out the unknowns list

* fixed chromosomes with sources

* chromosomes

* try to run CI

* fix wildcard

* better named sources for each assembly

* polish this directory

* assembly-complete.gz

* taylor's corrected orcid

* revert back to pdf of joss paper instead of MRA

* merge

---------

Co-authored-by: Scott Nguyen <[email protected]>
Co-authored-by: Scott Nguyen <[email protected]>
Co-authored-by: Curtis Kapsak <[email protected]>