More formats #17

Merged
merged 3 commits into from
Jul 22, 2021

Conversation

lskatz
Owner

@lskatz lskatz commented Jul 22, 2021

Downloading nucleotide, protein sequences

@lskatz lskatz merged commit 3637434 into master Jul 22, 2021
@lskatz lskatz deleted the more-formats branch July 22, 2021 17:04
lskatz added a commit that referenced this pull request Nov 5, 2024
* new files for individual genes and coordinates

* m

* new flag to include optional files with --and
lskatz added a commit that referenced this pull request Nov 5, 2024
* GitHub actions (#13)

* unit-testing actions

* unit-testing actions

* unit-testing actions

* unit-testing actions

* installing edirect

* installing edirect

* installing edirect

* installing edirect

* installing edirect

* rm travis

* edirect through apt

* edirect through apt

* Add files via upload

* adding taxonomy_v3.5.1

* More formats (#17)

* new files for individual genes and coordinates

* m

* new flag to include optional files with --and

* Listeria unit testing (#18)

* Listeria unit testing draft

* m

* debug

* debug

* debug

* update kalamari script; add --and flags

* kraken1 db

* m

* m

* m

* m

* editing PATH

* editing PATH

* fixing src path

* m

* fixing installation dir

* jellyfish1

* jellyfish1

* m

* just two genomes

* tree kraken

* added threads 2

* added threads 2

* build kraken -x

* work on disk in kraken

* debug

* trying out kraken2

* m

* removed rebuild and work-on-disk

* kraken report

* kraken report

* more inspection of kraken output

* more inspection of kraken output

* done with unit testing for now

Co-authored-by: Lee Katz - Aspen <[email protected]>

* new parent id

* a get taxonomy script for a reduced set of dmp files

* reduced taxonomy

* testing v3.9.2

* added parentid to plasmids

* Updating some Yersinia taxid (#16)

* Add files via upload

* adding taxonomy_v3.5.1

* adding v3.9.3 taxonomy

* m

* adding in Scott's Yersinia genomes

* cleanup

* updated to correct src tax dir

* Update unit-testing.yml

* Create CITATION.cff (#20)

* Create CITATION.cff

* Update CITATION.cff

* Kraken1 unit test (#21)

* with fixed taxonomy, unit test kraken1

* shortened the minimizer length to 9

* kraken1 query

* m

* adding a query is $query statement

Co-authored-by: Lee Katz - Aspen <[email protected]>

* Database doc update (#22)

* with fixed taxonomy, unit test kraken1

* shortened the minimizer length to 9

* kraken1 query

* m

* adding a query is $query statement

* Update DATABASES.md

* added blast and ANI instructions

* updated docs to reflect more comprehensive DATABASES.md

* m

Co-authored-by: Lee Katz - Aspen <[email protected]>

* mash database

* Define contributions (#23)

* validate taxonomy script

* unit testing for taxonomy

* unit testing for taxonomy

* moved XXXXXX entries to a todo file

* validating names.dmp and added new entries to make taxonomy more complete

* Contributing.md doc

* link to contributing.md

* more description under contributions

Co-authored-by: Lee Katz - Aspen <[email protected]>

* mmseqs2 just for fun

* m

* Sepia

* fixed bacillus genus back to bacteria in the plasmids (#24)

Co-authored-by: Lee Katz - Aspen <[email protected]>

* Build sepia (#25)

* fixed bacillus genus back to bacteria in the plasmids

* sepia building v1

* m

* sepia documentation and reference generation script

* m

Co-authored-by: Lee Katz - Aspen <[email protected]>

* fixed a bug where the same fasta file would be downloaded twice and given the parent taxid in addition to its own

* validate a kraken database better

* MIDAS

* m

* m

* Update README.md with reqs and recs (#29)

* Update chromosomes.tsv

* using GITHUB_PATH to solve CI problems

* m

* m

* limit tests to target branches

* jellyfish now in path

* m

* remove -x statement

* allow this workflow to work on master

* trying out taxonomy validator workflow

* remove kraken1 from testing on this branch

* fix path to taxonomy

* Fix ci (#31)

* using GITHUB_PATH to solve CI problems

* m

* m

* limit tests to target branches

* jellyfish now in path

* m

* remove -x statement

* allow this workflow to work on master

* trying out taxonomy validator workflow

* remove kraken1 from testing on this branch

* fix path to taxonomy

* check file sizes after pulling down accessions

* more debugging in the ci just in case

* change cryptosporidium parent taxids to cryptosporidium the genus

* merged new kalamari download script

* upped the version

* getExactTaxonomy.pl: better error messages

* downloadKalamari.pl: add in retmax 1

* only accept one sequence per insdc accession

* script to download kalamari from source

* numcpus option added; new bash script to download and format

* bash downloadKalamari.sh

* update to ubuntu 20

* 2 cpus in test

* add spreadsheet as a strategy variable

* m

* m

* split jobs between runners

* fix math

* adding more retries

* switch to 1 cpu for testing

* bump tag to v5.3.0

* std output for downloadKalamari.sh

* removed bioperl

* bump version; add more standard conda db location

* trying to speed up downloads

* vast speed increase with batch downloads; cleaned up chromosomes.tsv

* moved version information to the script from Makefile.PL; removed --and; won't make kraken db in shell script

* m

* remove edirect setup unit test

* update unit tests

* just two chunks of tests

* batch more

* fix file sizes check

* just make the damn thing work

* bash file uses local repo files instead of curl; default buffer size 100

* More proper build (#42)

* Building taxonomy (#38)

* building taxonomy files but this script will be deprecated right away

* deprecated

* script to build taxonomy with src files

* m

* move old taxonomy to deprecated

* remove old 'versioned' files outside of git versioning

* filter taxonomy script

* complete the taxonomy

* updated scripts for compiling databases

* dev branch testing

* fix lmono test a bit

* .

* Fix the taxonomy tests (#39)

* building taxonomy files but this script will be deprecated right away

* deprecated

* script to build taxonomy with src files

* m

* move old taxonomy to deprecated

* remove old 'versioned' files outside of git versioning

* filter taxonomy script

* complete the taxonomy

* updated scripts for compiling databases

* dev branch testing

* fix lmono test a bit

* .

* fix paths

* updated PATH

* updated PATH

* troubleshooting

* fix PATH again

* fix ls path

* remove that step

* updated tests to reflect build-taxonomy (#40)

* fix path to taxonomy files

* download and build taxonomy

* merge Listeria into Yersinia matrix

* m

* updated output directory as matrix.GENUS

* kraken1 tests patches

* m

* Fixed two more tests (#41)

* update yml

* query fallback

* debugging msg

* fix path to taxonomydb

* print first two lines of fasta files

* helpful cut statement

* remove head statement in last step

* bump version

* fix a downloading bug where sed stalls

* update for compressed kalamari library and more efficient kraken builds

* update download script

* Validate taxonomy (#43)

* validateTaxonomy update for just taxdirs; add 1 for filtered taxonomy; added DEBUG option for downloadKalamari.sh

* updated unit tests

* updated unit tests

* remove taxonomy stuff from downloadKalamari.sh

* fix validateTaxonomy syscall

* check on filtered tax in unit test

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* init paper

* some revisions; taxonomy; downloading

* swap example

* references

* stole Joe's draft-pdf.yml

* update to version 4 of artifacts

* plasmids description

* ignore rendered manuscripts

* some minor fixes; author affiliations; code examples

* added Shatavia; updated example

* m

* revisions from Jess

* refs

* fix list that became italics

* updated Andrew's affiliation

* plasmid defined species

* gave a name to the JOSS rendering

* try experimental docx file creation

* try 2 with container

* correct artifact Action

* m

* upload artifact v4

* branch agnostic

* try multiple formats; multiple uploads

* fix some citations

* fixed Dr. Lauer's info

* remove format arg

* shatavia's orcid

* added Rebecca's and Jess's orcids

* updated DOIs

* fixed comment line

* added Entrez Edirect URL

* more Entrez citation with help from CoPilot

* Andrew's orcid

* misc

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* updated revisions from coauthors

* entered Taylor's revisions

* move Katie to acknowledgements due to her request

* update genome list; stable efetching (#49)

* Add genomes (#45)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* Esearch input (#47)

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* make symlink to avoid naming mistakes

* check whether taxonkit is loaded

* use efetch -input

* fix tr bug

* Esearch input flag (#48)

* Add genomes (#45) (#46)

* Corynebacterium diphtheriae

* added Bifidobacterium adolescentis

* replaced S. enterica IIIa; Added hops (Humulus lupulus)

* added a Citrobacter species

* m

* replaced repressed genome accession for B. faecium

* remove random single quotes

* bump version

* helpful log messages

* v5.6.3

* make symlink to avoid naming mistakes

* check whether taxonkit is loaded

* use efetch -input

* fix tr bug

* get latest edirect

* update installation instructions

* update installation instructions: fix PATH

* bring in other tests

* update installation method for search with unit-testing

* update installation method for search with kraken2

* debug the ls statement

* debug the ls statement

* debug the ls statement

* debug building taxonomy

* exclusive unit testing for taxonomy for right now

* install taxonkit

* changes from cdc clearance process

* disable buggy docx creation

* fix blast+ formatting typo

* Change to MIT license

* Update README.md: remove CC license sticker

* update entrez ref

* MRA

* MRA

* misc

* 500 words or less

* nix example

* abstract

* abbreviate genera

* another paper revision

* added asm pandoc template

* provenance

* Leptospira interrogans => CP020414

* some progress

* downloadKalamari.sh: nuccleotideAcc bug fixed

* v5.7.2

* another round of provenance

* cleared out the unknowns list

* fixed chromosomes with sources

* chromosomes

* try to run CI

* fix wildcard

* better named sources for each assembly

* polish this directory

* assembly-complete.gz

---------

Co-authored-by: Scott Nguyen <[email protected]>
Co-authored-by: Scott Nguyen <[email protected]>
Co-authored-by: Curtis Kapsak <[email protected]>