Release 0.6 #112

Merged (33 commits), Sep 13, 2024. Changes shown from 29 commits.

Commits
- 40538b3 Merge remote-tracking branch 'origin/main' into dev (muffato, Aug 22, 2024)
- 12a7557 Version bump (muffato, May 13, 2024)
- bd2334e Directly query the NCBI API (muffato, May 13, 2024)
- da7faf9 bugfix: Need all lineage names, not just eukaryotes (muffato, May 28, 2024)
- 9c5b412 Adding a parameter to choose the busco lineages (muffato, May 13, 2024)
- cb89c3b Removed the legacy yaml configuration file (muffato, May 18, 2024)
- d08ba18 Generate a yaml file internally (muffato, May 18, 2024)
- 4e79fb0 Hide the CSV from the rest of the pipeline (muffato, May 18, 2024)
- 4c7cef6 Generate a more complete yaml to match the one we get from blobtoolkit (muffato, May 20, 2024)
- eb87193 Update the database paths in the final meta.json (muffato, May 20, 2024)
- f4c82fb Fill in the reads too (muffato, May 20, 2024)
- a28690f Fill in the assembly information too (muffato, May 20, 2024)
- a514f2f No need to generate the reference initial yaml file (muffato, May 20, 2024)
- df6d7d8 Switched to the newer endpoint (muffato, May 21, 2024)
- e190bdd Introduced --parameters to have flexibility regarding their order (muffato, May 24, 2024)
- c37044c Adjust the taxon_id to make sure it exists in the NT database (muffato, May 24, 2024)
- 9963aa5 All these parameters are mandatory (muffato, May 28, 2024)
- e959c04 bugfix: --busco is optional (muffato, May 31, 2024)
- ed94111 bugfix: this should be a "path" so that the file is made available to… (muffato, May 31, 2024)
- 5962034 bugfix: accept older assembly versions (muffato, Jun 4, 2024)
- 121e372 These fields can be missing (muffato, Jun 4, 2024)
- 8085a74 Some genomes don't have organelles (muffato, Jun 4, 2024)
- 30af129 Easier to read (muffato, Jul 10, 2024)
- 9dcd338 Release name (muffato, Jul 10, 2024)
- 8886dba https://ncbiinsights.ncbi.nlm.nih.gov/2024/06/04/changes-ncbi-taxonom… (muffato, Jun 21, 2024)
- 70f961c Use GoaT in addition to the NCBI because GoaT also has the freshest E… (muffato, Jul 2, 2024)
- e9d3a64 Added an option to skip filtering hits from the same species (muffato, Aug 22, 2024)
- 8c70c77 Corrected the version as I want to be sure this is really ready for t… (muffato, Aug 23, 2024)
- 18d2daf Merge pull request #97 from sanger-tol/draft_assemblies (muffato, Aug 24, 2024)
- 152b773 Introduced the use_work_dir_as_temp parameter like in the other pipel… (muffato, Sep 11, 2024)
- 3ddee53 For large genomes, don't run Busco off /tmp (muffato, Sep 11, 2024)
- 5a581b1 Getting ready for the release (muffato, Sep 11, 2024)
- ca10b9b Merge pull request #111 from sanger-tol/busco_tmp (muffato, Sep 13, 2024)
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,39 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## – Bellsprout – []

The pipeline has now been validated for draft (unpublished) assemblies.

- The pipeline now queries the NCBI API directly instead of GoaT to establish the
taxonomic classification of the species and the relevant Busco lineages.
If the `taxon_id` is not found, the pipeline falls back to GoaT, which
is aware of upcoming `taxon_id`s in ENA.
- New `--busco_lineages` parameter to choose specific Busco lineages instead of
selecting them automatically from the taxonomy.
- All parameters are now passed as regular Nextflow parameters. The original
YAML configuration files of the Snakemake version are no longer supported.
- New `--skip_taxon_filtering` option to skip the taxon filtering in BLAST searches,
i.e. to keep hits from the same species. This is mostly relevant for draft assemblies.
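The NCBI-first lookup with a GoaT fallback described above can be sketched as follows. This is a minimal Python illustration, not the pipeline's code: `query_ncbi` and `query_goat` are hypothetical stand-ins for the real API clients, assumed to return a lineage as a list of `(taxid, name)` pairs, or `None` when the `taxon_id` is unknown.

```python
from typing import Callable, List, Optional, Tuple

# Assumed representation: a lineage as (taxid, name) pairs, root first.
Lineage = List[Tuple[int, str]]
# Hypothetical client signature: returns None if the taxon_id is unknown.
Lookup = Callable[[int], Optional[Lineage]]


def resolve_lineage(taxon_id: int, query_ncbi: Lookup, query_goat: Lookup) -> Lineage:
    """Return the lineage of taxon_id, trying NCBI first and GoaT second."""
    lineage = query_ncbi(taxon_id)
    if lineage is None:
        # GoaT is aware of upcoming ENA taxon_ids that NCBI may not list yet.
        lineage = query_goat(taxon_id)
    if lineage is None:
        raise LookupError(f"taxon_id {taxon_id} was found in neither NCBI nor GoaT")
    return lineage
```

Passing the clients in as callables keeps the fallback logic testable without network access: plain dictionaries can serve as stubs, e.g. `resolve_lineage(7227, ncbi_stub.get, goat_stub.get)`.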

### Parameters

| Old parameter | New parameter |
| ------------- | ---------------------- |
| --yaml | |
| | --busco_lineages |
| | --skip_taxon_filtering |

> **NB:** Parameter has been **updated** if both old and new parameter information is present. <br> **NB:** Parameter has been **added** if just the new parameter information is present. <br> **NB:** Parameter has been **removed** if new parameter information isn't present.
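The effect of `--skip_taxon_filtering` can be pictured as a switch on a filter that discards hits to the assembly's own species (the commit that introduced it describes skipping "filtering hits from the same species"). The Python sketch below applies such a filter to already-parsed hits; the `Hit` record and `filter_hits` function are hypothetical, and the pipeline may instead exclude the taxon inside the BLAST/Diamond search itself.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    """Hypothetical minimal record for one search hit."""
    query: str          # sequence ID from the assembly
    subject: str        # database sequence ID
    subject_taxid: int  # NCBI taxid of the database sequence


def filter_hits(hits: Iterable[Hit], species_taxid: int,
                skip_taxon_filtering: bool = False) -> List[Hit]:
    """Drop hits to the assembly's own species unless filtering is skipped."""
    if skip_taxon_filtering:
        return list(hits)
    # Exact taxid comparison only; descendants (e.g. subspecies) are not handled.
    return [h for h in hits if h.subject_taxid != species_taxid]
```

A fuller version would also exclude descendants of the species taxid, which requires walking the taxonomy rather than comparing single integers.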

### Software dependencies

Since the pipeline uses Nextflow DSL2, each process runs in its own [Biocontainer](https://biocontainers.pro/#/registry), so different processes may occasionally use different versions of the same tool. The overall changes in software dependencies since the last release are listed below for reference. Only `Docker` and `Singularity` containers are supported; `conda` is not.

| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| goat | 0.2.5 | |

## [[0.5.1](https://github.com/sanger-tol/blobtoolkit/releases/tag/0.5.1)] – Snorlax (patch 1) – [2024-08-22]

### Enhancements & fixes
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ It takes a samplesheet of BAM/CRAM/FASTQ/FASTA files as input, calculates genome

1. Calculate genome statistics in windows ([`fastawindows`](https://github.com/tolkit/fasta_windows))
2. Calculate Coverage ([`blobtk/depth`](https://github.com/blobtoolkit/blobtk))
-3. Fetch associated BUSCO lineages ([`goat/taxonsearch`](https://github.com/genomehubs/goat-cli))
+3. Determine the appropriate BUSCO lineages from the taxonomy.
4. Run BUSCO ([`busco`](https://busco.ezlab.org/))
5. Extract BUSCO genes ([`blobtoolkit/extractbuscos`](https://github.com/blobtoolkit/blobtoolkit))
6. Run Diamond BLASTp against extracted BUSCO genes ([`diamond/blastp`](https://github.com/bbuchfink/diamond))
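Step 3 can be sketched as a lookup of each taxid in the species' lineage against the taxid-to-dataset table added in `assets/mapping_taxids-busco_dataset_name.2019-12-16.txt`, with `--busco_lineages` taking precedence when given. This is a minimal Python illustration under stated assumptions: `select_busco_datasets` is a hypothetical helper, the table is reduced to a few of its rows, and the most-specific-first ordering is a choice made here rather than something the README specifies.

```python
from typing import Dict, List, Optional, Sequence

# A few rows copied from the mapping file (taxid -> BUSCO dataset name).
TAXID_TO_DATASET: Dict[int, str] = {
    2759: "eukaryota",
    33208: "metazoa",
    6656: "arthropoda",
    50557: "insecta",
    33392: "endopterygota",
    7088: "lepidoptera",
    2: "bacteria",
    2157: "archaea",
}


def select_busco_datasets(lineage_taxids: Sequence[int],
                          mapping: Dict[int, str],
                          busco_lineages: Optional[Sequence[str]] = None) -> List[str]:
    """Pick BUSCO datasets for a lineage given as taxids, root first."""
    if busco_lineages:
        # An explicit --busco_lineages value overrides the automatic choice.
        return list(busco_lineages)
    # Walk from the species up to the root so the most specific dataset comes first.
    return [mapping[t] for t in reversed(lineage_taxids) if t in mapping]
```

Taxids without a BUSCO dataset are simply skipped. Whether the pipeline then caps the number of datasets, or adds prokaryote datasets, is not covered by this sketch.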
166 changes: 166 additions & 0 deletions assets/mapping_taxids-busco_dataset_name.2019-12-16.txt
@@ -0,0 +1,166 @@
422676 aconoidasida
7898 actinopterygii
5338 agaricales
155619 agaricomycetes
33630 alveolata
5794 apicomplexa
6854 arachnida
6656 arthropoda
4890 ascomycota
8782 aves
5204 basidiomycota
68889 boletales
3699 brassicales
134362 capnodiales
33554 carnivora
91561 cetartiodactyla
34395 chaetothyriales
3041 chlorophyta
5796 coccidia
28738 cyprinodontiformes
7147 diptera
147541 dothideomycetes
3193 embryophyta
33392 endopterygota
314146 euarchontoglires
33682 euglenozoa
2759 eukaryota
5042 eurotiales
147545 eurotiomycetes
9347 eutheria
72025 fabales
4751 fungi
314147 glires
1028384 glomerellales
5178 helotiales
7524 hemiptera
7399 hymenoptera
5125 hypocreales
50557 insecta
314145 laurasiatheria
147548 leotiomycetes
7088 lepidoptera
4447 liliopsida
40674 mammalia
33208 metazoa
6029 microsporidia
6447 mollusca
4827 mucorales
1913637 mucoromycota
6231 nematoda
33183 onygenales
9126 passeriformes
5820 plasmodium
92860 pleosporales
38820 poales
5303 polyporales
9443 primates
4891 saccharomycetes
8457 sauropsida
4069 solanales
147550 sordariomycetes
33634 stramenopiles
32523 tetrapoda
155616 tremellomycetes
7742 vertebrata
33090 viridiplantae
71240 eudicots
57723 acidobacteria
201174 actinobacteria_phylum
1760 actinobacteria_class
28211 alphaproteobacteria
135622 alteromonadales
200783 aquificae
1385 bacillales
91061 bacilli
2 bacteria
171549 bacteroidales
976 bacteroidetes
68336 bacteroidetes-chlorobi_group
200643 bacteroidia
28216 betaproteobacteria
80840 burkholderiales
213849 campylobacterales
1706369 cellvibrionales
204428 chlamydiae
1090 chlorobi
200795 chloroflexi
135613 chromatiales
1118 chroococcales
186801 clostridia
186802 clostridiales
84999 coriobacteriales
84998 coriobacteriia
85007 corynebacteriales
1117 cyanobacteria
768507 cytophagales
768503 cytophagia
68525 delta-epsilon-subdivisions
28221 deltaproteobacteria
213118 desulfobacterales
213115 desulfovibrionales
69541 desulfuromonadales
91347 enterobacterales
186328 entomoplasmatales
29547 epsilonproteobacteria
1239 firmicutes
200644 flavobacteriales
117743 flavobacteriia
32066 fusobacteria
203491 fusobacteriales
1236 gammaproteobacteria
186826 lactobacillales
118969 legionellales
85006 micrococcales
31969 mollicutes
2085 mycoplasmatales
206351 neisseriales
32003 nitrosomonadales
1161 nostocales
135619 oceanospirillales
1150 oscillatoriales
135625 pasteurellales
203682 planctomycetes
85009 propionibacteriales
1224 proteobacteria
72274 pseudomonadales
356 rhizobiales
227290 rhizobium-agrobacterium_group
204455 rhodobacterales
204441 rhodospirillales
766 rickettsiales
909929 selenomonadales
117747 sphingobacteriia
204457 sphingomonadales
136 spirochaetales
203691 spirochaetes
203692 spirochaetia
85011 streptomycetales
85012 streptosporangiales
1890424 synechococcales
508458 synergistetes
544448 tenericutes
68295 thermoanaerobacterales
200918 thermotogae
72273 thiotrichales
1737405 tissierellales
1737404 tissierellia
74201 verrucomicrobia
135623 vibrionales
135614 xanthomonadales
2157 archaea
2266 thermoproteales
2281 sulfolobales
114380 desulfurococcales
183967 thermoplasmata
651137 thaumarchaeota
2182 methanococcales
2191 methanomicrobiales
183925 methanobacteria
183924 thermoprotei
2235 halobacteriales
1644060 natrialbales
224756 methanomicrobia
1644055 haloferacales
183963 halobacteria
28890 euryarchaeota
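The mapping file above has one `taxid dataset_name` pair per line. A minimal Python parser for that layout is sketched below; `parse_taxid_mapping` is a hypothetical name, and splitting on any whitespace is an assumption made so that tab- and space-separated copies of the file parse the same way. It also rejects duplicate taxids, which would otherwise silently overwrite one another.

```python
from typing import Dict, Iterable


def parse_taxid_mapping(lines: Iterable[str]) -> Dict[int, str]:
    """Parse 'taxid dataset_name' lines into a {taxid: dataset_name} dict."""
    mapping: Dict[int, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.split()  # any whitespace: tab or space
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        taxid_str, name = fields
        taxid = int(taxid_str)
        if taxid in mapping:
            raise ValueError(f"line {lineno}: duplicate taxid {taxid}")
        mapping[taxid] = name
    return mapping
```

Because it accepts any iterable of lines, an open file handle can be passed directly: `with open(path) as fh: mapping = parse_taxid_mapping(fh)`.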
Binary file not shown.
Binary file not shown.