Added a check for input assembly size
GallVp committed Sep 23, 2024
1 parent 5f16a82 commit 75b66fd
Showing 5 changed files with 19 additions and 3 deletions.
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -26,7 +26,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 17. `eggnogmapper_tax_scope` is now set to 1 (root div) by default
 18. Added a `test` profile based on public data
 19. Added parameter `add_attrs_to_proteins_fasta` to enable/disable addition of decoded gff attributes to proteins fasta [#58](https://github.com/plant-food-research-open/genepal/issues/58)
-20. Updated modules and sub-workflows
+20. Added a check for input assemblies. If an assembly is smaller than 1 MB (or 300KB in zipped format), the pipeline errors out before starting the downstream processes [#47](https://github.com/plant-food-research-open/genepal/issues/47)
+21. Updated modules and sub-workflows

 ### `Fixed`

1 change: 0 additions & 1 deletion docs/output.md
@@ -45,7 +45,6 @@ A repeat library is created with either [REPEATMODELER](https://github.com/Dfam-

 - `repeatmasker/`
   - `*.masked`: Masked assembly
-  - `*.gff`: Repeatmasker output in GFF3 format

 </details>

11 changes: 10 additions & 1 deletion subworkflows/local/utils_nfcore_genepal_pipeline/main.nf
@@ -85,7 +85,16 @@ workflow PIPELINE_INITIALISATION {
 def tag = it[0]
 def fasta = it[1]

-[ [ id: tag ], file(fasta, checkIfExists: true) ]
+def fasta_file = file(fasta, checkIfExists: true)
+def is_zipped = fasta.endsWith('.gz')
+def sz_thresh = is_zipped ? 300_000 : 1_000_000
+def fasta_size = fasta_file.size()
+
+if ( fasta_size < sz_thresh ) { // < 1 MB
+    error "The assembly represented by tag '$tag' is only $fasta_size bytes. The minimum allowed size is 1 MB!"
+}
+
+[ [ id: tag ], fasta_file ]
 }

ch_tar_assm_str = ch_input
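
Note on the size check added in `subworkflows/local/utils_nfcore_genepal_pipeline/main.nf`: the threshold depends on whether the path ends in `.gz` (300,000 bytes) or not (1,000,000 bytes), but the error message and the inline comment mention 1 MB in both cases. The following is a minimal sketch, in plain Groovy outside Nextflow, of one way the message could follow the threshold that was actually applied. The function name `assemblySizeError`, the variable `threshStr`, and the example tag and sizes are hypothetical and not part of the commit; only the two byte thresholds and the `.gz` test are taken from the diff.

```groovy
// Hypothetical helper (not in the commit): returns null when the file is large
// enough, otherwise an error message that names the threshold actually applied.
String assemblySizeError(String tag, String fastaPath, long fastaSize) {
    def isZipped  = fastaPath.endsWith('.gz')
    def szThresh  = isZipped ? 300_000 : 1_000_000          // bytes, as in the commit
    def threshStr = isZipped ? '300 KB (gzipped)' : '1 MB'  // assumed wording

    if (fastaSize >= szThresh) {
        return null
    }
    return "The assembly represented by tag '${tag}' is only ${fastaSize} bytes. " +
           "The minimum allowed size is ${threshStr}!"
}

// Made-up tag and size, for illustration only.
println assemblySizeError('demo', 'demo.fasta.gz', 250_000)
```

Inside the pipeline, the returned string would presumably be passed to Nextflow's `error` when it is non-null, which keeps the threshold logic testable without staging a file.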
2 changes: 2 additions & 0 deletions tests/short/assemblysheet.csv
@@ -0,0 +1,2 @@
+tag,fasta,is_masked
+sarscov2,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/genome.fasta.gz,no
5 changes: 5 additions & 0 deletions tests/short/params.json
@@ -0,0 +1,5 @@
+{
+    "input": "tests/short/assemblysheet.csv",
+    "protein_evidence": "https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/sarscov2/genome/proteome.fasta.gz",
+    "busco_skip": true
+}
