Inputs can also be files of files, not just folders
This way we follow the example of other bioinfo tools, which
allow users to spread their input files across many directories;
keeping thousands of inputs in a single directory is a performance
issue. It also gives users more flexibility.

Also added a specific unit test
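
As an illustration of the behaviour described in this commit message, the following is a minimal sketch of how an argument that may be either a directory or a file-of-files could be resolved. The helper name `resolve_inputs` and its `extensions` parameter are hypothetical and are not taken from panfeed's code; the sketch also assumes that the paths listed in the file are used exactly as written, i.e. relative to the current working directory.

```python
from pathlib import Path


def resolve_inputs(path, extensions):
    """Return input files from a directory or a file-of-files.

    Hypothetical helper for illustration only, not panfeed's implementation.
    """
    path = Path(path)
    if path.is_dir():
        # Directory mode: keep every regular file with a matching extension
        return sorted(p for p in path.iterdir()
                      if p.is_file() and p.suffix in extensions)
    # File-of-files mode: one path per line, blank lines are skipped.
    # Assumption: paths are taken as written; extensions are not checked here.
    with open(path) as handle:
        return [Path(line.strip()) for line in handle if line.strip()]
```

With a helper along these lines, `resolve_inputs("gffs/", {".gff"})` and `resolve_inputs("gffs.txt", {".gff"})` would both yield a list of paths, so the code downstream would not need to know which form the user supplied.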
mgalardini committed May 21, 2024
1 parent c37ad01 commit e504cee
Showing 6 changed files with 144 additions and 62 deletions.
10 changes: 10 additions & 0 deletions README.md
@@ -115,6 +115,16 @@ among them:
* `--start -50 --stop 100 --sample 0.1`, will restrict the plot to 10% of samples and to the -50 to +100 region relative to the start codon
* adding `--nucleotides` to the above command will add the nucleotide letters to each plot

+# Working with a very large dataset
+
+**Note:** this is a new functionality introduced in v1.6.0
+
+If you are working with more than a few thousand input files, it is poor practice to have
+all the inputs in a single directory (e.g. for performance reasons). Following what
+other bioinformatic tools do to solve this issue, the `--gff` and `--fasta` arguments
+can also be provided as "files-of-files", where the path to each input file is written
+in each line.
+
# Prerequisites:

The following packages and version have been used to develop and test `panfeed`
2 changes: 1 addition & 1 deletion panfeed/__init__.py
@@ -1 +1 @@
-__version__ = '1.5.2-dev'
+__version__ = '1.6.0'
10 changes: 7 additions & 3 deletions panfeed/__main__.py
@@ -87,7 +87,9 @@ def get_options():
parser.add_argument("-g", "--gff",
required=True,
help = "Directory containing all samples' GFF "
-"files (must contain nucleotide sequence as "
+"files, or a file listing the relative path "
+"to each GFF file, one per line "
+"(must contain nucleotide sequence as "
"well unless -f is used, "
"and samples should be named in the "
"same way as in the panaroo header)")
@@ -118,10 +120,12 @@ def get_options():

parser.add_argument("-f", "--fasta",
help = "Directory containing all samples' nucleotide "
-"fasta files (extension either .fasta "
+"fasta files, or a file listing the relative "
+"path to each fasta file, one per line "
+"(extension either .fasta "
"or .fna, "
"samples should be named in the "
-"same way as in the panaroo header")
+"same way as in the panaroo header)")

parser.add_argument("-k", "--kmer-length", type = int,
default = 31,