Add a rust-toolchain file. #15

Merged
merged 1 commit into from
Oct 19, 2022

@tfenne tfenne commented Oct 19, 2022

This wouldn't build on my machine, which was defaulting to an older Rust version, because the code relies on a feature that was only stabilized in a newer release.

@tfenne tfenne requested a review from nh13 October 19, 2022 20:45
@tfenne tfenne merged commit 09190e5 into refactor/grep-like Oct 19, 2022
nh13 added a commit that referenced this pull request Dec 14, 2022
Major refactor of the tool and code to make its command line and behavior very similar to unix grep.

1. All reader, writer, and matching threads use a rayon thread pool.  This means that `--threads` is respected.  Previously, reader and writer threads were always allocated outside the match pool, and there were specific arguments for the match pool and for compressing the output.  Output compression has been removed: plaintext FASTQ is the only output format, so pipe the output through a compressor if you need it compressed.
2. Takes in a pattern as the first positional argument, which is now a regular expression (previously a fixed string).
3. Takes in zero or more file paths after the positional argument.  Uses standard input if no files are given positionally or with `-f` below.
4. Input files are assumed to be plain uncompressed FASTQs unless the `--decompress` option is given, in which case they're assumed to be GZIP compressed.  This includes standard input.  The exceptions are files ending in `.gz`/`.bgz` and `.fastq`/`.fq`, which are always treated as GZIP compressed and plain text respectively.
5. Implement the following options from grep:

* `-c, --count`: simply return the count of matching records
* `-F, --fixed-strings`: interpret pattern as a set of fixed strings
* `-v, --invert-match`: select records that do not match any of the specified patterns
* `--color <color>`: color the output records with ANSI color codes
* `-e, --regexp <regexp>...`: specify the pattern used during the search.  Can be specified multiple times
* `-f, --file <file>`: Read one or more newline separated patterns from file.
* `-Z, --decompress`: treat all files other than `.gz`, `.bgz`, `.fastq`, and `.fq` files as GZIP compressed (by default they are treated as uncompressed)
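
The compression rules in (4) and the `-Z` option above can be sketched as a single decision function. This is a minimal illustration, not the tool's actual code: the function name `is_gzip_input` is hypothetical, `None` stands in for standard input, and the case-sensitive extension comparison is an assumption.

```rust
use std::path::Path;

/// Returns true when an input should be read as GZIP compressed.
/// `path` is `None` for standard input; `decompress` mirrors `-Z, --decompress`.
/// (Hypothetical helper; extension matching is assumed to be case-sensitive.)
fn is_gzip_input(path: Option<&Path>, decompress: bool) -> bool {
    // Take the final extension only, so "reads.fastq.gz" yields "gz".
    let ext = path.and_then(|p| p.extension()).and_then(|e| e.to_str());
    match ext {
        // Always GZIP compressed, regardless of the flag.
        Some("gz") | Some("bgz") => true,
        // Always plain text, regardless of the flag.
        Some("fastq") | Some("fq") => false,
        // Standard input and every other extension follow the flag.
        _ => decompress,
    }
}
```

Under these assumptions, `reads.fastq.gz` is read as GZIP even without `-Z`, `reads.fq` is read as plain text even with `-Z`, and standard input follows the flag.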

6. The exit code follows grep: we exit with 0 if one or more lines were selected, 1 if no lines were selected, and >1 if an error occurred.
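
As a rough sketch of the exit code convention in (6), the mapping could look like the following. The function name `exit_status` is hypothetical, and the specific value 2 for errors is an assumption borrowed from grep; the text above only says the code is greater than 1.

```rust
/// Maps a search outcome to a grep-style exit status.
/// `Ok(n)` carries the number of selected records; `Err(_)` is any failure.
/// (Hypothetical helper; the error value 2 is assumed, following grep.)
fn exit_status<E>(outcome: Result<usize, E>) -> i32 {
    match outcome {
        Ok(0) => 1,  // nothing was selected
        Ok(_) => 0,  // one or more records were selected
        Err(_) => 2, // an error occurred (any value > 1 would satisfy the rule)
    }
}
```

A caller would typically pass the result to `std::process::exit` once all output has been flushed.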

7. Add non-grep options:

* `--paired`: if one file or standard input is used, treat the input as an _interleaved_ paired end FASTQ.  If more than one file is given, ensure that the number of files is a multiple of two, and treat each consecutive pair of files as R1 and R2 respectively.  If the pattern matches either R1 _or_ R2, output both (interleaved FASTQ).
* `--reverse-complement`: also searches the reverse complement of the read sequences
* `--progress`: write progress (counts of records searched)
* `-t, --threads <threads>`: see (1) above
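
The matching semantics of `--paired` and `--reverse-complement` can be illustrated with a small sketch. To stay dependency-free it uses a fixed-string search (as with `-F`) rather than a regular expression, and all function names are hypothetical. Mapping bases other than A, C, G, and T to `N` is an assumption, not documented behavior of the tool.

```rust
/// Reverse complement of a DNA sequence.
/// Bases other than A/C/G/T (either case) become `N` (an assumption).
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            b'a' => b't',
            b'c' => b'g',
            b'g' => b'c',
            b't' => b'a',
            _ => b'N',
        })
        .collect()
}

/// Fixed-string containment; an empty pattern matches everything.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    needle.is_empty() || haystack.windows(needle.len()).any(|w| w == needle)
}

/// A read matches if the pattern occurs in the sequence or, when
/// `search_revcomp` is set, in its reverse complement.
fn read_matches(seq: &[u8], pattern: &[u8], search_revcomp: bool) -> bool {
    contains(seq, pattern)
        || (search_revcomp && contains(&reverse_complement(seq), pattern))
}

/// With `--paired`, a pair is selected when either R1 or R2 matches,
/// and both reads would then be written out.
fn pair_matches(r1: &[u8], r2: &[u8], pattern: &[u8], search_revcomp: bool) -> bool {
    read_matches(r1, pattern, search_revcomp) || read_matches(r2, pattern, search_revcomp)
}
```

For example, the read `AACG` does not contain `CGT`, but its reverse complement `CGTT` does, so it is selected only when the reverse complement is searched too.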

8. Miscellaneous changes:

* Add a rust-toolchain file in #15
* Unit tests added by @samfulcrum in #16 and #18.