[MRG] add signature cat and signature split commands to combine/split signature files #1044

Merged Jun 27, 2020 (18 commits)

@ctb ctb commented Jun 24, 2020

This PR adds a cat and a split command to the sig submodule, to concatenate and split collections of signatures. See this discussion for the original motivation.

sourmash sig cat

% sourmash sig cat podar-ref/*.fa.sig -o out

== This is sourmash version 3.3.2.dev9+g462bc387. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded 192 signatures total.
saved 192 signatures
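
The effect of `sig cat` can be pictured with a short standalone sketch. It assumes each signature file is a JSON list of signature records; the function name `cat_signature_files` and the record fields are made up for illustration, and this is not the code added in this PR.

```python
import json
import os
import tempfile


def cat_signature_files(input_paths, output_path):
    """Concatenate JSON lists of signature records into one file.

    Hypothetical helper: assumes every input file holds a JSON list.
    Returns the number of records written.
    """
    combined = []
    for path in input_paths:
        with open(path) as fp:
            combined.extend(json.load(fp))

    with open(output_path, "w") as fp:
        json.dump(combined, fp)
    return len(combined)


if __name__ == "__main__":
    # Build two tiny stand-in signature files and combine them.
    workdir = tempfile.mkdtemp()
    a = os.path.join(workdir, "a.sig")
    b = os.path.join(workdir, "b.sig")
    out = os.path.join(workdir, "out.sig")
    with open(a, "w") as fp:
        json.dump([{"name": "genome-a", "ksize": 21}], fp)
    with open(b, "w") as fp:
        json.dump([{"name": "genome-b", "ksize": 31}], fp)
    print(f"saved {cat_signature_files([a, b], out)} signatures")
```

The records are kept in input order and are not deduplicated in this sketch; whether the real command does either is not stated in this thread.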

sourmash sig split

% sourmash sig split --outdir /tmp/foo podar-ref/1.fa.sig

== This is sourmash version 3.3.2.dev9+g462bc387. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

Creating --outdir /tmp/foo
writing sig to /tmp/foo/c55d50b7.k=21.scaled=1000.DNA.dup=0.1.fa.sig
writing sig to /tmp/foo/c11126d0.k=31.scaled=1000.DNA.dup=0.1.fa.sig
writing sig to /tmp/foo/e2d11fb6.k=51.scaled=1000.DNA.dup=0.1.fa.sig
loaded and split 3 signatures total.
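
The output names in the log above appear to follow the pattern `<md5 prefix>.k=<ksize>.scaled=<scaled>.<molecule>.dup=<n>.<source basename>.sig`. The sketch below rebuilds that pattern as inferred from the log alone. Reading `dup` as a counter that is bumped when a name is already taken is an assumption, and `split_output_name` is a hypothetical helper, not a sourmash function.

```python
def split_output_name(md5_prefix, ksize, scaled, moltype, basename, taken):
    """Build an output filename in the style shown in the log above.

    `taken` is a set of names already used; the dup counter is increased
    until the candidate name is unused (assumed meaning of `dup`).
    """
    dup = 0
    while True:
        name = (
            f"{md5_prefix}.k={ksize}.scaled={scaled}."
            f"{moltype}.dup={dup}.{basename}.sig"
        )
        if name not in taken:
            taken.add(name)
            return name
        dup += 1


if __name__ == "__main__":
    used = set()
    print(split_output_name("c55d50b7", 21, 1000, "DNA", "1.fa", used))
    # Asking again with identical parameters yields a dup=1 name.
    print(split_output_name("c55d50b7", 21, 1000, "DNA", "1.fa", used))
```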

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

codecov bot commented Jun 26, 2020

Codecov Report

Merging #1044 into master will increase coverage by 0.45%.
The diff coverage is 95.45%.

@@            Coverage Diff             @@
##           master    #1044      +/-   ##
==========================================
+ Coverage   92.09%   92.55%   +0.45%     
==========================================
  Files          72       74       +2     
  Lines        5454     5585     +131     
==========================================
+ Hits         5023     5169     +146     
+ Misses        431      416      -15     
Impacted Files                        Coverage Δ
sourmash/sig/__main__.py              92.94% <90.62%> (-0.43%) ⬇️
sourmash/cli/sig/__init__.py          100.00% <100.00%> (ø)
sourmash/cli/sig/cat.py               100.00% <100.00%> (ø)
sourmash/cli/sig/split.py             100.00% <100.00%> (ø)
sourmash/cli/signature/__init__.py    100.00% <100.00%> (ø)
sourmash/sourmash_args.py             95.81% <100.00%> (+0.81%) ⬆️
sourmash/signature.py                 92.12% <0.00%> (+1.38%) ⬆️
sourmash/nodegraph.py                 93.75% <0.00%> (+16.07%) ⬆️
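
The headline percentages follow from the hits and lines counts in the coverage diff above. A quick check of that arithmetic, using only the numbers reported there (the function name is illustrative):

```python
def coverage_percent(hits, lines):
    """Return line coverage as a percentage of lines hit."""
    return 100.0 * hits / lines


before = coverage_percent(5023, 5454)  # master
after = coverage_percent(5169, 5585)   # with #1044 merged

print(f"before: {before:.4f}%")
print(f"after:  {after:.4f}%")
print(f"change: {after - before:+.4f} points")
```

This gives roughly 92.10% and 92.55%, a change of about +0.45 points, which matches the report up to how the last digit is rounded.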

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6a15878...893c0c8.

@ctb ctb changed the title from "add signature cat command to combine multiple signature files" to "add signature cat and signature split commands to combine/split signature files" Jun 26, 2020

ctb commented Jun 26, 2020

I think this is ready for initial review @bluegenes @taylorreiter @luizirber -- the code is a bit ugly, but the functionality is really nice and already useful!

@ctb ctb changed the title from "add signature cat and signature split commands to combine/split signature files" to "[MRG] add signature cat and signature split commands to combine/split signature files" Jun 27, 2020

bluegenes commented Jun 27, 2020

Thinking about use cases for sig cat - most of the time, I'd like to use this when I already have some sigs for a file, and want to add a new ksize, scaled value, or alphabet. Will it be possible to pipe signatures from compute to sig cat? Or, better, could we add a compute --append or compute --add that adds (cats) signatures into an existing signature file?

ctb commented Jun 27, 2020

piping is currently in a separate issue, #1049. that's not to say we couldn't address it here but it's a chunk of extra work that I might not get to soon :)

note that compute --append wouldn't be more efficient than

sourmash compute ... -o new_sigs
sourmash sig cat old_sigs new_sigs -o combined_sigs

although note that the implementation supports specifying the output file as one of the input files --

# overwrite old_sigs with concatenated signatures w00t w00t
sourmash sig cat old_sigs new_sigs -o old_sigs

which I could add as a test...
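
Writing the output over one of the inputs is only safe if every input is fully read before the output is opened for writing. The sketch below shows one defensive way to get that property: read everything, write to a temporary file, then rename it over the destination. This write-then-rename technique is a substitution for illustration, not a description of what the PR does; `cat_in_place` is a hypothetical name and the JSON-list file layout is an assumption.

```python
import json
import os
import tempfile


def cat_in_place(input_paths, output_path):
    """Combine JSON lists; output_path may also be one of the inputs."""
    # Read every input completely before touching the output.
    combined = []
    for path in input_paths:
        with open(path) as fp:
            combined.extend(json.load(fp))

    # Write to a temporary file in the destination directory, then rename
    # it into place so a failed write cannot truncate an input file.
    out_dir = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fp:
            json.dump(combined, fp)
        os.replace(tmp_path, output_path)
    except BaseException:
        # Clean up the temporary file if it is still around.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return len(combined)
```

Calling `cat_in_place(["old_sigs", "new_sigs"], "old_sigs")` would leave `old_sigs` holding the records from both files.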

ctb commented Jun 27, 2020

added in 084879c. so, once we get stdin working, you could do

sourmash compute ... -o - | sourmash sig cat old.sigs - -o old.sigs

which would meet your needs, I think?
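
Stdin support is tracked separately in #1049 and was not yet working at this point in the thread. As a purely hypothetical sketch of the convention being discussed, where `-` stands for standard input, a loader could look like this. `load_records` is an invented name and the JSON-list layout is an assumption.

```python
import io
import json
import sys


def load_records(path, stdin=None):
    """Load a JSON list from a file, or from stdin when path is '-'.

    The optional `stdin` argument lets callers substitute another
    text stream, which is convenient for testing.
    """
    if path == "-":
        stream = sys.stdin if stdin is None else stdin
        return json.load(stream)
    with open(path) as fp:
        return json.load(fp)


if __name__ == "__main__":
    # Simulate piped input with an in-memory stream.
    fake_stdin = io.StringIO('[{"name": "new", "ksize": 51}]')
    print(load_records("-", stdin=fake_stdin))
```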

bluegenes commented

hadn't seen your issue on this! yep, that would definitely fit. Would def be nice to avoid writing an intermediate file

@bluegenes bluegenes left a comment

looks good! excited to use :)

@ctb ctb merged commit 0e2d259 into master Jun 27, 2020
@ctb ctb deleted the add_sig_cat branch June 27, 2020 22:47

ctb commented Jun 27, 2020

thanks!
