
Draft release notes for v4.2.0 #1604

Closed
ctb opened this issue Jun 18, 2021 · 5 comments

@ctb
Contributor

ctb commented Jun 18, 2021

git log --oneline v4.1.2..latest

4.2.0 release notes

This release adds several significant features. First, we've added a set of taxonomy command-line tools for combining sourmash gather output with taxonomy databases. Second, we've added a new "picklist" feature that enables flexible selection of subsets of databases. Finally, we've added manifests to databases, which support picklists as well as faster database loading and signature selection.
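To illustrate why manifests help picklists, here is a minimal Python sketch. It is not sourmash's implementation or API. It assumes a manifest is a list of per-signature rows (with hypothetical `md5`, `name`, and `location` fields) that can be read without loading any signatures, and that a picklist is a set of values for one manifest column. Selection then only touches the manifest, and the loader runs just for the picked rows.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Set


@dataclass(frozen=True)
class ManifestRow:
    # Hypothetical manifest columns, for illustration only.
    md5: str
    name: str
    location: str  # where the signature is stored inside the collection


def select_rows(manifest: Iterable[ManifestRow], picklist: Set[str],
                column: str = "md5") -> Iterator[ManifestRow]:
    """Yield the manifest rows whose `column` value appears in the picklist."""
    for row in manifest:
        if getattr(row, column) in picklist:
            yield row


def load_selected(manifest: Iterable[ManifestRow], picklist: Set[str],
                  load_fn: Callable[[str], str], column: str = "md5") -> List[str]:
    """Call the (hypothetical) loader only for rows that survive the picklist."""
    return [load_fn(row.location) for row in select_rows(manifest, picklist, column)]


manifest = [
    ManifestRow("aaa", "genome A", "signatures/a.sig"),
    ManifestRow("bbb", "genome B", "signatures/b.sig"),
    ManifestRow("ccc", "genome C", "signatures/c.sig"),
]
loaded_locations: List[str] = []


def fake_load(location: str) -> str:
    # Stand-in for reading and parsing a signature; records what was touched.
    loaded_locations.append(location)
    return f"signature from {location}"


sigs = load_selected(manifest, {"bbb"}, fake_load)
print(sigs)  # ['signature from signatures/b.sig']
```

Only one of the three signatures is ever loaded, which is the kind of saving a manifest-aware collection can get from a small picklist.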

As of this release, we've also formally moved development over to the sourmash-bio organization on GitHub, and we've created a new Gitter support channel, sourmash-bio/community. Please join us there if you have any questions, comments, or feature requests!

Major new features:

Documentation updates:

Minor new features:

Bug fixes and performance improvements:

Refactoring and cleanup:

@ctb
Contributor Author

ctb commented Jun 24, 2021

These results aren't really v4.2-specific, but I wanted to put them somewhere:

| database | gather bench time (m:ss) |
|---|---|
| gtdb-rs202.genomic-reps.k31.sbt.zip | 0:18.49 |
| gtdb-rs202.genomic-reps.k31.lca.json.gz | 0:47.55 |
| gtdb-rs202.genomic.k31.lca.json.gz | 1:58.08 |
| gtdb-rs202.genomic-reps.k31.zip | 3:20.74 |
| gtdb-rs202.genomic.k31.sbt.zip | 6:21.36 |
| gtdb-rs202.genomic.k31.zip | 19:51.58 |

@ctb
Contributor Author

ctb commented Jun 24, 2021

Searching against the database, but with a picklist containing 4 identifiers:

| database | search time with 4-item picklist (m:ss) |
|---|---|
| gtdb-rs202.genomic-reps.k31.zip^1^ | 0:01.56 |
| gtdb-rs202.genomic.k31.zip^1^ | 0:07.51 |
| gtdb-rs202.genomic-reps.k31.sbt.zip^2^ | 0:09.88 |
| gtdb-rs202.genomic.k31.sbt.zip^2^ | 0:35.42 |
| gtdb-rs202.genomic-reps.k31.lca.json.gz^3^ | 0:46.01 |
| gtdb-rs202.genomic.k31.lca.json.gz^3^ | 1:51.11 |

Footnotes:
^1^ Zipfile collections use manifests with picklists, so only the relevant signatures are loaded and searched; hence the big speedup!
^2^ SBTs don't use manifests in search, and picklists are applied only after signatures are found, so this speedup doesn't make sense to me!?
^3^ LCA databases are loaded and parsed before picklists are applied, so there should be little speed improvement from picklists.

So, I still don't understand the SBT speedup, but everything else makes sense.
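The size of each speedup can be checked by dividing the gather bench time by the picklist time for each database. This Python sketch assumes the two benchmark runs are directly comparable, even though the first table is labeled "gather bench" and the second "search w/4 item picklist"; the numbers are copied from the tables above.

```python
def to_seconds(mss: str) -> float:
    """Convert an 'm:ss.ss' time string, as reported above, to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + float(seconds)


# Gather bench times (first table) and 4-item picklist times (second table).
full = {
    "gtdb-rs202.genomic-reps.k31.sbt.zip": "0:18.49",
    "gtdb-rs202.genomic-reps.k31.lca.json.gz": "0:47.55",
    "gtdb-rs202.genomic.k31.lca.json.gz": "1:58.08",
    "gtdb-rs202.genomic-reps.k31.zip": "3:20.74",
    "gtdb-rs202.genomic.k31.sbt.zip": "6:21.36",
    "gtdb-rs202.genomic.k31.zip": "19:51.58",
}
with_picklist = {
    "gtdb-rs202.genomic-reps.k31.zip": "0:01.56",
    "gtdb-rs202.genomic.k31.zip": "0:07.51",
    "gtdb-rs202.genomic-reps.k31.sbt.zip": "0:09.88",
    "gtdb-rs202.genomic.k31.sbt.zip": "0:35.42",
    "gtdb-rs202.genomic-reps.k31.lca.json.gz": "0:46.01",
    "gtdb-rs202.genomic.k31.lca.json.gz": "1:51.11",
}

# Speedup factor = time without picklist / time with picklist.
speedup = {db: to_seconds(full[db]) / to_seconds(with_picklist[db]) for db in full}
for db, factor in sorted(speedup.items(), key=lambda item: item[1], reverse=True):
    print(f"{db:42s} {factor:6.1f}x")
```

This gives roughly 159x and 129x for the zip collections, about 11x and 2x for the SBTs, and about 1x for the LCA databases, which matches footnotes 1 and 3 and puts a number on the puzzling SBT speedup in footnote 2.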

@ctb
Contributor Author

ctb commented Jun 25, 2021

results with sourmash 3.5.0

| database | search time (m:ss) |
|---|---|
| gtdb-rs202.genomic-reps.k31.sbt.zip | 0:17.39 |
| gtdb-rs202.genomic.k31.sbt.zip | 0:41.82 |
| gtdb-rs202.genomic-reps.k31.lca.json.gz | 0:47.90 |
| gtdb-rs202.genomic.k31.lca.json.gz | 2:17.28 |
| gtdb-rs202.genomic.k31.zip | (not supported) |
| gtdb-rs202.genomic-reps.k31.zip | (not supported) |

OK, so I've got to look into the results for gtdb-rs202.genomic.k31.sbt.zip: 0:41.82 here vs. 6:21.36 with the latest code...
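A quick ratio of latest-to-3.5.0 times shows why that one database stands out. This Python sketch assumes the first table (gather bench) came from the same development build whose full log appears later in this thread; its 6:21.36 elapsed time for gtdb-rs202.genomic.k31.sbt.zip matches. Only the four databases that 3.5.0 supports are compared.

```python
def to_seconds(mss: str) -> float:
    """Convert an 'm:ss.ss' time string, as reported above, to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + float(seconds)


# (sourmash 3.5.0 time, latest dev build time), copied from the tables above.
times = {
    "gtdb-rs202.genomic-reps.k31.sbt.zip": ("0:17.39", "0:18.49"),
    "gtdb-rs202.genomic.k31.sbt.zip": ("0:41.82", "6:21.36"),
    "gtdb-rs202.genomic-reps.k31.lca.json.gz": ("0:47.90", "0:47.55"),
    "gtdb-rs202.genomic.k31.lca.json.gz": ("2:17.28", "1:58.08"),
}

# Values above 1.0 mean the latest build is slower than 3.5.0.
slowdown = {db: to_seconds(new) / to_seconds(old) for db, (old, new) in times.items()}
for db, factor in slowdown.items():
    print(f"{db:42s} {factor:5.2f}x")
```

Only gtdb-rs202.genomic.k31.sbt.zip is a clear regression, at roughly 9x slower; the other three are within about 15% of 3.5.0, and the LCA databases are slightly faster.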

@ctb
Contributor Author

ctb commented Jun 25, 2021

sourmash 3.5.0 full results

== This is sourmash version 3.5.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

select query k=31 automatically.
loaded query: d52ec062... (k=31, DNA)
loaded 1 databases.
        Command being timed: "sourmash gather bench/out.sig gtdb-rs202.genomic.k31.sbt.zip"
        User time (seconds): 11.78
        System time (seconds): 11.31
        Percent of CPU this job got: 55%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:41.82
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 15639052
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 4566312
        Voluntary context switches: 3660
        Involuntary context switches: 301833
        Swaps: 0
        File system inputs: 31119168
        File system outputs: 129056
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

vs

sourmash latest results

== This is sourmash version 4.1.3.dev4+gb787b756. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

select query k=31 automatically.
loaded query: d52ec062... (k=31, DNA)
loaded 1 databases.
Starting prefetch sweep across databases.
Found 35 signatures via prefetch; now doing gather.
found less than 50.0 kbp in common. => exiting
        Command being timed: "sourmash gather bench/out.sig gtdb-rs202.genomic.k31.sbt.zip"
        User time (seconds): 52.83
        System time (seconds): 12.60
        Percent of CPU this job got: 17%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 6:21.36
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1783692
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1118712
        Voluntary context switches: 45963
        Involuntary context switches: 326881
        Swaps: 0
        File system inputs: 5618696
        File system outputs: 129056
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

User time is much shorter for sourmash 3.5.0 (11.78 s vs. 52.83 s), and wall-clock time is 0:41.82 vs. 6:21.36...
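The two reports appear to be GNU `time -v` output, whose "Percent of CPU this job got" is (user + system) / elapsed. Recomputing it, and the time not spent on the CPU, shows how much of each run's wall-clock time the user time accounts for. This is a small Python sketch using only the numbers in the two reports; the `off-CPU` figure is simply elapsed minus (user + system).

```python
def to_seconds(mss: str) -> float:
    """Convert an 'm:ss.ss' elapsed time string to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_percent(user_s: float, system_s: float, wall_s: float) -> float:
    """Percent of CPU the job got: (user + system) / elapsed, as a percentage."""
    return 100 * (user_s + system_s) / wall_s


# (user s, system s, elapsed m:ss.ss) copied from the two reports above.
runs = {
    "3.5.0": (11.78, 11.31, "0:41.82"),
    "4.1.3.dev4+gb787b756": (52.83, 12.60, "6:21.36"),
}
for version, (user, system, elapsed) in runs.items():
    wall = to_seconds(elapsed)
    cpu = user + system
    print(f"{version}: CPU {cpu:.2f} s, wall {wall:.2f} s, "
          f"off-CPU {wall - cpu:.2f} s, {cpu_percent(user, system, wall):.0f}% CPU")
```

This reproduces the reported 55% and 17%, and shows that about 316 s of the latest run's roughly 381 s wall-clock time was not spent on the CPU, so the extra user time alone does not explain the slowdown.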

Hmm, let me try with --no-prefetch...
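The latest log shows gather running in two phases: a prefetch sweep that found 35 candidate signatures, then gather proper, which stopped once less than 50.0 kbp remained in common; the 3.5.0 log shows no prefetch step. Below is a simplified Python sketch of that two-phase shape, modeling each sketch as a plain set of hash values and using a greedy selection loop. It is not sourmash's code; the function names, the toy data, and the `scaled` factor used to turn a bp threshold into a hash count are all assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def prefetch(query: Set[int], db: Dict[str, Set[int]],
             min_hashes: int) -> Dict[str, Set[int]]:
    """Phase 1: one sweep over the database, keeping every entry that shares
    at least `min_hashes` hash values with the query."""
    return {name: hashes for name, hashes in db.items()
            if len(query & hashes) >= min_hashes}


def gather(query: Set[int], candidates: Dict[str, Set[int]],
           min_hashes: int) -> List[Tuple[str, int]]:
    """Phase 2: greedy selection. Repeatedly pick the candidate with the largest
    overlap with the still-unexplained query hashes, then remove those hashes."""
    remaining = set(query)
    pool = dict(candidates)  # copy so the caller's dict is not modified
    results: List[Tuple[str, int]] = []
    while pool:
        name, hashes = max(pool.items(), key=lambda item: len(remaining & item[1]))
        overlap = len(remaining & hashes)
        if overlap < min_hashes:
            break  # stop once the best remaining match is below the threshold
        results.append((name, overlap))
        remaining -= hashes
        del pool[name]
    return results


# Hypothetical conversion of a bp threshold into a hash count, assuming each
# kept hash represents `scaled` bp on average.
scaled = 1000
threshold_bp = 3000
min_hashes = threshold_bp // scaled  # 3

query = set(range(10))
db = {
    "A": set(range(0, 6)),
    "B": set(range(4, 10)),
    "C": {0, 1, 100},
    "D": {200, 201},
}
candidates = prefetch(query, db, min_hashes)
print(sorted(candidates))                     # ['A', 'B']
print(gather(query, candidates, min_hashes))  # [('A', 6), ('B', 4)]
```

The design point is that prefetch touches the whole database once, after which the greedy loop only rescans the small candidate set; skipping prefetch, as `--no-prefetch` presumably does, moves all of the work onto the database search itself.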

@ctb
Contributor Author

ctb commented Jul 1, 2021

v4.2.0 is released on PyPI!

ctb closed this as completed Jul 1, 2021