Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor registry system: no direct dependencies; expose standard hash.Hash; be a data carrier. #136

Merged
merged 17 commits into from
Mar 11, 2021

Commits on Mar 6, 2021

  1. Refactor registry system: no direct dependencies; expose standard has…

    …h.Hash; be a data carrier.
    
    ---
    
    Dependencies:
    
    Previously, depending on the go-mulithash package would create direct
    dependencies to several other modules for their various hash function
    implementations.  This meant that instead of go-multihash being a
    lightweight, easy-to-accept dependency itself, it became something
    which would noticably increase the size of your go.mod file,
    your package graph, your download sizes during development, and most
    concerningly, your compile output size in final products.
    
    Now, there is a registry system (see the Register function), and
    the main go-multihash package *only* populates the registry with hashes
    that are available from the golang standard library by default.
    This means you gain no transitive dependencies on other libraries
    by importing the go-multihash package, and your binaries will not
    be bloated by hashers you don't use.  (Your go.mod file may still
    show more repos; but they don't end up in your builds unless you
    actually refer to them).
    
    There are now several new packages in `go-multihash/register/*`.
    These can be imported to register the hashes in those packages.
    If you want all the hashes that were previously available, just make
    sure to import "go-mulithash/register/all" somewhere in your program.
    (You can register hashes, too, without making PRs to this library;
    these packages are just here for convenience and easy use.)
    
    **This is a breaking change** if you used hashes not found in the
    golang stdlib, such as blake2 and sha3.  However, to update,
    all you need to do is ensure the relevant `go-multihash/register/*`
    package is imported anywhere in your program -- an easy change.
    A `go-multihash/register/all` package can be imported to get a hasher
    registered for all of the same multihash codes as before (but will
    correspondingly add the dependency weight back too, of course).
    
    ---
    
    Standard hash.Hash:
    
    Previously, go-multihash had its own definition of a `HashFunc`
    interface, and only exposed hashing through the `multihash.Sum` method.
    The problem with this was these definitions did not support streaming:
    one had to have an entire chunk of memory loaded at once,
    in a single contiguous byte slice, in order to hash it.
    
    (A second, admittedly much more minor, problem with this was that one
    often had to write glue code to turn a `hash.Hash` into a
    `multihash.HashFunc`, and since most of golang uses the standard lib
    `hash.Hash` definition already, this was generally avoidable friction.)
    
    Now, the Register function operates in terms of standard `hash.Hash`,
    and there is a `GetHasher` function which can get you a `hash.Hash`.
    (Okay, to be more precise, these functions take and return a factory
    function for a `hash.Hash`.  You get the idea.)
    
    Since the standard `hash.Hash` interface can operate streamingly,
    now it's easy to use go-multihash in a streaming way.
    
    The `multihash.Sum` method works the same as always.
    
    ---
    
    Be a data carrier:
    
    Previously, go-multihash contained checks that any multihash indicator
    codes being handled were required to have a hash function registered
    for them.  This made it very difficult to use go-multihash in a
    "forward compatible" way (and it also made a lot of practical bumps
    for this dependency-extraction refactor).
    
    Now, go-multihash is willing to carry data, even if it doesn't know
    what kind of hash function would be associated with an indicator code.
    (Methods that you'd expect to parse things do still parse the varints,
    making sure they're sanely formatted.  They just don't inspect and
    whitelist the actual integers anymore.)
    
    I removed the `ValidCode` predicate entirely.  It doesn't seem to serve
    any good purpose anymore.
    
    ---
    
    Other:
    
    I have not touched the `Codes` and `Names` maps in this diff.
    I think we should probably review (and probably remove) these, and
    instead direct people to use the go-multicodec package instead,
    which has the two advantages of decoupling registration of an
    implementation versus simply having a description, and also being
    automatically generated from the multiformats table.
    However, I wanted to check on feelings about this before doing the work
    (especially because they're somewhat entangled with a bunch of the
    tests in this package, making their removal somewhat nontrivial).
    
    Most of the test files are now `package multihash_test`.
    This makes for some colorful diffs, but is not otherwise interesting.
    The reason for this is because the dependency separation process now
    requires the tests to import those `register/*` packages, and to
    avoid a cycle, that means, well, `package mulithash_test`.
    
    I think there's probably more work to be done in making this library
    really shine.  For example, in reviewing the `Encode` function,
    I see some allocations that look very likely to be avoidable... if the
    function was redesigned to be more aware of how it's likely to be used.
    However, I took no action on this, in part because this diff is big
    enough already, and in part because I think it might be reasonable to
    re-examine the relationship of this code to go-cid at the same time.
    
    I dropped `TestSmallerLengthHashID`.  It appeared to be testing an API
    that wasn't actually exported... and the nearest API that *is* exported
    (Sum) has a general contract of truncating a hash upon short length,
    so it was overall unclear what this test should be checking.
    Review might be needed on this.
    
    The situation for murmur3 is still in need of resolution.
    It's commented out entirely for now.  Questions are noted in the diff.
    
    There's a 'register/miniosha256' package which sets the sha2-256
    implementation to a non-stdlib one.  If you don't import this package,
    you still get a sha2-256; it's just the stdlib one.
    I did not include this in the 'register/all' group.
    (Maybe it's faster; maybe it's not; but it's definitely not required,
    and I'm getting some reports it also shows weird on profiles, so I tend
    to think maybe one should really have to explicitly ask for this one.)
    warpfork committed Mar 6, 2021
    Configuration menu
    Copy the full SHA
    9e12c3a View commit details
    Browse the repository at this point in the history
  2. Return to using constant pool from root package.

    It seems likely that we should replace many of these with values
    from https://github.com/multiformats/go-multicodec in the future,
    however, we have a new issue to track that:
    #137
    
    And as noted in
    #136 (comment)
    it's time to reign in the amount of work going on in this PR.
    warpfork committed Mar 6, 2021
    Configuration menu
    Copy the full SHA
    382c37a View commit details
    Browse the repository at this point in the history
  3. Optimize doubleSha256 construction.

    Avoid allocations with more reuse the same slice for results.
    warpfork committed Mar 6, 2021
    Configuration menu
    Copy the full SHA
    5647e22 View commit details
    Browse the repository at this point in the history
  4. Drop murmur3 implementation.

    Agreed upon in discussion at:
    #136 (comment)
    
    The "REVIEW" comments in the outgoing diff identify the concerns that
    led to this choice.
    warpfork committed Mar 6, 2021
    Configuration menu
    Copy the full SHA
    6897150 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    8c1b61b View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    57ca955 View commit details
    Browse the repository at this point in the history
  7. Configuration menu
    Copy the full SHA
    10afd22 View commit details
    Browse the repository at this point in the history

Commits on Mar 8, 2021

  1. Configuration menu
    Copy the full SHA
    442f64a View commit details
    Browse the repository at this point in the history

Commits on Mar 9, 2021

  1. Resolving review topics around Encode function.

    Leaving the error return in, per
    #136 (comment)
    
    May also be interesting to note: I did check if ValidCode (or, roughly
    the same thing but perhaps renamed to KnownCode) could be reintroduced,
    and the answer is... actually no, not without broaching other issues.
    A body of `_, ok := registry[code]` is enough to open a can of worms.
    See
    #136 (comment)
    warpfork committed Mar 9, 2021
    Configuration menu
    Copy the full SHA
    67c208a View commit details
    Browse the repository at this point in the history

Commits on Mar 10, 2021

  1. Replace need for DefaultLength in opts processing.

    This creates a hasher long enough to ask it its properties.
    This is arguably creating garbage; on the other hand, I don't think
    this is a codepath ever likely to be used in a hot loop anywhere.
    We can extract this to something that caches the properties later
    if it proves necessary.
    
    It quietly defaults to zero for unknown codes.  I don't know if this
    makes sense, but it's what the old code would have done, so it's what
    the new code will do, and I'm not looking deeper into it.
    At this point I'm just trying to make surgically minimal alterations
    and get this changeset as a whole wrapped up so things can move on.
    warpfork committed Mar 10, 2021
    Configuration menu
    Copy the full SHA
    bc5cc89 View commit details
    Browse the repository at this point in the history
  2. Proactively reject registration of nil functions.

    Surely no one would try to do this, nor then be surprised if it
    creates problems.  On the other hand: if someone *does* do this, and
    the error doesn't appear until arbitrarily far away when the map is
    read, it's a pain to diagnose... and checking it up front is cheap.
    So, here we go: check it up front.
    warpfork committed Mar 10, 2021
    Configuration menu
    Copy the full SHA
    7be6719 View commit details
    Browse the repository at this point in the history
  3. Improve situation around ErrSumNotSupported.

    The number that's missing is now reported in the error.
    This is done using the golang error wrap feature.
    
    We're keeping a constant value for ErrSumNotSupported for
    compatibility and change avoidance reasons.
    
    Code that cares about this exact error will still have to change;
    it is now necessary to use `errors.Is` to detect it.
    
    The text in the ErrSumNotSupported value is also updated,
    because the text no longer made sense given an open registry system.
    As a target-of-opportunity fix, it also now follows golang normative
    conventions for error messages (no capitalization, no punctuation).
    warpfork committed Mar 10, 2021
    Configuration menu
    Copy the full SHA
    cbd218c View commit details
    Browse the repository at this point in the history
  4. Revive TestSmallerLengthHashID, and add a special case for identity m…

    …ultihash that rejects truncations.
    
    See #136 (comment)
    for discussion.
    
    This change means Sum behaves slightly differently for identity
    multihashes than it does for any other multihash.  I'm not keeping
    score on the number of ways identity multihash is weird anymore,
    just documenting it and keeping tests passing.
    
    The error message is lifted from the old `sumID` function verbatim.
    warpfork committed Mar 10, 2021
    Configuration menu
    Copy the full SHA
    393ba0d View commit details
    Browse the repository at this point in the history
  5. travis: update go version.

    We'll need something a little more recent than 1.11.x to be able to
    use the fmt "%w" and errors.Is features.
    warpfork committed Mar 10, 2021
    Configuration menu
    Copy the full SHA
    aff2570 View commit details
    Browse the repository at this point in the history
  6. update dependencies, and go mod tidy.

    I swear I did not mean to be doing this on a branch, but:
    https://travis-ci.com/github/multiformats/go-multihash/builds/219591152
    
    - it became necessary to update the go version in CI for %w features
    
    - that apparently adds pointer alignment checking to the compiler...
    
    - which flunks some stuff in x/crypto...
    
    So, okay, we're updating libraries now!
    warpfork committed Mar 10, 2021
    Configuration menu
    Copy the full SHA
    4b5e8aa View commit details
    Browse the repository at this point in the history
  7. Reintroduce DefaultLengths; populate during Register.

    While it was possible to remove all use of this from this repo, when
    attempting propagate changes to downstreams consuming it, it became
    apparently that other repos also rely on this symbol.
    
    Whether or not those usages are important and intentional, whether
    they're actually worth maintaining, and whether they'd be replacable
    with other approaches... is not considered at this time.
    (Probably we should be asking this!  The first occasions where this
    cropped up are in other functions that have been marked "deprecated"
    since... 2018!  But... chasing those things down and straightening
    them out is becoming problematic.  Perhaps we'll be more ready to
    revisit these things at a later date.)
    warpfork committed Mar 10, 2021
    Configuration menu
    Copy the full SHA
    1a96911 View commit details
    Browse the repository at this point in the history

Commits on Mar 11, 2021

  1. Add core package (min deps) and give root package full transitive dep…

    …s again.
    
    The transitive dependencies in the root package are still managed by
    this whole registry system (which now resides in the 'core' package),
    so we have parity and *mostly* just one suite of code to maintain.
    
    You can now use this 'core' package when interested in dependency
    minimization.  In exchange, the "register/*" imports may be required.
    
    You can just also just yank updates to go-multihash, and if you were
    already using it (and happen to be using one of these
    now-optional(-but-only-if-you-use-core))... nothing should actually
    change; the "register/*" imports won't be required because the
    root package does them for you.
    
    Some constants are replicated.  This was the minimum step necessary
    to avoid import cycles.  I'm not spending time prettifying it because
    we really probably ought to be refactoring this to use the package
    of constants in go-multicodec that's automatically generated, and
    yet, that is a scope limit we're trying not to cross during this
    changeset.
    warpfork committed Mar 11, 2021
    Configuration menu
    Copy the full SHA
    cebc9f8 View commit details
    Browse the repository at this point in the history