First round of optimizations for vector functions. #46294

Merged: 8 commits, Sep 3, 2019

Commits on Aug 30, 2019

  1. Precompute vector length on indexing (#45390)

    This parameter is hidden and doesn't need to be supplied by users.
    It allows checking the index version and using different
    encodings/decodings depending on the version.
    mayya-sharipova authored and jtibshirani committed Aug 30, 2019
    8e2cb70
  2. Switch to ByteBuffer for vector encoding. (#45936)

    This commit updates the vector encoding and decoding logic to use
    `java.nio.ByteBuffer`. Using `ByteBuffer` shows an improvement in
    [microbenchmarks](jtibshirani#3) and I
    think it helps code readability. The performance gain might be due to the fact
    that `ByteBuffer` uses HotSpot intrinsic candidates like `Unsafe#getIntUnaligned`
    under the hood.
    jtibshirani committed Aug 30, 2019
    da5213f
  3. Combine vector decoding and function computation. (#46103)

    This commit updates the dense vector functions like `cosineSimilarity` to
    decode the document vector and compute the result at the same time. Previously,
    we would fully decode the vector into an array, then calculate the function.
    jtibshirani committed Aug 30, 2019
    e80b114
  4. Use an array instead of a List for the query vector. (#46155)

    This commit updates all dense vector functions to use `float[]` as opposed to a
    `List<Number>` to track the query vector. The `float[]` query vector is held in
    a new superclass `DenseVectorFunction`.
    
    It also factors out the vector length validation into the superclasses
    `DenseVectorFunction` and `SparseVectorFunction`.
    jtibshirani committed Aug 30, 2019
    f01327c
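
The encoding and decoding changes described in the Aug 30 commits can be illustrated with a minimal sketch. This is not the code from this PR: the class name `VectorEncodingSketch`, its method names, and the byte layout (consecutive 4-byte big-endian floats with no header or trailer) are assumptions made for illustration. It contrasts the older "decode into an array, then compute" approach with reading each component from a `java.nio.ByteBuffer` while accumulating the result, using a `float[]` query vector.

```java
import java.nio.ByteBuffer;

// Illustrative only: names and byte layout are hypothetical, not taken from the PR.
final class VectorEncodingSketch {

    // Encode a vector as consecutive 4-byte floats (ByteBuffer defaults to big-endian).
    static byte[] encode(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES);
        for (float value : vector) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    // Older style: fully decode into a temporary array before doing any math.
    static float[] decode(byte[] encoded) {
        ByteBuffer buffer = ByteBuffer.wrap(encoded);
        float[] vector = new float[encoded.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }

    // Fused style: read each document component and accumulate in the same loop,
    // so no intermediate array is allocated.
    static double dotProduct(float[] queryVector, byte[] encodedDocVector) {
        int docDims = encodedDocVector.length / Float.BYTES;
        if (queryVector.length != docDims) {
            throw new IllegalArgumentException(
                "query has " + queryVector.length + " dims, document has " + docDims);
        }
        ByteBuffer buffer = ByteBuffer.wrap(encodedDocVector);
        double dot = 0.0;
        for (float queryValue : queryVector) {
            dot += queryValue * buffer.getFloat();
        }
        return dot;
    }
}
```

In this sketch, `VectorEncodingSketch.dotProduct(query, VectorEncodingSketch.encode(doc))` gives the same result as decoding first and then looping over the array; the fused version only avoids the temporary `float[]`. Keeping the query as a `float[]` rather than a `List<Number>` also avoids unboxing on every component.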

Commits on Sep 3, 2019

  1. Precompute the normalized query vector when taking cosine. (#46190)

    This commit normalizes the query vector to unit length when
    constructing `CosineSimilarity`. Since the query is already normalized, we
    don't need to divide by its magnitude when computing the cosine.
    jtibshirani authored Sep 3, 2019
    18583c3
  2. 5a22b69
  3. b51c126
  4. 701e630
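
The query normalization described in #46190 can be sketched as follows. This is a minimal illustration, not the PR's `CosineSimilarity` class: the name `CosineSketch`, the plain `float[]` document vector, and the error handling are assumptions. Because cos(q, d) = (q · d) / (|q| |d|), dividing the query by |q| once in the constructor leaves only the division by |d| for each document.

```java
// Illustrative only: a hypothetical class, not the CosineSimilarity from the PR.
final class CosineSketch {

    // Query vector scaled to unit length once, at construction time.
    private final float[] unitQuery;

    CosineSketch(float[] queryVector) {
        double sumOfSquares = 0.0;
        for (float value : queryVector) {
            sumOfSquares += value * value;
        }
        float magnitude = (float) Math.sqrt(sumOfSquares);
        if (magnitude == 0f) {
            throw new IllegalArgumentException("query vector must have non-zero magnitude");
        }
        unitQuery = new float[queryVector.length];
        for (int i = 0; i < queryVector.length; i++) {
            unitQuery[i] = queryVector[i] / magnitude;
        }
    }

    // Per-document work: one pass computing the dot product and the document's
    // squared magnitude. The query magnitude is not needed because it is already 1.
    double cosine(float[] docVector) {
        if (docVector.length != unitQuery.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0;
        double docSumOfSquares = 0.0;
        for (int i = 0; i < docVector.length; i++) {
            dot += unitQuery[i] * docVector[i];
            docSumOfSquares += docVector[i] * docVector[i];
        }
        // An all-zero document vector yields NaN here; a real implementation
        // would need to decide how to handle that case.
        return dot / Math.sqrt(docSumOfSquares);
    }
}
```

A single `CosineSketch` instance can be reused across every document scored for the same query, so the normalization cost is paid once per query rather than once per document.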