Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove vector causes error, only for "Normalised" distance types: Index.h:remove:1544: Not found the specified id #152

Closed
cjrh opened this issue Nov 2, 2023 · 7 comments

Comments

@cjrh
Copy link

cjrh commented Nov 2, 2023

Summary

When calling index.remove(<id>), an error occurs, but only with the following distance types:

  • Normalized L2
  • Normalized Angle
  • Normalized Cosine

I first reported this bug on the ngt-rs project, but following comments by @lerouxrgd I decided to make a reproducer here. Note that since the same errors occur on that project, it is impossible that the error is in the Python wrapper.

Reproducer

Here is a small file to reproduce the error. To setup the venv and install deps:

$ python3.10 -m venv venv
$ source venv/bin/activate
(venv) $ pip install -U pip wheel ngt

These are the versions I have installed in my venv:

$ pip list
Package    Version
---------- -------
ngt        2.1.3
numpy      1.26.1
pip        23.3.1
pybind11   2.11.1
setuptools 59.6.0
wheel      0.41.3

To run the file below, simply:

(venv) $ python main.py

Here is the file below. The important part is the function run_test().

import ngtpy
import traceback


def run_test(distance_type: str):
    # Create the index
    ngtpy.create(b"tmp", 3, distance_type=distance_type)
    index = ngtpy.Index(b"tmp", zero_based_numbering=False)

    # Insert vectors
    id1 = index.insert([1.0, 2.0, 3.0])
    id2 = index.insert([40.0, 5.0, 6.0])
    index.build_index(2)
    index.save()

    # Search once
    query = [1.01, 1.02, 1.03]
    results = index.search(query, 1)
    assert results[0][0] == id1

    # Remove a vector - PROBLEM
    index.remove(id1)

    # Search again
    results = index.search(query, 1)
    assert results[0][0] == id2

distance_types = [
    "L1",
    "L2",
    "Normalized L2",
    "Angle",
    "Normalized Angle",
    "Cosine",
    "Normalized Cosine",
    "Hamming",
    "Jaccard",
]

for distance_type in distance_types:
    try:
        print()
        run_test(distance_type)
    except Exception as e:
        print(f"{distance_type=:<20}Error: {e}")
        traceback.print_exc()
    else:
        print(f"{distance_type=:<20}OK")
    finally:
        # Clean up the temporary directory
        import pathlib, shutil
        p = pathlib.Path("./tmp")
        if p.exists():
            shutil.rmtree(p)

This produces the following output:

distance_type=L1                  OK

distance_type=L2                  OK

distance_type=Normalized L2       Error: /NGT/release/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id
Traceback (most recent call last):
  File "/home/caleb/tmp/ngtbugpy/main.py", line 43, in <module>
    run_test(distance_type)
  File "/home/caleb/tmp/ngtbugpy/main.py", line 22, in run_test
    index.remove(id1)
RuntimeError: /NGT/release/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

distance_type=Angle               OK

distance_type=Normalized Angle    Error: /NGT/release/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id
Traceback (most recent call last):
  File "/home/caleb/tmp/ngtbugpy/main.py", line 43, in <module>
    run_test(distance_type)
  File "/home/caleb/tmp/ngtbugpy/main.py", line 22, in run_test
    index.remove(id1)
RuntimeError: /NGT/release/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

distance_type=Cosine              OK

distance_type=Normalized Cosine   Error: /NGT/release/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id
Traceback (most recent call last):
  File "/home/caleb/tmp/ngtbugpy/main.py", line 43, in <module>
    run_test(distance_type)
  File "/home/caleb/tmp/ngtbugpy/main.py", line 22, in run_test
    index.remove(id1)
RuntimeError: /NGT/release/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

distance_type=Hamming             OK

distance_type=Jaccard             OK

Comments

  • My personal interest is in Normalized Cosine
  • I am actually using ngt-rs, the rust wrapper. But I figured it will be much easier for you to work with a Python reproducer.
@cjrh
Copy link
Author

cjrh commented Nov 2, 2023

Note the results are the same if zero_based_numbering is set to True.

@cjrh cjrh changed the title Remove vector causes error, only for "Normalised" distance types Remove vector causes error, only for "Normalised" distance types: Index.h:remove:1544: Not found the specified id Nov 2, 2023
@masajiro
Copy link
Member

masajiro commented Nov 8, 2023

Thank you for your feedback. I was able to reproduce the issue you mentioned. However, it will take a little while to fix it, because of its complicated situation.

masajiro added a commit that referenced this issue Nov 13, 2023
@masajiro
Copy link
Member

I have released v2.1.5 to resolve this issue.

@lerouxrgd
Copy link
Contributor

lerouxrgd commented Nov 20, 2023

I have integrated this change to ngt-rs (on master branch), however it looks like there is still an issue with Lorenz and SparseJaccard as I get the following error:

cargo test

test ngt::index::tests::test_dist_sparse_jaccard ... FAILED
test ngt::index::tests::test_dist_lorentz ... FAILED

failures:

---- ngt::index::tests::test_dist_sparse_jaccard stdout ----
Error: Error("Capi : ngt_insert_index_as_float() : Error: /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/ObjectRepository.h:allocatePersistentObject:345: ObjectSpace::allocatePersistentObject: Fatal error! The dimensionality is invalid. The specified dimensionality=3. The specified object=2.")

---- ngt::index::tests::test_dist_lorentz stdout ----
thread 'ngt::index::tests::test_dist_lorentz' panicked at src/ngt/index.rs:728:23:
index out of bounds: the len is 0 but the index is 0

@masajiro
Copy link
Member

These two distances are different from other distances. When using distances, the given vector data must follow the specific manners for the two distances. You may want you to remove the two distances from your test.

@lerouxrgd
Copy link
Contributor

Thanks for the precision, I have removed them from the test and released the fix.

@cjrh
Copy link
Author

cjrh commented Nov 22, 2023

Thanks everyone 👏🏼

@cjrh cjrh closed this as completed Nov 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants