Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error ngt_remove_index() only with normalized distance types #17

Closed
cjrh opened this issue Nov 1, 2023 · 5 comments
Closed

Error ngt_remove_index() only with normalized distance types #17

cjrh opened this issue Nov 1, 2023 · 5 comments

Comments

@cjrh
Copy link
Contributor

cjrh commented Nov 1, 2023

Summary

I've come across a problem when calling index.remove(id: VecId), but it only happens with certain distance types:

  • NormalizedAngle
  • NormalizedCosine
  • NormalizedL2

The common theme seems to be that these are normalized?

This is the error that is produced:

Error: Capi : ngt_remove_index() : Error: /.../ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

Reproducer

I've made a basic project with test cases:

ngtbug.zip

These are the versions in use:

[dependencies]
anyhow = "1.0.75"
ngt = "0.6.1"
uuid = { version = "1.5.0", features = ["v4"] }

and in the lock file, ngt-sys is at 2.1.3.

This is test case, to explain what is happening:

#[cfg(test)]
mod tests {
    use anyhow::Result;
    use ngt::{NgtIndex, NgtDistance, NgtProperties, EPSILON};

    fn run_test(dist: NgtDistance) -> Result<()> {
        // Create a new index
        let prop = NgtProperties::<f32>::dimension(3)?
            .distance_type(dist)?;

        // Temp dir for tests
        let temp_dir_p = std::env::temp_dir();
        let temp_dir = temp_dir_p.to_string_lossy();
        let u = uuid::Uuid::new_v4();
        let u = u.as_simple();
        let index_path = format!("{temp_dir}/{u}");
        std::fs::remove_dir_all(&index_path).ok();

        let mut index = NgtIndex::create(&index_path, prop)?;

        // Insert two vectors and get their id
        let vec1 = vec![1.0, 2.0, 3.0];
        let vec2 = vec![4.0, 5.0, 6.0];
        let id1 = index.insert(vec1)?;
        let id2 = index.insert(vec2)?;
        println!("id1: {}", id1);
        println!("id2: {}", id2);

        // Actually build the index (not yet persisted on disk)
        // This is required in order to be able to search vectors
        index.build(2)?;
        index.persist()?;

        // Perform a vector search (with 1 result)
        let res = index.search(&[1.1, 2.1, 3.1], 1, EPSILON)?;
        assert_eq!(res[0].id, id1);

        // PROBLEM HERE
        index.remove(id1)?;

        // Cleanup - remove the index path
        std::fs::remove_dir_all(&index_path)?;

        Ok(())
    }

    #[test]
    fn test01_L1() -> Result<()> {
        run_test(NgtDistance::L1)
    }

    #[test]
    fn test02_L2() -> Result<()> {
        run_test(NgtDistance::L2)
    }

    #[test]
    fn test03_Angle() -> Result<()> {
        run_test(NgtDistance::Angle)
    }

    #[test]
    fn test04_Hamming() -> Result<()> {
        run_test(NgtDistance::Hamming)
    }

    #[test]
    fn test05_Cosine() -> Result<()> {
        run_test(NgtDistance::Cosine)
    }

    #[test]
    fn test06_NormalizedAngle() -> Result<()> {
        run_test(NgtDistance::NormalizedAngle)
    }

    #[test]
    fn test07_NormalizedCosine() -> Result<()> {
        run_test(NgtDistance::NormalizedCosine)
    }

    #[test]
    fn test08_Jaccard() -> Result<()> {
        run_test(NgtDistance::Jaccard)
    }

    #[test]
    fn test09_SparseJaccard() -> Result<()> {
        run_test(NgtDistance::SparseJaccard)
    }

    #[test]
    fn test10_NormalizedL2() -> Result<()> {
        run_test(NgtDistance::NormalizedL2)
    }

    #[test]
    fn test11_Poincare() -> Result<()> {
        run_test(NgtDistance::Poincare)
    }

    #[test]
    fn test12_Lorentz() -> Result<()> {
        run_test(NgtDistance::Lorentz)
    }
}

This is the output from the test run:

$ cargo test -- --test-threads=1
   Compiling ngtbug v0.1.0 (/home/caleb/tmp/ngtbug)

running 12 tests
test tests::test01_L1 ... ok
test tests::test02_L2 ... ok
test tests::test03_Angle ... ok
test tests::test04_Hamming ... ok
test tests::test05_Cosine ... ok
test tests::test06_NormalizedAngle ... FAILED
test tests::test07_NormalizedCosine ... FAILED
test tests::test08_Jaccard ... ok
test tests::test09_SparseJaccard ... FAILED
test tests::test10_NormalizedL2 ... FAILED
test tests::test11_Poincare ... ok
test tests::test12_Lorentz ... FAILED

failures:

---- tests::test06_NormalizedAngle stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test07_NormalizedCosine stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test09_SparseJaccard stdout ----
Error: Capi : ngt_insert_index_as_float() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/ObjectRepository.h:allocatePersistentObject:345: ObjectSpace::allocatePersistentObject: Fatal error! The dimensionality is invalid. The specified dimensionality=3. The specified object=2.

---- tests::test10_NormalizedL2 stdout ----
id1: 1
id2: 2
Error: Capi : ngt_remove_index() : Error: /home/caleb/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id

---- tests::test12_Lorentz stdout ----
id1: 1
id2: 2
thread 'tests::test12_Lorentz' panicked at src/main.rs:40:23:
index out of bounds: the len is 0 but the index is 0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::test06_NormalizedAngle
    tests::test07_NormalizedCosine
    tests::test09_SparseJaccard
    tests::test10_NormalizedL2
    tests::test12_Lorentz

test result: FAILED. 7 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s

The failures in SparseJaccard and Lorentz are interesting but unrelated to my issue. I need to use NormalizedCosine for my application.

Comments

It is possible this is an issue with the upstream NGT library. I am not sure but I decided to ask here first. It is also possible that I have missed some detail about these distance types are supposed to be used.

@cjrh cjrh changed the title Error removing vector: Error: Capi : ngt_remove_index() : Error: /.../ngt-sys-2.1.3/NGT/lib/NGT/Index.h:remove:1544: Not found the specified id Error ngt_remove_index() only with normalized distance types Nov 1, 2023
@lerouxrgd
Copy link
Owner

Thanks for the detailed report, I am indeed able to reproduce the errors you see.
I also think it might be due to some bugs in the underlying NGT code and it would be worth reporting it there too.
I will try to investigate NGT code.

@cjrh
Copy link
Contributor Author

cjrh commented Nov 2, 2023

Ok I made a reproducer using ngt-py and made a corresponding issue there ☝🏼

@wallies
Copy link

wallies commented Nov 13, 2023

@cjrh @lerouxrgd this looks to be fixed in the latest release https://github.com/yahoojapan/NGT/releases/tag/v2.1.5

lerouxrgd added a commit that referenced this issue Nov 19, 2023
@lerouxrgd lerouxrgd reopened this Nov 20, 2023
@lerouxrgd
Copy link
Owner

lerouxrgd commented Nov 20, 2023

GitHub automatically closed this issue when I merged the PR, however there seems to be an issue remaining for Lorenz and SparseJaccard distances. I have mentioned it in the related NGT issue.

@lerouxrgd
Copy link
Owner

As Lorentz and SparseJaccard seem a bit special, I have released 0.7.0 that fixes the issue for the other distances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants