stream load. Is it possible? #565

Open
vinnitu opened this issue Jun 12, 2024 · 5 comments


vinnitu commented Jun 12, 2024

I want to load an index from a network resource, but it fails:

import requests
import io
import pickle
import hnswlib

def get_stream(url):
    response = requests.get(url)
    stream_data = response.content
    return io.BytesIO(stream_data)

model = pickle.load(get_stream('http://example.com/model')) # it works

index = hnswlib.Index(space='cosine', dim=128)
index.load_index(get_stream('http://example.com/index.hnsw')) # doesn't work

I get this error:

TypeError: load_index(): incompatible function arguments. The following argument types are supported:
    1. (self: hnswlib.Index, path_to_index: str, max_elements: int = 0, allow_replace_deleted: bool = False) -> None

Invoked with: <hnswlib.Index(space='cosine', dim=128)>, <_io.BytesIO object at 0x7fd364e557c0>

Is this a reasonable thing to want?
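
Since the TypeError above shows that load_index accepts only a path string, one possible workaround is to spool the stream to a temporary file and pass its name. The following is a minimal sketch, assuming that writing to local disk is acceptable; load_via_tempfile is a hypothetical helper name, not part of hnswlib.

```python
import io
import os
import shutil
import tempfile


def load_via_tempfile(load_fn, stream, suffix=".hnsw"):
    """Copy a binary stream to a temporary file and call load_fn(path).

    Hypothetical helper, not part of hnswlib. load_fn is any callable
    that takes a file path, for example index.load_index.
    """
    # delete=False so the file can be reopened by path after it is closed
    # (on Windows an open NamedTemporaryFile cannot be opened a second time).
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        shutil.copyfileobj(stream, tmp)
        path = tmp.name
    try:
        return load_fn(path)
    finally:
        # Remove the temporary file whether or not loading succeeded.
        os.remove(path)
```

With the code from this comment, the call would look like load_via_tempfile(index.load_index, get_stream('http://example.com/index.hnsw')). The index is still read from disk, so this avoids the TypeError but not the extra copy.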

vinnitu commented Jun 12, 2024

I am not sure, but can we pass io.BytesIO as std::ifstream?

Current signature:

void loadIndex(const std::string &location, SpaceInterface<dist_t> *s) {

Proposed stream-based version:

    // Non-const reference: reading advances the stream and updates its state.
    void loadIndex(std::ifstream &input, SpaceInterface<dist_t> *s) {
        std::streampos position;

        readBinaryPOD(input, maxelements_);
        readBinaryPOD(input, size_per_element_);
        readBinaryPOD(input, cur_element_count);

        data_size_ = s->get_data_size();
        fstdistfunc_ = s->get_dist_func();
        dist_func_param_ = s->get_dist_func_param();
        size_per_element_ = data_size_ + sizeof(labeltype);
        data_ = (char *) malloc(maxelements_ * size_per_element_);
        if (data_ == nullptr)
            throw std::runtime_error("Not enough memory: loadIndex failed to allocate data");

        input.read(data_, maxelements_ * size_per_element_);

        input.close();
    }

vinnitu commented Jun 12, 2024

As a first step, split the function:

    // Non-const reference: reading advances the stream and updates its state.
    void loadStream(std::ifstream &input, SpaceInterface<dist_t> *s) {
        readBinaryPOD(input, maxelements_);
        readBinaryPOD(input, size_per_element_);
        readBinaryPOD(input, cur_element_count);

        data_size_ = s->get_data_size();
        fstdistfunc_ = s->get_dist_func();
        dist_func_param_ = s->get_dist_func_param();
        size_per_element_ = data_size_ + sizeof(labeltype);
        data_ = (char *) malloc(maxelements_ * size_per_element_);
        if (data_ == nullptr)
            throw std::runtime_error("Not enough memory: loadIndex failed to allocate data");

        input.read(data_, maxelements_ * size_per_element_);
    }
    
    void loadIndex(const std::string &location, SpaceInterface<dist_t> *s) {
        std::ifstream input(location, std::ios::binary);
        std::streampos position;
        loadStream(input, s);
        input.close();
    }
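
The same split can be sketched in Python, for illustration only: a stream-based function does the parsing, and a thin path-based wrapper opens the file and delegates to it. The three header fields follow the readBinaryPOD calls above; treating each as an 8-byte little-endian size_t is an assumption, and both function names are hypothetical.

```python
import io
import struct

# Assumption: three size_t header fields, each 8 bytes, little-endian.
HEADER = struct.Struct("<QQQ")


def read_header_from_stream(stream):
    """Parse (maxelements, size_per_element, cur_element_count) from any binary stream."""
    raw = stream.read(HEADER.size)
    if len(raw) != HEADER.size:
        raise ValueError("stream too short for header")
    return HEADER.unpack(raw)


def read_header_from_path(path):
    """Thin wrapper: open the file and delegate to the stream version."""
    with open(path, "rb") as f:
        return read_header_from_stream(f)
```

Because the stream version only calls read(), it works equally with an open file and with an io.BytesIO buffer, which is the property the proposed loadStream is after.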

vinnitu commented Jun 12, 2024

The same applies to this function:

void loadIndex(const std::string &location, SpaceInterface<dist_t> *s, size_t max_elements_i = 0) {

vinnitu commented Jun 12, 2024

Unfortunately, we can't simply do this, because the loading code uses .seekg() and .tellg() (although we could simplify the loading code and remove those calls).

Also, std::ifstream may not be compatible with io.BytesIO, so we might need std::istringstream instead.

What do you think?

drons commented Jul 19, 2024

Take a look at #556
