Obtain nodes from VectorStoreIndex #8930
-
I'm loading an index from disk like this:

index = load_index_from_storage(storage_context, service_context=service_context)

I want to obtain the nodes so that I can create a RecursiveRetriever:

retriever_chunk = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever_chunk},
    node_dict=all_nodes_dict,
    verbose=True,
)

Is there a way to do this?
Replies: 7 comments 8 replies
-
🤖 To obtain the nodes from the loaded index in order to create a RecursiveRetriever, you can use the index's ref_doc_info property. Here is how you can do it:

# Assume you have an instance of TreeIndex
tree_index = TreeIndex(...)
# ref_doc_info maps each source document ID to a RefDocInfo object,
# which lists the IDs of the nodes created from that document
ref_doc_info = tree_index.ref_doc_info
# Look up each node in the docstore to build the node_dict
# for the RecursiveRetriever constructor
node_dict = {
    node_id: tree_index.docstore.get_node(node_id)
    for info in ref_doc_info.values()
    for node_id in info.node_ids
}

Note that TreeIndex(...) is a placeholder for your actual index instance. You can find more details about this in the llama_index/indices/tree/base.py file.

Regarding load_index_from_storage, here is the function definition:

def load_index_from_storage(
    storage_context: StorageContext,
    index_id: Optional[str] = None,
    **kwargs: Any,
) -> BaseIndex:
    """Load index from storage context.

    Args:
        storage_context (StorageContext): storage context containing
            docstore, index store and vector store.
        index_id (Optional[str]): ID of the index to load.
            Defaults to None, which assumes there's only a single index
            in the index store and load it.
        **kwargs: Additional keyword args to pass to the index constructors.
    """
    index_ids: Optional[Sequence[str]]
    if index_id is None:
        index_ids = None
    else:
        index_ids = [index_id]
    indices = load_indices_from_storage(storage_context, index_ids=index_ids, **kwargs)
    if len(indices) == 0:
        raise ValueError(
            "No index in storage context, check if you specified the right persist_dir."
        )
    elif len(indices) > 1:
        raise ValueError(
            f"Expected to load a single index, but got {len(indices)} instead. "
            "Please specify index_id."
        )
    return indices[0]

You can find more details about this in the llama_index/indices/loading.py file.
-
As answered on Discord:

nodes = index.docstore.docs.values()
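The one-liner above yields only the node objects, whereas the `node_dict` argument in the question needs a mapping from node ID to node. Here is a minimal sketch of that step, assuming `index.docstore.docs` behaves like a dict keyed by node ID. `build_node_dict` and `FakeNode` are hypothetical names, used only so the example runs without LlamaIndex installed. The empty check reflects the report further down the thread that the docstore can come back empty.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping


@dataclass
class FakeNode:
    """Hypothetical stand-in for a LlamaIndex node (demo only)."""

    node_id: str
    text: str


def build_node_dict(docs: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy a docstore mapping (node ID -> node) into a plain dict.

    Raises ValueError if the mapping is empty, so the caller notices
    early that the nodes are not held in the docstore.
    """
    if not docs:
        raise ValueError("docstore is empty; no nodes to collect")
    return dict(docs)


# Stand-in for index.docstore.docs (assumption: node ID -> node mapping)
docs = {
    "n1": FakeNode("n1", "first chunk"),
    "n2": FakeNode("n2", "second chunk"),
}
all_nodes_dict = build_node_dict(docs)
```

With a real index, the call would presumably be `all_nodes_dict = build_node_dict(index.docstore.docs)`.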
-
This should be added to the docs!
-
I'm joining @badrinathvm's question
-
Did anyone try this? If the nodes were stored in Chroma DB in advance, it returns an empty dict.
-
You could try the solution below to get the nodes (up to a specific count) without asking any question. You could add
-
For ChromaDB, this works for me, as ugly as it is:

from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

vector_store = ChromaVectorStore(...)
vector_index = VectorStoreIndex.from_vector_store(vector_store)
# set similarity_top_k to a large number to retrieve all the nodes
retriever = vector_index.as_retriever(similarity_top_k=1000000)
# retrieve all nodes; the query text itself does not matter
all_nodes = retriever.retrieve("Whatever")
# unwrap each scored result to get the underlying node
all_nodes = [item.node for item in all_nodes]

It would be nice to have something cleaner, though.
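The retriever returns scored results, not the ID-to-node mapping that `RecursiveRetriever` takes as `node_dict` in the original question. Here is a minimal sketch of the conversion, assuming each result exposes `.node` (as in the snippet above) and each node exposes `.node_id`. `FakeNode`, `FakeNodeWithScore`, and `to_node_dict` are hypothetical names so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class FakeNode:
    """Hypothetical stand-in for a LlamaIndex node (demo only)."""

    node_id: str
    text: str


@dataclass
class FakeNodeWithScore:
    """Hypothetical stand-in for a scored retrieval result (demo only)."""

    node: FakeNode
    score: float


def to_node_dict(results: Iterable[Any]) -> Dict[str, Any]:
    """Map node_id -> node, keeping the first occurrence of each ID."""
    node_dict: Dict[str, Any] = {}
    for item in results:
        node_dict.setdefault(item.node.node_id, item.node)
    return node_dict


# Stand-in for the list returned by retriever.retrieve(...)
results = [
    FakeNodeWithScore(FakeNode("a", "alpha"), 0.9),
    FakeNodeWithScore(FakeNode("b", "beta"), 0.5),
    FakeNodeWithScore(FakeNode("a", "alpha"), 0.4),  # duplicate ID
]
all_nodes_dict = to_node_dict(results)
```

Keying by ID also drops any duplicate results, which a plain list comprehension would keep.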