ZeroDivisionError: Weights sum to zero, can't be normalized #3

Open
mrwadepro opened this issue Aug 13, 2023 · 2 comments

Comments

@mrwadepro

First off, thanks for taking the time to publish this package. I get the following error when I ask a question after uploading a PDF:

Using embedded DuckDB without persistence: data will be transient
Traceback (most recent call last):
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 80, in get_response
    chain = app(file)
            ^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 46, in __call__
    self.chain = self.build_chain(file)
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/app.py", line 69, in build_chain
    pdfsearch = Chroma.from_documents(documents, embeddings, collection_name= file_name,)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 347, in from_documents
    return cls.from_texts(
           ^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 315, in from_texts
    chroma_collection.add_texts(texts=texts, metadatas=metadatas, ids=ids)
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/vectorstores/chroma.py", line 121, in add_texts
    embeddings = self._embedding_function.embed_documents(list(texts))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 228, in embed_documents
    return self._get_len_safe_embeddings(texts, engine=self.deployment)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/langchain/embeddings/openai.py", line 189, in _get_len_safe_embeddings
    average = np.average(results[i], axis=0, weights=lens[i])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/john_appleseed/Documents/Pdf-GPT/venv/lib/python3.11/site-packages/numpy/lib/function_base.py", line 550, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
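The last frames show `np.average` being called with `weights=lens[i]`, and NumPy raises this error whenever those weights sum to zero. One plausible trigger (an assumption, not confirmed anywhere in this thread) is a document chunk with no extractable text, for example a page from a scanned PDF, which would produce zero tokens and therefore empty weights. A minimal sketch of filtering such chunks out before calling `Chroma.from_documents` follows; `Doc` and `drop_empty_documents` are hypothetical names used only for illustration, and `Doc` merely mimics the `page_content` attribute of a LangChain document.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document; only page_content is used."""
    page_content: str


def drop_empty_documents(documents):
    """Return only the documents whose text is non-empty after stripping whitespace."""
    return [d for d in documents if d.page_content and d.page_content.strip()]


# Example: two of the four pages carry no text at all.
pages = [Doc("Intro text"), Doc(""), Doc("   \n"), Doc("More text")]
kept = drop_empty_documents(pages)

# Fail early with a readable message instead of a ZeroDivisionError deep in the stack.
if not kept:
    raise ValueError("No extractable text found in the PDF (is it a scanned image?)")

print(len(kept))  # prints 2
```

If every page turns out to be empty, the PDF probably needs OCR before it can be embedded.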

@akmcax

akmcax commented Aug 18, 2023

Hello Sunil Kumar ji,

Thanks for this excellent repo.
While testing your code I get the error below. What could be the cause?

Using embedded DuckDB without persistence: data will be transient
Traceback (most recent call last):
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/routes.py", line 401, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/blocks.py", line 1302, in process_api
    result = await self.call_function(
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/blocks.py", line 1039, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/home/rtx/akm/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/home/rtx/akm/lib/python3.8/site-packages/gradio/utils.py", line 491, in async_iteration
    return next(iterator)
  File "/tmp/ipykernel_29949/1995911808.py", line 85, in get_response
    chain = app(file)
  File "/tmp/ipykernel_29949/1995911808.py", line 44, in __call__
    self.chain = self.build_chain(file)
  File "/tmp/ipykernel_29949/1995911808.py", line 74, in build_chain
    pdfsearch = Chroma.from_documents(documents, embeddings, collection_name= file_name,)
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 613, in from_documents
    return cls.from_texts(
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 568, in from_texts
    chroma_collection = cls(
  File "/home/rtx/akm/lib/python3.8/site-packages/langchain/vectorstores/chroma.py", line 126, in __init__
    self._collection = self._client.get_or_create_collection(
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 79, in get_or_create_collection
    return self.create_collection(name, metadata, embedding_function, get_or_create=True)
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 66, in create_collection
    check_index_name(name)
  File "/home/rtx/akm/lib/python3.8/site-packages/chromadb/api/local.py", line 41, in check_index_name
    raise ValueError(msg)
ValueError: Expected collection name that (1) contains 3-63 characters, (2) starts and ends with an alphanumeric character, (3) otherwise contains only alphanumeric characters, underscores or hyphens (-), (4) contains no two consecutive periods (..) and (5) is not a valid IPv4 address

Note that the OpenAI API key was set when running the code. Also, the file name is only 10 characters long.

@sunilkumardash9
Owner

sunilkumardash9 commented Sep 25, 2023

Hi @akmcax, this is probably caused by the name of the Chroma collection. Check that it complies with the naming rules listed in the error message. For example, your collection name might end with an underscore or hyphen.
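As an illustration of the naming rules quoted in the `ValueError` above, here is a minimal sketch of deriving a compliant collection name from an arbitrary file name before passing it as `collection_name`. The helper `sanitize_collection_name` and its `fallback` parameter are hypothetical and not part of this repo, LangChain, or Chroma. The sketch is deliberately stricter than the rules require: it replaces periods too, which also rules out consecutive periods and IPv4-style names.

```python
import re


def sanitize_collection_name(raw: str, fallback: str = "pdf") -> str:
    """Hypothetical helper: coerce `raw` into a name matching the quoted rules.

    `fallback` is assumed to be alphanumeric and at least 3 characters long.
    """
    # Replace everything except ASCII letters, digits, "_" and "-" (spaces, dots, slashes, ...).
    name = re.sub(r"[^A-Za-z0-9_-]", "_", raw)
    # Enforce the 63-character maximum, then trim so both ends are alphanumeric.
    name = name[:63].strip("_-")
    # Enforce the 3-character minimum by appending the fallback.
    if len(name) < 3:
        name = f"{name}_{fallback}" if name else fallback
    return name


print(sanitize_collection_name("my report.pdf"))  # prints my_report_pdf
print(sanitize_collection_name("report_"))        # prints report
```

Printing `file_name` just before the `Chroma.from_documents` call is also a quick way to see which rule the current name violates.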
