Describe the bug
When attempting to use the UnstructuredClient to parse a PDF document, a ValueError is thrown due to an incompatibility with uvloop. This occurs when initializing the SplitPdfHook in the UnstructuredClient. The error suggests that nest_asyncio is unable to patch the uvloop.Loop.
The version that I am using.
unstructured==0.15.1
unstructured-client==0.23.9
To Reproduce
from unstructured_client import UnstructuredClient
from langchain_community.document_loaders import UnstructuredAPIFileLoader

client = UnstructuredClient()
loader = UnstructuredAPIFileLoader(
    file_path="path/to/your/document.pdf",
    api_key="your-api-key",
    api_url="your-api-url",
)
# This line triggers the error
documents = loader.load_and_split()
Expected behavior
The UnstructuredClient should initialize successfully and be able to parse the PDF document without throwing a ValueError related to uvloop.
Environment Info
Please run `python scripts/collect_env.py` and paste the output here.
This will help us understand more about the environment in which the bug occurred.
Note: As I don't have access to run this script, please run it in your environment and paste the output here.
Additional context
Python version: 3.11
Using uvloop: Yes
The error occurs in an asynchronous context, possibly within a FastAPI application
The full error traceback suggests this is happening within a larger application (possibly named "pylon")
This indicates that the SplitPdfHook is trying to apply nest_asyncio, which is incompatible with uvloop.
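To illustrate why the patch is refused, here is a minimal standard-library sketch. It assumes (based on the error message in the traceback below, not on a confirmed reading of every nest_asyncio release) that nest_asyncio only patches loops deriving from `asyncio.BaseEventLoop`, and that `uvloop.Loop` does not derive from it. `StandInLoop` and `is_patchable` are hypothetical names used only for this illustration; neither uvloop nor nest_asyncio is imported.

```python
import asyncio


class StandInLoop(asyncio.AbstractEventLoop):
    """Hypothetical stand-in for a third-party loop (such as uvloop.Loop)
    that implements the abstract interface without BaseEventLoop."""


def is_patchable(loop: asyncio.AbstractEventLoop) -> bool:
    # Assumed to mirror the type check that produces
    # "Can't patch loop of type ..." in nest_asyncio.
    return isinstance(loop, asyncio.BaseEventLoop)


stdlib_loop = asyncio.new_event_loop()
try:
    stdlib_result = is_patchable(stdlib_loop)  # stdlib loops derive from BaseEventLoop
finally:
    stdlib_loop.close()

stand_in_result = is_patchable(StandInLoop())  # the stand-in does not

print(stdlib_result, stand_in_result)
```

If that assumption holds, any code path that calls `nest_asyncio.apply()` while a uvloop policy is active would hit the same `ValueError`, regardless of which library makes the call.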
Traceback
Traceback (most recent call last):
File "/app/pylon/core/document/unstructured.py", line 30, in parse_document_with_unstructuredio
).load_and_split()
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 64, in load_and_split
docs = self.load()
^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 107, in lazy_load
elements = self._get_elements()
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 333, in _get_elements
return get_elements_from_api(
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 261, in get_elements_from_api
return partition_via_api(
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured/partition/api.py", line 69, in partition_via_api
sdk = UnstructuredClient(api_key_auth=api_key, server_url=base_url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured_client/sdk.py", line 54, in __init__
self.sdk_configuration = SDKConfiguration(
^^^^^^^^^^^^^^^^^
File "<string>", line 13, in __init__
File "/usr/local/lib/python3.11/site-packages/unstructured_client/sdkconfiguration.py", line 38, in __post_init__
self._hooks = SDKHooks()
^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/sdkhooks.py", line 15, in __init__
init_hooks(self)
File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/registration.py", line 28, in init_hooks
split_pdf_hook = SplitPdfHook()
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 73, in __init__
nest_asyncio.apply()
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 18, in apply
loop = loop or asyncio.get_event_loop()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 40, in _get_event_loop
loop = events.get_event_loop_policy().get_event_loop()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 67, in get_event_loop
_patch_loop(loop)
File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 193, in _patch_loop
raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/pylon/routers/document.py", line 43, in create_documents_process
document_info: DocumentsInfo = save_document(document, request.agent, request.organize)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pylon/services/knowledge.py", line 67, in save_document
parsed_document = parse_document(
^^^^^^^^^^^^^^^
File "/app/pylon/core/document/parser.py", line 126, in parse_document
raise e
File "/app/pylon/core/document/parser.py", line 119, in parse_document
documents: list[LCDocument] = Parallel(n_jobs=-1, prefer='processes')(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/pylon/core/document/unstructured.py", line 38, in parse_document_with_unstructuredio
raise FileParserAPIError(f'Failed to parse document from unstructured-io: {filename}. Error: {e!s}') from e
pylon.exceptions.custom_exceptions.FileParserAPIError: Failed to connect or communicate with the file parser server. details: Failed to parse document from unstructured-io. Error: Can't patch loop of type <class 'uvloop.Loop'>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
await self.middleware_stack(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
await route.handle(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
await self.app(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
await response(scope, receive, send)
File "/usr/local/lib/python3.11/site-packages/starlette/responses.py", line 162, in __call__
await self.background()
File "/usr/local/lib/python3.11/site-packages/starlette/background.py", line 45, in __call__
await task()
File "/usr/local/lib/python3.11/site-packages/starlette/background.py", line 30, in __call__
await run_in_threadpool(self.func, *self.args, **self.kwargs)
File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/pylon/routers/document.py", line 46, in create_documents_process
handle_exception(
File "/app/pylon/exceptions/handlers.py", line 46, in handle_exception
send_callback(callback_url, error_response)
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 336, in wrapped_f
return copy(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 475, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 376, in iter
result = action(retry_state)
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 418, in exc_check
raise retry_exc.reraise()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 185, in reraise
raise self.last_attempt.result()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 478, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/app/pylon/utils/callbacks.py", line 11, in send_callback
response.raise_for_status()
File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
Any guidance on resolving this issue or workarounds would be greatly appreciated.
Hi @sigridjineth, we have a fix for this error in 0.25.2. Can you upgrade and confirm that it resolves the issue? Unfortunately, the current solution is to fall back to non-splitting mode in a uvloop context, but at least this prevents the error. Stay tuned for a better fix for splitting large PDFs in a nested event loop context.
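For anyone curious what such a fallback might look like in application code, here is a rough sketch. It is not the actual change shipped in the client; `try_enable_nested_loops` and the `allow_split` flag are hypothetical names. The idea is simply to treat a failed patch as "splitting unavailable" instead of letting the `ValueError` propagate.

```python
from typing import Callable


def try_enable_nested_loops(patch: Callable[[], None]) -> bool:
    """Call `patch` (for example nest_asyncio.apply) and report whether it
    succeeded. A ValueError, as raised for unpatchable loops, yields False."""
    try:
        patch()
    except ValueError:
        return False
    return True


def failing_patch() -> None:
    # Simulates the error from the traceback without needing uvloop installed.
    raise ValueError("Can't patch loop of type <class 'uvloop.Loop'>")


def working_patch() -> None:
    # Simulates a loop that can be patched.
    return None


allow_split_under_uvloop = try_enable_nested_loops(failing_patch)
allow_split_under_asyncio = try_enable_nested_loops(working_patch)
print(allow_split_under_uvloop, allow_split_under_asyncio)
```

In a real application the callable would be `nest_asyncio.apply`, and the returned flag would decide whether PDF splitting is attempted or the whole document is sent in one request.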