bug/Can't patch loop of type <class 'uvloop.Loop'> #154

Open
sigridjineth opened this issue Aug 15, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@sigridjineth


Describe the bug
When attempting to use the UnstructuredClient to parse a PDF document, a ValueError is thrown due to an incompatibility with uvloop. This occurs when initializing the SplitPdfHook in the UnstructuredClient. The error suggests that nest_asyncio is unable to patch the uvloop.Loop.

Versions in use:

unstructured==0.15.1
unstructured-client==0.23.9

To Reproduce

from unstructured_client import UnstructuredClient
from langchain_community.document_loaders import UnstructuredAPIFileLoader

client = UnstructuredClient()

loader = UnstructuredAPIFileLoader(
    file_path="path/to/your/document.pdf",
    api_key="your-api-key",
    api_url="your-api-url"
)

# This line triggers the error
documents = loader.load_and_split()

Expected behavior
The UnstructuredClient should initialize successfully and be able to parse the PDF document without throwing a ValueError related to uvloop.

Environment Info

Please run `python scripts/collect_env.py` and paste the output here.
This will help us understand more about the environment in which the bug occurred.

Note: As I don't have access to run this script, please run it in your environment and paste the output here.

Additional context

  • Python version: 3.11
  • Using uvloop: Yes
  • The error occurs in an asynchronous context, possibly within a FastAPI application
  • The full error traceback suggests this is happening within a larger application (possibly named "pylon")
  • The error specifically mentions:
    File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 73, in __init__
      nest_asyncio.apply()
    This indicates that the SplitPdfHook is trying to apply nest_asyncio, which is incompatible with uvloop.
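
To illustrate the kind of check that produces this message, here is a minimal sketch using only the standard library. It assumes the patching code accepts only loops derived from `asyncio.BaseEventLoop`, and that `uvloop.Loop` does not derive from it; I have not verified either against the installed sources. `ForeignLoop`, `is_patchable`, and `patch_loop` are hypothetical names made up for this example and are not part of nest_asyncio or uvloop.

```python
import asyncio


class ForeignLoop(asyncio.AbstractEventLoop):
    """Hypothetical stand-in for a third-party loop such as uvloop.Loop.

    It exposes the abstract loop interface but does not inherit from
    asyncio.BaseEventLoop, which is the property that matters here.
    """


def is_patchable(loop: asyncio.AbstractEventLoop) -> bool:
    # Assumption: only stdlib loop implementations can be patched.
    return isinstance(loop, asyncio.BaseEventLoop)


def patch_loop(loop: asyncio.AbstractEventLoop) -> None:
    """Mimic a guard that raises the error seen in the traceback."""
    if not is_patchable(loop):
        raise ValueError("Can't patch loop of type %s" % type(loop))
    # A real patch would replace loop methods here; omitted in this sketch.


# A stdlib loop passes the guard.
stdlib_loop = asyncio.new_event_loop()
try:
    patch_loop(stdlib_loop)
    stdlib_ok = True
finally:
    stdlib_loop.close()

# The stand-in for a foreign loop is rejected with a ValueError.
try:
    patch_loop(ForeignLoop())
    foreign_error = None
except ValueError as exc:
    foreign_error = str(exc)
```

If that assumption holds, any process whose event loop policy hands out uvloop loops would hit the same `ValueError` as soon as the hook is constructed, regardless of the document being parsed.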

Traceback

Traceback (most recent call last):
  File "/app/pylon/core/document/unstructured.py", line 30, in parse_document_with_unstructuredio
    ).load_and_split()
      ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 64, in load_and_split
    docs = self.load()
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_core/document_loaders/base.py", line 30, in load
    return list(self.lazy_load())
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 107, in lazy_load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 333, in _get_elements
    return get_elements_from_api(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/langchain_community/document_loaders/unstructured.py", line 261, in get_elements_from_api
    return partition_via_api(
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured/partition/api.py", line 69, in partition_via_api
    sdk = UnstructuredClient(api_key_auth=api_key, server_url=base_url)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured_client/sdk.py", line 54, in __init__
    self.sdk_configuration = SDKConfiguration(
                             ^^^^^^^^^^^^^^^^^
  File "<string>", line 13, in __init__
  File "/usr/local/lib/python3.11/site-packages/unstructured_client/sdkconfiguration.py", line 38, in __post_init__
    self._hooks = SDKHooks()
                  ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/sdkhooks.py", line 15, in __init__
    init_hooks(self)
  File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/registration.py", line 28, in init_hooks
    split_pdf_hook = SplitPdfHook()
                     ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 73, in __init__
    nest_asyncio.apply()
  File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 18, in apply
    loop = loop or asyncio.get_event_loop()
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 40, in _get_event_loop
    loop = events.get_event_loop_policy().get_event_loop()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 67, in get_event_loop
    _patch_loop(loop)
  File "/usr/local/lib/python3.11/site-packages/nest_asyncio.py", line 193, in _patch_loop
    raise ValueError('Can\'t patch loop of type %s' % type(loop))
ValueError: Can't patch loop of type <class 'uvloop.Loop'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/pylon/routers/document.py", line 43, in create_documents_process
    document_info: DocumentsInfo = save_document(document, request.agent, request.organize)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pylon/services/knowledge.py", line 67, in save_document
    parsed_document = parse_document(
                      ^^^^^^^^^^^^^^^
  File "/app/pylon/core/document/parser.py", line 126, in parse_document
    raise e
  File "/app/pylon/core/document/parser.py", line 119, in parse_document
    documents: list[LCDocument] = Parallel(n_jobs=-1, prefer='processes')(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/joblib/parallel.py", line 1918, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/app/pylon/core/document/unstructured.py", line 38, in parse_document_with_unstructuredio
    raise FileParserAPIError(f'Failed to parse document from unstructured-io: {filename}. Error: {e!s}') from e
pylon.exceptions.custom_exceptions.FileParserAPIError: Failed to connect or communicate with the file parser server. details: Failed to parse document from unstructured-io. Error: Can't patch loop of type <class 'uvloop.Loop'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/usr/local/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.11/site-packages/starlette/routing.py", line 75, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.11/site-packages/starlette/responses.py", line 162, in __call__
    await self.background()
  File "/usr/local/lib/python3.11/site-packages/starlette/background.py", line 45, in __call__
    await task()
  File "/usr/local/lib/python3.11/site-packages/starlette/background.py", line 30, in __call__
    await run_in_threadpool(self.func, *self.args, **self.kwargs)
  File "/usr/local/lib/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/pylon/routers/document.py", line 46, in create_documents_process
    handle_exception(
  File "/app/pylon/exceptions/handlers.py", line 46, in handle_exception
    send_callback(callback_url, error_response)
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 336, in wrapped_f
    return copy(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 475, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 376, in iter
    result = action(retry_state)
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 418, in exc_check
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 185, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.11/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/app/pylon/utils/callbacks.py", line 11, in send_callback
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

Any guidance on resolving this issue or workarounds would be greatly appreciated.

@sigridjineth added the bug label on Aug 15, 2024
@awalker4
Collaborator

Hi @sigridjineth, we have a fix for this error in 0.25.2. Can you upgrade and confirm that it resolves the issue? Unfortunately, the solution for now is to fall back to non-splitting mode in a uvloop context, but at least that prevents the error. Stay tuned for a better fix for splitting large PDFs in a nested event loop context.
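
For readers wondering what "fall back to non-splitting mode" could look like, here is a rough sketch of the general pattern only, not the SDK's actual implementation. `SplitGuard`, `allow_split`, `mode`, and the two patch callables are hypothetical names invented for this illustration; the real hook may be structured quite differently.

```python
from typing import Callable


class SplitGuard:
    """Hypothetical helper: try to patch the loop, else disable splitting."""

    def __init__(self, apply_patch: Callable[[], None]) -> None:
        self.allow_split = True
        try:
            # In the real hook this step would be the nest_asyncio call.
            apply_patch()
        except ValueError:
            # Unpatchable loop (e.g. uvloop): keep working, but without
            # splitting the PDF into concurrent requests.
            self.allow_split = False

    def mode(self) -> str:
        return "split" if self.allow_split else "single-request"


def working_patch() -> None:
    """Stand-in for a patch call that succeeds on a stdlib loop."""


def failing_patch() -> None:
    """Stand-in for a patch call that is rejected, as in this issue."""
    raise ValueError("Can't patch loop of type <class 'uvloop.Loop'>")


normal_mode = SplitGuard(working_patch).mode()
fallback_mode = SplitGuard(failing_patch).mode()
```

The trade-off in this pattern is that construction never raises, at the cost of large PDFs being sent as a single request whenever the loop cannot be patched.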

@sigridjineth
Author

@awalker4 thanks for checking it out!
