
[FEATURE] Concurrent page extraction #361

Open
gunnsth opened this issue May 23, 2020 · 0 comments


gunnsth commented May 23, 2020

Is your feature request related to a problem? Please describe.
Currently, extraction only supports processing pages one at a time. It might be more efficient to use multiple goroutines to process the pages of a single document concurrently.
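A minimal sketch of page-level concurrency with a fixed worker pool, assuming per-page extraction is safe to run in parallel. That is not confirmed here: a real implementation might need one reader per worker, or might load pages serially and parallelize only the text extraction. `extractPageText` is a hypothetical placeholder, not part of the extractor package API. Results are reassembled in page order, so the output is the same as with serial processing.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// extractPageText is a HYPOTHETICAL stand-in for extracting the text of one
// page. It is not a real library function.
func extractPageText(pageNum int) (string, error) {
	return fmt.Sprintf("text of page %d", pageNum), nil
}

// pageResult carries the output for one page so results can be put back in
// page order after concurrent processing.
type pageResult struct {
	pageNum int
	text    string
	err     error
}

// extractPagesConcurrently processes pages 1..numPages with a pool of
// workers and returns the texts in page order plus the first error seen.
func extractPagesConcurrently(numPages, workers int) ([]string, error) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan int)
	results := make(chan pageResult)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				text, err := extractPageText(p)
				results <- pageResult{pageNum: p, text: text, err: err}
			}
		}()
	}

	// Feed page numbers to the workers, then close jobs so they exit.
	go func() {
		for p := 1; p <= numPages; p++ {
			jobs <- p
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Drain all results (even after an error) so no goroutine is left blocked.
	texts := make([]string, numPages)
	var firstErr error
	for r := range results {
		if r.err != nil {
			if firstErr == nil {
				firstErr = r.err
			}
			continue
		}
		texts[r.pageNum-1] = r.text
	}
	return texts, firstErr
}

func main() {
	texts, err := extractPagesConcurrently(5, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(strings.Join(texts, "\n"))
}
```

Keeping the worker count fixed bounds memory use on very large documents, rather than starting one goroutine per page.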

Describe the solution you'd like
Explore the easiest way to support concurrency in the extractor package.
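One possibly easy approach is to split the page list into contiguous ranges and give each goroutine its own range. Under the assumption that each goroutine can open its own reader for the same file, no reader state would be shared. This is a sketch only. `extractRange` is a hypothetical placeholder that does no real PDF work.

```go
package main

import (
	"fmt"
	"sync"
)

// pageRange is an inclusive range of 1-based page numbers.
type pageRange struct{ first, last int }

// splitPages divides pages 1..numPages into at most n contiguous ranges of
// nearly equal size.
func splitPages(numPages, n int) []pageRange {
	if numPages < 1 {
		return nil
	}
	if n < 1 {
		n = 1
	}
	if n > numPages {
		n = numPages
	}
	ranges := make([]pageRange, 0, n)
	size, rem := numPages/n, numPages%n
	first := 1
	for i := 0; i < n; i++ {
		last := first + size - 1
		if i < rem {
			last++ // spread the remainder over the first ranges
		}
		ranges = append(ranges, pageRange{first, last})
		first = last + 1
	}
	return ranges
}

// extractRange is HYPOTHETICAL: in practice each call might open its own
// reader so that goroutines share no state.
func extractRange(r pageRange) []string {
	texts := make([]string, 0, r.last-r.first+1)
	for p := r.first; p <= r.last; p++ {
		texts = append(texts, fmt.Sprintf("page %d", p))
	}
	return texts
}

// extractInChunks runs one goroutine per range and concatenates the results
// in page order.
func extractInChunks(numPages, n int) []string {
	ranges := splitPages(numPages, n)
	parts := make([][]string, len(ranges))
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r pageRange) {
			defer wg.Done()
			parts[i] = extractRange(r) // each goroutine writes only its own slot
		}(i, r)
	}
	wg.Wait()
	var all []string
	for _, p := range parts {
		all = append(all, p...)
	}
	return all
}

func main() {
	fmt.Println(splitPages(10, 3))
	fmt.Println(len(extractInChunks(10, 3)))
}
```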

Describe alternatives you've considered
The alternative, and currently the best way to get concurrency, is to work on a per-document basis, i.e. one goroutine handling a single document.
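The per-document approach can be sketched as one goroutine per document, with a counting semaphore limiting how many run at once. `extractDocument` is a hypothetical placeholder for opening one PDF and extracting its pages serially. It is not a real library function.

```go
package main

import (
	"fmt"
	"sync"
)

// extractDocument is a HYPOTHETICAL placeholder for opening one PDF and
// extracting all of its pages serially.
func extractDocument(path string) (string, error) {
	return "text of " + path, nil
}

// extractDocuments runs extractDocument for each path in its own goroutine,
// with at most maxConcurrent documents in flight at once.
func extractDocuments(paths []string, maxConcurrent int) ([]string, []error) {
	if maxConcurrent < 1 {
		maxConcurrent = 1
	}
	texts := make([]string, len(paths))
	errs := make([]error, len(paths))
	sem := make(chan struct{}, maxConcurrent) // counting semaphore

	var wg sync.WaitGroup
	for i, path := range paths {
		wg.Add(1)
		go func(i int, path string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			// Each goroutine writes only to its own index, so no mutex is needed.
			texts[i], errs[i] = extractDocument(path)
		}(i, path)
	}
	wg.Wait()
	return texts, errs
}

func main() {
	texts, errs := extractDocuments([]string{"a.pdf", "b.pdf", "c.pdf"}, 2)
	for i := range texts {
		fmt.Println(texts[i], errs[i])
	}
}
```

This helps when there are many documents, but a single 900+ page document is still processed by one goroutine, which is the limitation this issue is about.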

Additional context
Client's comment

We often deal with documents that are 900+ pages, and processing these serially with Unidoc was taking a long time and therefore costing a lot of money in AWS expenses.
