Please help me test the new multiarch docker images #322

jonaswinkler · 2021-01-11T11:55:57Z

I've got the CI pipeline (see #151) pretty much ready, and it has successfully built docker images for amd64, armhf and aarch64.

Image is available at Docker Hub. For anyone interested, the workflow that produced these images is here: https://github.com/jonaswinkler/paperless-ng/actions/runs/476333808.

I don't have aarch64 hardware and would love to hear from people who do if this works. Feedback on the arm/v7 image is also welcome.

These images are based on the latest dev branch, which is identical to the current release plus a couple of bug fixes. But as with all pre-release things, I wouldn't advise running them against your actual database.

These images can be used with any of the docker-compose files in the docker/hub/ folder. Just replace the version, and pull.

Things I'd like to see tested:

Consume digital PDF documents with embedded text
Consume scanned PDF documents without embedded text
Consume JPG documents
Add some "Auto" matching metadata to documents and inspect whether the "Train the classifier" scheduled task executes successfully. You can schedule that to run immediately by going into the admin, editing that scheduled task, clicking Today / Now further down, and then saving. To make sure that it's working, you should eventually see something like this in the logs with the filter set to DEBUG:
Try to find some documents with the full text search.

Thank you!


jonaswinkler · 2021-01-11T12:52:21Z

One thing I've already spotted with OCRmyPDF:

WARNING 2021-01-11 12:49:04,913 tesseract [tesseract] took too long to OCR - skipping

mvdkleijn · 2021-01-11T13:10:29Z

Nice! :-)

raspberry pi 4
ubuntu 20.04 for arm / raspberry pi
docker 19.03.14 CE
docker compose 1.25.0
documents stored on ssd

It starts, no obvious problems. Logging in works fine.

My workflow is using the paperless android app mostly. On occasion I get a pdf by email that I add, but not often.

An initial attempt to scan a document using the app results in a Python PIL related error on paperless-ng.

cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
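
For anyone who wants to check their own container for this, here is a minimal sketch. It is not part of paperless-ng, and `has_module` is a hypothetical helper name. It only assumes that the failing piece is the compiled `PIL._imagingcms` extension named in the error message, and reports whether Python can import it.

```python
import importlib


def has_module(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package and a missing compiled extension both end up here.
        return False
    return True


if __name__ == "__main__":
    # PIL._imagingcms is the extension module named in the error above.
    for module in ("PIL", "PIL._imagingcms"):
        print(module, "importable:", has_module(module))
```

Run inside the container, a result where `PIL` is importable but `PIL._imagingcms` is not would match the error reported here.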

jonaswinkler · 2021-01-11T13:15:08Z

Thank you. Is that the 32bit or 64bit variant of ubuntu?

jonaswinkler · 2021-01-11T14:06:29Z

See python-pillow/Pillow#5202

mvdkleijn · 2021-01-11T17:41:44Z

> Thank you. Is that the 32bit or 64bit variant of ubuntu?

64-bit

jonaswinkler · 2021-01-11T18:26:26Z

@mvdkleijn new build is up on the hub, does that resolve the issue?

niarbx · 2021-01-11T18:36:51Z

Hi Jonas,

I tested the new ARM64 image on a Raspberry Pi 4 4GB on the latest Raspbian (now called Raspberry Pi OS) 64-bit.

Consumed several scanned PDFs
Consumed digitally created PDFs
Created tags, correspondents and document types
Consumed JPGs with and without text

I encountered the following:

- While consuming a PDF which consists of a big image, the following message was logged:
  ERROR Error while consuming document PDFWithImage.pdf: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
- An image with text threw the following warning: WARNING Error while getting DPI from image
  This doesn't seem to be a problem; the image was OCRed correctly.
- Classifier also works: training and auto-matching worked (though training didn't produce accurate results because the training data set was too small).

By the way, I've been using an arm64 image in "production" for about a week now. I extended the Dockerfile by another stage to download the sources (so I don't have to check out the sources every time a new release is ready) and used python:3.9-slim as the base image. Works without any errors so far.

Best regards,
Tobi

mvdkleijn · 2021-01-11T18:42:07Z

@jonaswinkler The latest image consumes the document just fine. I do get the same DPI warning that @niarbx got, but other than that it looks fine.

jonaswinkler · 2021-01-11T18:49:54Z

> An Image with Text threw the following Warning: WARNING Error while getting DPI from image

That shouldn't be a warning, I'll lower the severity. Some images have DPI information in their metadata, and paperless uses that. That's important for PDF generation (how big should the pages be?). If none is available, paperless will produce A4-sized PDF documents.
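
To illustrate why the DPI matters for page size, here is a rough sketch of the arithmetic. This is not paperless's actual code: `page_size_points` is a hypothetical helper, and the only assumptions are that PDF page sizes are expressed in points (1/72 inch) and that A4 is about 595 x 842 points.

```python
# A4 in PDF points (1/72 inch), rounded to whole points.
A4_POINTS = (595.0, 842.0)


def page_size_points(width_px, height_px, dpi=None):
    """Return the (width, height) of a PDF page in points for an image.

    With a usable DPI value, the physical size follows from the pixel
    dimensions. Without one, fall back to A4 as described above.
    """
    if not dpi or dpi <= 0:
        return A4_POINTS
    return (width_px / dpi * 72.0, height_px / dpi * 72.0)


if __name__ == "__main__":
    # A 2480 x 3508 pixel scan at 300 DPI is roughly A4-sized.
    print(page_size_points(2480, 3508, dpi=300))
    # The same image without DPI metadata falls back to A4.
    print(page_size_points(2480, 3508))
```

The same pixel dimensions at a different DPI would give a different physical page, which is why a missing value forces a default.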

> while consuming a PDF which consists of a big image the following message was logged

Should be fixed in the image from a couple minutes ago

> Classifier also works, training and auto-matching worked (while training didn't have accurate results because of too small training data).

Thank you, good to know.

mvdkleijn · 2021-01-11T20:43:51Z

Consumes PDFs with and without embedded text just fine
Consumes JPGs with text just fine (did not test text-less JPGs)
Classifier, training and auto-matching works fine. Training accuracy was fairly high since I had about 30 similar documents for it to train on.
Tags are fine
Correspondents are fine
Document types are fine
User creation is fine (though permission management is somewhat non-trivial)

mannp · 2021-01-11T21:28:42Z

Testing the docker build with Unraid and consumed a couple of files with no problem.

I threw a selection of scanned PDFs at it (15) and I've lost the GUI; it's not reachable. There are no obvious errors in the log, and in fact it appears to be still consuming.

Only just found your NG version of paperless today so will take a better look at my config tomorrow to see if it needs tuning.

Cool NG version btw :)

jonaswinkler · 2021-01-11T21:49:12Z

What platform? It might take up all resources while consuming (this takes a long time on a Pi), and the web server might not get enough CPU time to provide a response in time.

Consider the options TASK_WORKERS and THREADS_PER_WORKER (https://paperless-ng.readthedocs.io/en/latest/configuration.html#software-tweaks). The Pi 3 and Pi 4 have a quad-core CPU, so setting TASK_WORKERS=2 and THREADS_PER_WORKER=1 will always leave some resources available for other tasks.

See also https://paperless-ng.readthedocs.io/en/latest/setup.html#considerations-for-less-powerful-devices
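
The reasoning behind those numbers can be sketched as simple arithmetic. This is an illustration only: `cores_left` is a hypothetical helper, and it assumes each worker thread can keep one CPU core fully busy while consuming.

```python
def cores_left(cpu_cores, task_workers, threads_per_worker):
    """Cores not claimed by consumption, assuming one busy thread per core."""
    return cpu_cores - task_workers * threads_per_worker


if __name__ == "__main__":
    # Quad-core Pi with 2 task workers and 1 thread per worker.
    print(cores_left(4, 2, 1))
```

A result of zero or less would mean consumption alone can occupy every core, leaving nothing for the web server.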

mannp · 2021-01-11T22:34:14Z

> What platform? It might take up all resources while consuming (this takes a long time on a Pi), and the web server might not get enough CPU time to provide a response in time.
>
> Consider the options TASK_WORKERS and THREADS_PER_WORKER (https://paperless-ng.readthedocs.io/en/latest/configuration.html#software-tweaks). The Pi 3 and Pi 4 have a quad-core CPU, so setting TASK_WORKERS=2 and THREADS_PER_WORKER=1 will always leave some resources available for other tasks.
>
> See also https://paperless-ng.readthedocs.io/en/latest/setup.html#considerations-for-less-powerful-devices

Thanks for the info. The Unraid machine is a Xeon with 32 GB of memory running multiple Docker containers.

jonaswinkler · 2021-01-11T22:51:18Z

Uhm, yeah. That should not have any issues running this.

sisao · 2021-01-12T08:50:49Z

It's running on armv7 (Banana Pi M2U)

OS: Armbian (Ubuntu 20.04.1 LTS)
Kernel: Linux dms 5.9.14-sunxi #20.11.3 SMP Fri Dec 11 20:31:12 CET 2020 armv7l armv7l armv7l GNU/Linux
Docker: 19.03.12
docker-compose: 1.27.4

No errors so far.
Consuming Email with attachment works, scanned pdf consuming works, full text search works, training of classifier starts and works.

jonaswinkler · 2021-01-12T14:36:53Z

Alright, thank you very much. Multi arch images are coming soon.


jonaswinkler added the "help needed" label (This is stuff that I cannot do myself.) Jan 11, 2021

jonaswinkler pinned this issue Jan 11, 2021

jonaswinkler mentioned this issue Jan 11, 2021

Cannot login using 0.0.10 bauerj/paperless_app#33

Closed

jonaswinkler closed this as completed Jan 12, 2021

jonaswinkler unpinned this issue Jan 12, 2021

tribut pushed a commit to tribut/paperless-ng that referenced this issue Aug 18, 2022

Merge pull request jonaswinkler#322 from paperless-ngx/fix-sphinx-errors

9d97899

Fix minor sphinx errors
