This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Please help me test the new multiarch docker images #322

Closed
jonaswinkler opened this issue Jan 11, 2021 · 16 comments
Labels
help needed This is stuff that I cannot do myself.

Comments

@jonaswinkler
Owner

jonaswinkler commented Jan 11, 2021

I've got the CI pipeline (see #151) pretty much ready, and it has successfully built docker images for amd64, armhf and aarch64.

Image is available at Docker Hub. For anyone interested, the workflow that produced these images is here: https://github.com/jonaswinkler/paperless-ng/actions/runs/476333808.

I don't have aarch64 hardware and would love to hear from people who do if this works. Feedback on the arm/v7 image is also welcome.

These images are based on the latest dev branch, which is identical to the current release plus a couple of bug fixes. But as with all pre-release things, I wouldn't advise running them against your actual database.

These images can be used with any of the docker-compose files in the docker/hub/ folder. Just replace the version, and pull.

Things I'd like to see tested:

  • Consume digital PDF documents with embedded text

  • Consume scanned PDF documents without embedded text

  • Consume JPG documents

  • Add some "Auto" matching metadata to documents and inspect whether the "Train the classifier" scheduled task executes successfully. You can schedule that to run immediately by going into the admin, editing that scheduled task, clicking Today / Now further down, and then saving. To make sure that it's working, you should eventually see something like this in the logs with the filter set to DEBUG:
    [image: screenshot of the log output]

  • Try to find some documents with the full text search.

Thank you!

@jonaswinkler jonaswinkler added the help needed This is stuff that I cannot do myself. label Jan 11, 2021
@jonaswinkler jonaswinkler pinned this issue Jan 11, 2021
@jonaswinkler
Owner Author

One thing I've already spotted with OCRmyPDF

WARNING 2021-01-11 12:49:04,913 tesseract [tesseract] took too long to OCR - skipping

@mvdkleijn

mvdkleijn commented Jan 11, 2021

Nice! :-)

  • raspberry pi 4
  • ubuntu 20.04 for arm / raspberry pi
  • docker 19.03.14 CE
  • docker compose 1.25.0
  • documents stored on ssd

It starts, no obvious problems. Logging in works fine.

My workflow is using the paperless android app mostly. On occasion I get a pdf by email that I add, but not often.

An initial attempt to scan a document using the app results in a Python PIL related error on paperless-ng.

cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)

image
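The import error above names `PIL._imagingcms`, Pillow's compiled LittleCMS extension. A minimal sketch for checking whether that extension is importable in a given environment (for example inside the container) follows; the helper name `has_pillow_cms` is made up for illustration and is not part of paperless-ng or Pillow.

```python
import importlib


def has_pillow_cms() -> bool:
    """Return True if Pillow's compiled LittleCMS extension can be imported."""
    try:
        # PIL._imagingcms is the C extension named in the error message.
        importlib.import_module("PIL._imagingcms")
    except ImportError:
        # Covers both "Pillow is not installed" and "extension is missing".
        return False
    return True


if __name__ == "__main__":
    print("PIL._imagingcms importable:", has_pillow_cms())
```

If this prints `False` on an image where Pillow itself is installed, the Pillow build in that image probably lacks the extension, which would match the error reported here.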

@jonaswinkler
Owner Author

Thank you. Is that the 32-bit or 64-bit variant of Ubuntu?

@jonaswinkler
Owner Author

See python-pillow/Pillow#5202

@mvdkleijn

Thank you. Is that the 32-bit or 64-bit variant of Ubuntu?

64-bit

@jonaswinkler
Owner Author

jonaswinkler commented Jan 11, 2021

@mvdkleijn new build is up on the hub, does that resolve the issue?

@niarbx

niarbx commented Jan 11, 2021

Hi Jonas,

I tested the new ARM64 image on a Raspberry Pi 4 (4 GB) running the latest 64-bit Raspbian (now called Raspberry Pi OS).

  • Consumed several scanned PDFs
  • Consumed digitally created PDFs
  • Created tags, correspondents and document types
  • Consumed JPGs with and without text

I encountered the following:

  • While consuming a PDF which consists of one big image, the following message was logged:
    • ERROR Error while consuming document PDFWithImage.pdf: cannot import name '_imagingcms' from 'PIL' (/usr/local/lib/python3.7/site-packages/PIL/__init__.py)
  • An image with text threw the following warning: WARNING Error while getting DPI from image
    • This doesn't seem to be a problem; the image was OCRed correctly
  • The classifier also works: training and auto-matching worked (although training didn't give accurate results because the training data set was too small).

By the way, I've been using an arm64 image in "production" for about a week now. I extended the Dockerfile with another stage that downloads the sources (so I don't have to check out the sources every time a new release is ready) and used python:3.9-slim as the base image. It has worked without any errors so far.

Best regards,
Tobi

@mvdkleijn

@jonaswinkler The latest image consumes the document just fine. I do get the same DPI warning that @niarbx got, but other than that it looks fine.

@jonaswinkler
Owner Author

jonaswinkler commented Jan 11, 2021

  • An Image with Text threw the following Warning: WARNING Error while getting DPI from image

That shouldn't be a warning, I'll lower the severity. Some images have DPI information in their metadata, and paperless uses that. That's important for PDF generation (how big should the pages be?). If none is available, paperless will produce A4-sized PDF documents.
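To illustrate why the DPI matters for page size, here is a rough sketch of the arithmetic. This is not paperless-ng's actual code: the function name, the 72 points-per-inch PDF unit, and the rounded A4 size of 595 x 842 points are assumptions for the example.

```python
# A4 at 72 points per inch, rounded to whole points (assumed fallback size).
A4_POINTS = (595.0, 842.0)


def page_size_points(width_px, height_px, dpi=None):
    """Return (width, height) of the PDF page in points.

    With a usable DPI value, the page size follows from the pixel
    dimensions. Without one, fall back to A4, as described above.
    """
    if not dpi or dpi <= 0:
        return A4_POINTS
    inches_wide = width_px / dpi
    inches_high = height_px / dpi
    return (inches_wide * 72.0, inches_high * 72.0)


if __name__ == "__main__":
    # An A4 scan at 300 DPI is 2480 x 3508 pixels.
    print(page_size_points(2480, 3508, dpi=300))
    # No DPI in the metadata: the fallback applies.
    print(page_size_points(2480, 3508))
```

So a missing DPI value only changes the page dimensions of the generated PDF, which is consistent with the OCR result still being correct.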

  • While consuming a PDF which consists of one big image, the following message was logged

Should be fixed in the image from a couple minutes ago

  • The classifier also works: training and auto-matching worked (although training didn't give accurate results because the training data set was too small).

Thank you, good to know.

@mvdkleijn

  • Consumes PDFs with and without embedded text just fine
  • Consumes JPGs with text just fine (did not test text-less JPGs)
  • Classifier, training and auto-matching work fine. Training accuracy was fairly high since I had about 30 similar documents for it to train on.
  • Tags are fine
  • Correspondents are fine
  • Document types are fine
  • User creation is fine (though permission management is somewhat non-trivial)

@mannp

mannp commented Jan 11, 2021

Testing the docker build with Unraid and consumed a couple of files with no problem.

I threw a selection of 15 scanned PDFs at it and lost the GUI; it is not reachable. There are no obvious errors in the log; in fact, it appears to still be consuming.

Only just found your NG version of paperless today so will take a better look at my config tomorrow to see if it needs tuning.

Cool NG version btw :)

@jonaswinkler
Owner Author

jonaswinkler commented Jan 11, 2021

What platform? It might take up all resources while consuming (this takes a long time on Pi), and the web server might not get enough cpu time to provide a response in time.

Consider the options TASK_WORKERS and THREADS_PER_WORKER (https://paperless-ng.readthedocs.io/en/latest/configuration.html#software-tweaks). The Pi 3 and Pi 4 have quad-core CPUs, so setting TASK_WORKERS=2 and THREADS_PER_WORKER=1 will always leave some resources available for other tasks.

See also https://paperless-ng.readthedocs.io/en/latest/setup.html#considerations-for-less-powerful-devices
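The reasoning behind that suggestion can be sketched as simple arithmetic. This is only an illustration under the assumption that each worker thread can keep roughly one core busy; the function name is made up, and I believe the settings are spelled PAPERLESS_TASK_WORKERS and PAPERLESS_THREADS_PER_WORKER on the linked configuration page.

```python
def spare_cores(total_cores: int, task_workers: int, threads_per_worker: int) -> int:
    """Estimate how many cores stay free while all workers are busy.

    Assumes every thread of every worker saturates one core, which is a
    simplification of how consumption actually loads the machine.
    """
    busy = task_workers * threads_per_worker
    return max(total_cores - busy, 0)


if __name__ == "__main__":
    # Quad-core Pi with 2 workers and 1 thread each.
    print(spare_cores(4, 2, 1))
    # Same Pi with 2 workers and 2 threads each.
    print(spare_cores(4, 2, 2))
```

With two workers and one thread each, two of the four cores remain for the web server and everything else; with two threads per worker, nothing is left over.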

@mannp

mannp commented Jan 11, 2021


Thanks for the info. The Unraid machine is a Xeon with 32 GB of memory running multiple Docker containers.

@jonaswinkler
Owner Author

Uhm, yeah. That should not have any issues running this.

@sisao
Contributor

sisao commented Jan 12, 2021

It's running on armv7 (Banana Pi M2U)

OS: Armbian (Ubuntu 20.04.1 LTS)
Kernel: Linux dms 5.9.14-sunxi #20.11.3 SMP Fri Dec 11 20:31:12 CET 2020 armv7l armv7l armv7l GNU/Linux
Docker: 19.03.12
docker-compose: 1.27.4

No errors so far.
Consuming email with attachments works, consuming scanned PDFs works, full text search works, and training of the classifier starts and works.

@jonaswinkler
Owner Author

Alright, thank you very much. Multi arch images are coming soon.

@jonaswinkler jonaswinkler unpinned this issue Jan 12, 2021
tribut pushed a commit to tribut/paperless-ng that referenced this issue Aug 18, 2022