Fujitsu driver option support #9

Closed
IBMPortablePc opened this issue Jul 19, 2020 · 9 comments
@IBMPortablePc

Thank you for a great script. GUIs are all well and good, but a CLI is often the simplest way to get things done. A case in point is the lack of a decent Mac frontend for SANE.

I had no trouble using this script on my Mac (OS X 10.14.3) once I installed the dependencies with Homebrew. To install scanadf I had to compile the SANE frontends, but this went smoothly.

However, the Fujitsu driver (for my ScanSnap S510M) supports brightness and contrast (see https://fossies.org/dox/sane-backends-1.0.27/fujitsu_8c_source.html etc.). Could I use those options by tweaking your script?

Also, I find the cropping a little inconsistent, i.e. sometimes it is too aggressive, but I guess that's simply how the driver works.

Lastly, the OCR option works perfectly, except that the resulting PDF page size is extremely large, i.e. A0 or A1, even if the original is A5. Any thoughts on this would be appreciated, although I suspect it's an issue with how one of the dependencies works on Macs.

@IBMPortablePc
Author

IBMPortablePc commented Jul 19, 2020

Okay, upon further examination I see that ps2pdf is performing the cropping, even though cropping is a feature of the Fujitsu driver. Now to work out how to use the driver's cropping rather than ps2pdf's.

@rocketraman
Owner

> Thank you for a great script.

I'm happy people are finding it useful.

> I had no trouble using this script on my Mac (OS X 10.14.3) once I installed the dependencies with Homebrew. To install scanadf I had to compile the SANE frontends, but this went smoothly.

Nice to hear.

> However, the Fujitsu driver (for my ScanSnap S510M) supports brightness and contrast (see https://fossies.org/dox/sane-backends-1.0.27/fujitsu_8c_source.html etc.). Could I use those options by tweaking your script?

Yes, that should be quite easy to add. I might also add a way to "pass through" options to the driver.

> Lastly, the OCR option works perfectly, except that the resulting PDF page size is extremely large, i.e. A0 or A1, even if the original is A5. Any thoughts on this would be appreciated, although I suspect it's an issue with how one of the dependencies works on Macs.

When using OCR, tesseract does the conversion to PDF. Does this size issue happen consistently? What version of tesseract do you have installed?

> Okay, upon further examination I see that ps2pdf is performing the cropping, even though cropping is a feature of the Fujitsu driver. Now to work out how to use the driver's cropping rather than ps2pdf's.

The --crop option does set the Fujitsu driver's --sw-crop=yes option. It then asks ps2pdf to respect the bounding box in the PostScript data. I can't remember why I added this, or whether it is a bug, but if you think this is the problem, try commenting out this line and see what happens:

https://github.com/rocketraman/sane-scan-pdf/blob/master/scan#L188

> Also, I find the cropping a little inconsistent, i.e. sometimes it is too aggressive, but I guess that's simply how the driver works.

You could try commenting out the deletion of the intermediate outputs, as mentioned here #8 (comment), to determine where in the pipeline the issue occurs. If it turns out the aggressive crop is the result of a post-scan stage (like ps2pdf), post the intermediate outputs somewhere (or send them to me privately if you wish) and I can take a look.
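A minimal sketch of the same idea that avoids editing the cleanup by hand, assuming the script keeps its per-stage files in a temporary directory: the deletion is guarded by an environment variable. `KEEP_INTERMEDIATE`, `make_workdir` and `cleanup_workdir` are hypothetical names, not options or functions the script actually has.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep or delete the per-stage scan files depending on
# KEEP_INTERMEDIATE (an illustrative variable, not a real option of the script).

# Create a private working directory for the intermediate outputs.
make_workdir() {
  WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/scan.XXXXXX")
}

# Remove the working directory unless KEEP_INTERMEDIATE=1 is set.
cleanup_workdir() {
  if [ "${KEEP_INTERMEDIATE:-0}" = "1" ]; then
    echo "Keeping intermediate files in $WORKDIR" >&2
  else
    rm -rf -- "$WORKDIR"
  fi
}

# Typical wiring inside a script:
#   make_workdir
#   trap cleanup_workdir EXIT
```

With something like this in place, running the scan with `KEEP_INTERMEDIATE=1` set would leave each stage's output on disk, so you can compare the driver's images with the PostScript and PDF stages and see which one introduces the extra crop.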

@rocketraman
Owner

Check out the code in branch issue-9 -- it should allow you to pass through any driver option you like with -xo (short for eXtended option). For example:

scan -xo "--brightness 50 --contrast -10" -o scan.pdf
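A hypothetical sketch of how an -xo pass-through can be wired up in a bash script. It is not the code on the issue-9 branch, and `parse_cli`, `split_driver_opts`, `XO` and `DRIVER_OPTS` are illustrative names. The key step is splitting the single quoted string into separate words before handing them to the scan frontend.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an -xo pass-through; the real script's parsing may differ.

# Collect the value that follows -xo; other arguments are skipped here.
parse_cli() {
  XO=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -xo)
        if [ $# -lt 2 ]; then
          printf '%s\n' "error: -xo requires a value" >&2
          return 1
        fi
        XO=$2
        shift 2
        ;;
      *)
        shift
        ;;
    esac
  done
}

# Split the -xo string on whitespace into an array, so each driver option
# becomes a separate argument. Values containing spaces are not supported.
split_driver_opts() {
  DRIVER_OPTS=()
  read -r -a DRIVER_OPTS <<< "$1"
}

# Example wiring (frontend invocation shown as a comment only):
#   parse_cli "$@"
#   split_driver_opts "$XO"
#   scanadf "${DRIVER_OPTS[@]}" ...
```

Splitting with `read -a` into an array, rather than expanding `$XO` unquoted on the command line, keeps each option a separate word without also subjecting the values to filename globbing. The trade-off is that an option value containing spaces cannot be passed this way.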

@rocketraman
Owner

rocketraman commented Jul 19, 2020

> The --crop option does set the Fujitsu driver's --sw-crop=yes option. It then asks ps2pdf to respect the bounding box in the PostScript data. I can't remember why I added this, or whether it is a bug.

Just tried it, and it isn't a bug: it's necessary to get the PDF to respect the size of the driver's image output.
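The two-stage crop described above can be sketched as follows. This is a hypothetical illustration, not the script's code: the driver option is the `--sw-crop=yes` quoted in this thread (check the option list your frontend prints for your device for the exact name your backend exposes), and `-dEPSCrop` is one Ghostscript switch, passed through ps2pdf, that sizes the page to the bounding box of EPS input. The script may use a different mechanism at the line linked above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: with cropping on, the driver trims the image and
# ps2pdf is asked to size each page from the PostScript bounding box.
build_crop_args() {
  local crop=$1
  DRIVER_CROP_ARGS=()
  PS2PDF_ARGS=()
  if [ "$crop" = "yes" ]; then
    DRIVER_CROP_ARGS+=(--sw-crop=yes)  # driver option as quoted in this thread
    PS2PDF_ARGS+=(-dEPSCrop)           # assumed Ghostscript flag for bounding-box sizing
  fi
}

# Example wiring (commands shown as comments only):
#   build_crop_args yes
#   scanadf "${DRIVER_CROP_ARGS[@]}" ...
#   ps2pdf "${PS2PDF_ARGS[@]}" page.ps page.pdf
```

Keeping both halves together matches the finding above that the ps2pdf step is needed for the PDF to respect the size of the driver's output.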

@rocketraman
Owner

In my local testing, I do notice that tesseract (with the --ocr option) does a slightly poorer job of setting the bounding box, and crops a bit beyond what the driver output. If you were using --ocr together with --crop, can you try it without --ocr?

@IBMPortablePc
Author

IBMPortablePc commented Jul 21, 2020

Tesseract creating very large PDF page sizes is a known issue, with a suggested solution:

"Set the dpi of the input images. Use mogrify from ImageMagick or similar."
tesseract-ocr/tesseract#150
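The arithmetic behind the oversized pages is worth spelling out. A PDF writer turns pixels into a physical size using the image's DPI, so if the scan is 300 dpi but the image reaches tesseract with no resolution recorded, tesseract substitutes a default instead. 70 dpi is the fallback its warning usually reports; treat that number, and the 300 dpi scan resolution, as assumptions here. `px_to_mm` is a hypothetical helper for the calculation.

```shell
#!/usr/bin/env bash
# Convert a pixel count to millimetres at a given DPI (25.4 mm per inch),
# rounded to the nearest millimetre, using integer shell arithmetic.
px_to_mm() {
  local px=$1 dpi=$2
  echo $(( (px * 254 + dpi * 5) / (dpi * 10) ))
}

# An A5 page (148 x 210 mm) scanned at an assumed 300 dpi is 1748 x 2480 px.
echo "At 300 dpi: $(px_to_mm 1748 300) x $(px_to_mm 2480 300) mm"  # 148 x 210 mm
# The same pixels interpreted at an assumed fallback of 70 dpi:
echo "At 70 dpi:  $(px_to_mm 1748 70) x $(px_to_mm 2480 70) mm"    # 634 x 900 mm
```

At 634 × 900 mm the page lands between A1 (594 × 841 mm) and A0 (841 × 1189 mm), which fits the sizes reported above. Restoring the real resolution before OCR, as the linked issue suggests (for example `mogrify -units PixelsPerInch -density 300 page.tif`, substituting your actual scan resolution and file name), should let tesseract compute the correct page size.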

I have noticed that my cropping issue is that scans are sometimes cropped to exactly Letter size, even when they are A4 and A4 is specified. It appears there is a page size setting somewhere that I am overlooking, although I don't know why I sometimes do end up with A4 PDFs (assuming I don't use OCR/tesseract).
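To pin down which of the two sizes a given page actually came out at, one option is to check the pixel dimensions of the intermediate images (see the owner's suggestion above about keeping them) against A4 (210 × 297 mm) and Letter (215.9 × 279.4 mm) at the scan resolution. This is a minimal sketch; `classify_page` and its helpers are hypothetical, and the 5-pixel tolerance is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: decide whether an image's pixel dimensions match
# A4 (210 x 297 mm) or Letter (215.9 x 279.4 mm) at a given DPI.

# Convert tenths of a millimetre to pixels, rounded (254 tenths of a mm per inch).
mm10_to_px() {
  echo $(( ($1 * $2 + 127) / 254 ))
}

# Succeed if two values differ by at most the given tolerance.
within() {
  local d=$(( $1 - $2 ))
  [ "$d" -lt 0 ] && d=$(( -d ))
  [ "$d" -le "$3" ]
}

# Print A4, Letter or other for width/height in pixels at the given DPI.
classify_page() {
  local w=$1 h=$2 dpi=$3 tol=${4:-5}
  if within "$w" "$(mm10_to_px 2100 "$dpi")" "$tol" &&
     within "$h" "$(mm10_to_px 2970 "$dpi")" "$tol"; then
    echo A4
  elif within "$w" "$(mm10_to_px 2159 "$dpi")" "$tol" &&
       within "$h" "$(mm10_to_px 2794 "$dpi")" "$tol"; then
    echo Letter
  else
    echo other
  fi
}
```

For example, `classify_page 2550 3300 300` prints `Letter`. The width and height of an image can be read with ImageMagick's `identify -format '%w %h' page.pnm` (file name illustrative).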

@rocketraman
Copy link
Owner

> Tesseract creating very large PDF page sizes is a known issue, with a suggested solution:
>
> "Set the dpi of the input images. Use mogrify from ImageMagick or similar."
> tesseract-ocr/tesseract#150

Thanks for the link, I'll take a look.

> I have noticed that my cropping issue is that scans are sometimes cropped to exactly Letter size, even when they are A4 and A4 is specified. It appears there is a page size setting somewhere that I am overlooking, although I don't know why I sometimes do end up with A4 PDFs (assuming I don't use OCR/tesseract).

It's possible you are running into #8.

@rocketraman
Owner

I've merged the extended option support into master, so I'm closing this. I will open a separate issue for the tesseract problem once I look into it.

@rocketraman
Owner

@IBMPortablePc The tesseract issue is also fixed, as per #12.
