Test failures #149

DarrenCook · 2018-03-21T11:19:32Z

On Linux Mint 18, I forked the project, did a git clone, then yarn install. I also did these:

sudo apt install tesseract-ocr tesseract-ocr-jpn tesseract-ocr-chi-sim unrtf

Running yarn test (npm test gives identical results, by the way), I get "5 of 177 tests failed". One is because I haven't installed drawingtotext. Here are the others:

1. textract for .pdf files will properly handle multiple columns:
AssertionError: expected false to be true
2. textract for .pdf files can handle files with spaces in the name:
AssertionError: expected false to be true
3. textract for image files will extract text from GIF files:
AssertionError: expected [Error: Error extracting [[ testphoto.gif ]], exec error: Command failed: tesseract /home/darren/Projects/textract/test/files/testphoto.gif /tmp/textract/testphoto quiet
Tesseract Open Source OCR Engine v3.04.01 with Leptonica
Warning in pixReadMemGif: writing to a temp file, not directly to memory
Error in pixReadStreamGif: Can't use giflib-5.1.2; suggest 5.1.1 or earlier
Error in pixReadStream: gif: no pix returned
Error in pixRead: pix not read
Error in pixReadMemGif: pix not read
Error in pixReadMem: gif: no pix returned
Error during processing.
] to be null
4. fromUrl tests will markdown files:

actual expected

""# This is an h1 ## This is an h2 This__This text has been bolded and italicizeditalicized__ "

(The last one is hard to read without the colour-coding! Basically it is saying the # are still there and the underlines are still in there.)


dbashford · 2018-03-21T13:16:55Z

3 will likely be OS related.

I get 1/2/4 locally (along with the drawingtotext error) myself. Hoping to spend time today churning through some tickets. Will get the tests going first; hopefully that is fairly straightforward.

dbashford · 2018-03-21T13:43:31Z

Pushed an update that addresses the other breakages. The Tesseract error isn't one I get locally. "Can't use giflib-5.1.2; suggest 5.1.1 or earlier" is the error you are getting. Not sure if there is something that can be tracked down to figure out what that might be? Funnily enough, googling that gets you an issue in the Python textract library.

Going to close this issue out, but can keep discussing. If you figure something out I'm happy to update the docs for other folks that might run into it.

DarrenCook · 2018-03-21T18:04:43Z

Just updated, and 1/2/4 have now gone, but in their place I get three rtf complaints about:

 TypeError: cb is not a function

lib/extractors/html.js:76 called from rtf.js:34.

All three failing tests are in the same describe(), extract_test.js, lines 122 to 152.
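
For readers unfamiliar with this class of error, here is a minimal sketch of one common way "cb is not a function" arises in callback-style Node code: a callee expects (data, options, cb) but a caller still passes (data, cb). The function names below are hypothetical stand-ins, not textract's actual API, and this is an assumption about the general pattern rather than a confirmed diagnosis of html.js or rtf.js.

```javascript
// Hypothetical stand-in for an HTML text extractor that expects
// (data, options, cb). Illustrative only, not textract's real code.
function extractFromHtml(data, options, cb) {
  // Strip tags and collapse whitespace.
  const text = data.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim();
  cb(null, text);
}

// Caller written against a two-argument signature (data, cb):
// the callback lands in the `options` slot, so `cb` is undefined.
function extractRtfBroken(html, cb) {
  extractFromHtml(html, cb);
}

// Caller that passes all three arguments.
function extractRtfFixed(html, options, cb) {
  extractFromHtml(html, options, cb);
}

let brokenError = null;
try {
  extractRtfBroken('<p>hello</p>', () => {});
} catch (err) {
  brokenError = err; // a TypeError is thrown when cb(...) is invoked
}

let fixedText = null;
extractRtfFixed('<p>hello <b>world</b></p>', {}, (err, text) => {
  fixedText = text;
});
```

If the failure follows this pattern, the line number in the stack trace points at the callee (where `cb` is invoked), while the actual mistake is in the caller's argument list.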

dbashford · 2018-03-21T18:29:24Z

Hmm, no test failures locally, but I think I may know why. I'll dig in later today.

dbashford · 2018-07-27T18:40:19Z

PR #166 fixed this

dbashford closed this as completed Mar 21, 2018

dbashford reopened this Mar 21, 2018

dbashford added the 2.4 label Jun 5, 2018

willshiao mentioned this issue Jul 26, 2018

Fix "cb is not a function" error on RTFs #166

Merged

dbashford closed this as completed Jul 27, 2018
