Reading files from S3 #41

rodrigogalindez · 2015-07-14T22:18:00Z

Hi David!

Trying to create an endpoint in an Express server like this:

```javascript
app.get('/textract', function(req, res, next) {
  textract("https://s3.amazonaws.com/testbucket1a2b3c/test.pdf", function(error, text) {
    console.log(error);
    res.end();
  });
});
```

Console returns [Error: File at path [[ https://s3.amazonaws.com/testbucket1a2b3c/test.pdf ]] does not exist.]

What does this mean exactly? Does textract only work with local files? (In this case my file is uploaded to S3.) Thanks!


dbashford · 2015-07-14T22:21:28Z

Yep, only local files.

rodrigogalindez · 2015-07-14T22:24:11Z

OK, thanks. Any plan to make it work with remote files?

dbashford · 2015-07-14T22:31:50Z

That seems a bit like scope creep on a singularly focused module. But I can consider adding such a thing. It probably makes more sense as something wrapped around textract instead of embedded within. The files would still need to be written locally.
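One way to read the "wrapped around textract" idea is a small helper that downloads the remote file, writes it to a temporary local path, and hands that path to textract. The following is a minimal sketch under some assumptions: Node 18+ for the global `fetch`, and an extractor function passed in by the caller. `tempPathFor` and `extractFromRemote` are hypothetical names, not part of textract. The sketch keeps the URL's file extension because textract generally infers a document's type from its file name.

```javascript
const crypto = require('crypto');
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

// Hypothetical helper: build a unique temp-file path that keeps the extension
// from the URL's path. Query strings (e.g. on a presigned S3 URL) are ignored,
// so ".pdf" survives.
function tempPathFor(url) {
  const ext = path.extname(new URL(url).pathname);
  const name = `textract-${crypto.randomBytes(8).toString('hex')}${ext}`;
  return path.join(os.tmpdir(), name);
}

// Hypothetical wrapper: download `url` to a temp file, run
// `extract(filePath, callback)` on it, and always delete the temp file
// afterwards. `extract` is assumed to take a path and an (error, text)
// callback, like textract.fromFileWithPath.
async function extractFromRemote(url, extract) {
  const res = await fetch(url); // global fetch requires Node 18+
  if (!res.ok) {
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  const filePath = tempPathFor(url);
  await fs.writeFile(filePath, Buffer.from(await res.arrayBuffer()));
  try {
    return await new Promise((resolve, reject) => {
      extract(filePath, (error, text) => (error ? reject(error) : resolve(text)));
    });
  } finally {
    // Best-effort cleanup; ignore errors if the file is already gone.
    await fs.unlink(filePath).catch(() => {});
  }
}
```

For example, `extractFromRemote(url, (p, cb) => textract.fromFileWithPath(p, cb))` returns a promise for the extracted text. Passing the extractor in keeps the helper independent of any particular textract version.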

rodrigogalindez · 2015-07-14T22:35:05Z

Alright, thanks. Looking forward to your implementation. Apache Tika is very complicated to install, and as far as I know yours is the only good text extractor written in Node.

dbashford · 2015-07-23T11:41:32Z

Doing some refactoring and think I'll include this. Will be a few days.

rodrigogalindez · 2015-07-23T18:31:29Z

Awesome. I've implemented textract in an order form for translation agencies (clients upload documents and the app returns the word count and pricing), and it works very well. For now all the files are stored on an AWS instance, and textract runs on the same instance. I will refactor the app to work with S3 once this is ready. In case it helps, here's how I plan to use textract:

1. Client uploads a file to S3 (I use ng-file-upload: https://github.com/danialfarid/ng-file-upload)
2. Endpoint runs textract with path = S3 path and returns estimates
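A rough sketch of the "returns estimates" step in the workflow above, assuming the document text has already been extracted. `countWords`, `estimate`, `ratePerWord`, and `minimumFee` are hypothetical names and values, not from the thread, and splitting on whitespace only approximates how translation agencies count words.

```javascript
// Count words as runs of non-whitespace characters. This is a simple
// heuristic; it will not match CAT-tool counts exactly (for example, CJK text
// has no spaces between words).
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Price an order from its word count. The default rate and minimum fee are
// hypothetical placeholders.
function estimate(text, { ratePerWord = 0.1, minimumFee = 25 } = {}) {
  const words = countWords(text);
  // Work in whole cents so floating-point drift does not leak into the total.
  const cents = Math.max(
    Math.round(words * ratePerWord * 100),
    Math.round(minimumFee * 100)
  );
  return { words, price: cents / 100 };
}
```

In the endpoint, `text` would be the value passed to textract's callback, and the resulting object could be sent back with `res.json(estimate(text))`.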

dbashford · 2015-07-23T18:33:26Z

textract was born out of the contracting work I did that involved uploading resumes, extracting the text from them, loading solr with the resume text for searching, and tossing the resume itself into S3. So that all sounds familiar. =)

Working through a set of enhancements over the next few days. This'll be one of them.

rodrigogalindez closed this as completed Jul 14, 2015

dbashford reopened this Jul 23, 2015

dbashford modified the milestone: 1.0.0 Jul 26, 2015

dbashford closed this as completed in 1c7be54 Aug 3, 2015
