Reading files from S3 #41
Comments
Yep, only local files.
OK, thanks. Any plan to make it work with remote files?
That seems a bit like scope creep on a singularly focused module. But I can consider adding such a thing. It probably makes more sense as something wrapped around textract instead of embedded within. The files would still need to be written locally.
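For anyone who needs this in the meantime, here is a rough sketch of the "wrapped around textract" idea from the comment above: download the remote file to a temp path, run the extractor on that local copy, then delete it. The names `tempPathFor` and `extractFromUrl` are made up for illustration and are not part of textract. The `extract` argument is assumed to be any function with the `(localPath, callback)` shape used in this thread. It also assumes the URL is publicly readable over HTTPS; a private S3 object would need a signed URL or the AWS SDK instead.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const https = require('https');

// Hypothetical helper: build a temp-file path that keeps the remote file's
// extension, on the assumption that the extractor picks a parser by extension.
function tempPathFor(fileUrl) {
  const ext = path.extname(new URL(fileUrl).pathname);
  const name = 'download-' + process.pid + '-' + Date.now() + ext;
  return path.join(os.tmpdir(), name);
}

// Hypothetical wrapper: download fileUrl to a temp file, call
// extract(localPath, cb) on the local copy, then remove the temp file.
function extractFromUrl(fileUrl, extract, callback) {
  const localPath = tempPathFor(fileUrl);
  // Always try to delete the temp file before reporting the result.
  const done = (err, text) => fs.unlink(localPath, () => callback(err, text));

  https.get(fileUrl, (res) => {
    if (res.statusCode !== 200) {
      res.resume(); // discard the body so the socket is released
      return callback(new Error('Download failed with status ' + res.statusCode));
    }
    const out = fs.createWriteStream(localPath);
    res.pipe(out);
    out.on('error', done);
    out.on('finish', () => extract(localPath, done));
  }).on('error', callback);
}

// Usage sketch (assumes textract is installed and callable as in this thread):
// const textract = require('textract');
// extractFromUrl('https://s3.amazonaws.com/testbucket1a2b3c/test.pdf', textract,
//   (err, text) => { if (err) { console.error(err); } else { console.log(text); } });
```

The extractor is passed in rather than required directly, so the wrapper does not depend on a particular textract version or call style.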
Alright, thanks. Looking forward to your implementation. Apache Tika is very complicated to install, and yours is the only good text extractor that's written in node as far as I know.
Doing some refactoring and think I'll include this. Will be a few days.
Awesome. I've implemented textract in an order form for translation agencies (clients upload documents and the app returns the number of words & pricing) and it works very well. All the files are stored in an AWS instance for now and textract is in the same instance as well. I will refactor the app to work with S3 when it's ready. If it helps, here's how I plan to use textract:
textract was born out of the contracting work I did that involved uploading resumes, extracting the text from them, loading solr with the resume text for searching, and tossing the resume itself into S3. So that all sounds familiar. =) Working through a set of enhancements over the next few days. This'll be one of them.
Hi David!
Trying to create an endpoint in an Express server like this:
    app.get('/textract', function(req, res, next) {
      textract("https://s3.amazonaws.com/testbucket1a2b3c/test.pdf", function(error, text) {
        console.log(error);
        res.end();
      });
    });
Console returns
[Error: File at path [[ https://s3.amazonaws.com/testbucket1a2b3c/test.pdf ]] does not exist.]
What does this mean exactly? Does textract only work with local files? (In this case my file is uploaded to S3.) Thanks!