cannot extract Chinese comma symbol #99

lcc19941214 · 2016-10-14T15:28:28Z

I'm using textract and it's awesome! I can easily extract content from any of my .doc or .docx files.

However, most of the time I'm handling documents full of Chinese characters, and it seems textract has a problem extracting the Chinese comma symbol '，' (it comes out as a space instead).
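A minimal sketch of how the symptom could be checked, assuming the extracted text is already available as a string. The helper names `countChar` and `lostFullwidthCommas` are hypothetical and not part of textract, and the sample strings are illustrative rather than real extraction output.

```typescript
// Hypothetical helpers, not part of textract: they compare the text a
// document is known to contain with the text an extractor returned.
const FULLWIDTH_COMMA = "\uFF0C"; // the Chinese comma '，'

// Count non-overlapping occurrences of `ch` in `text`.
function countChar(text: string, ch: string): number {
  return text.split(ch).length - 1;
}

// Number of full-width commas present in `expected` but missing from `extracted`.
function lostFullwidthCommas(expected: string, extracted: string): number {
  return countChar(expected, FULLWIDTH_COMMA) - countChar(extracted, FULLWIDTH_COMMA);
}

// Illustrative strings only; `extracted` mimics the behavior described above,
// where each '，' comes back as a space.
const expected = "你好，世界，再见";
const extracted = "你好 世界 再见";
console.log(lostFullwidthCommas(expected, extracted));
```

A result of zero means every full-width comma survived extraction; a positive number is how many were dropped or replaced.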

dbashford · 2016-10-14T16:53:30Z

👍

I tend to update textract every 3 months or so and I am overdue. Hoping to be taking a good look at all the various issues/PRs this time next week.

dbashford · 2016-12-20T14:28:08Z

@lcc19941214 Not an issue anymore?

The last time I spent time on textract I was working on this. I didn't check anything in, but I believe I was close to resolving it.

dbashford · 2016-12-23T16:48:56Z

FWIW, the fix for this was just published as textract 2.1, thanks!

lcc19941214 closed this as completed Dec 20, 2016

dbashford added a commit that referenced this issue Dec 23, 2016

Fixes #99, handling chinese commas, with new test/file for test. Also…

1ceb563

… adjusted spacing in tests
