Bug in html to docx conversion #3280

Closed
mdolginin opened this issue Dec 5, 2016 · 16 comments

Comments

@mdolginin

After updating to pandoc 1.19, I can't open the resulting docx files in MS Word (version 15.28) because of an error (see the screenshot; some of the text is in Russian).

[Screenshot: 2016-12-05 15 37 58]

If I use pandoc 1.18 there is no such problem.

@jgm
Owner

jgm commented Dec 5, 2016 via email

@mdolginin
Author

Yep. I'm using a custom reference file.

@jgm
Owner

jgm commented Dec 5, 2016 via email

@mdolginin
Author

I get the same error when I use the default reference.docx in pandoc 1.19. When I remove 1.19 and install 1.18, the error is gone.

@mdolginin
Author

But if I open the resulting docx in LibreOffice, it works fine.

@jgm
Owner

jgm commented Dec 6, 2016 via email

@mdolginin
Author

mdolginin commented Dec 6, 2016 via email

@jgm
Owner

jgm commented Dec 6, 2016 via email

@mdolginin
Author

Here is my default reference file (generated by running pandoc --print-default-data-file=reference.docx > myfile.docx):

myfile.docx

@jgm
Owner

jgm commented Dec 6, 2016

Odd, I had no trouble creating a docx using pandoc 1.19 with your original ref.docx. If you want to repeat what I did, it was

pandoc --reference-docx ref.docx -o my.docx MANUAL.txt

with the pandoc MANUAL.txt. Can you try that and see if it works? To get a copy of the manual you can use pandoc --print-default-data-file MANUAL.txt > MANUAL.txt.

I've got Word for Mac 2011, by the way. Perhaps this is sensitive to Word version?

@jgm
Owner

jgm commented Dec 6, 2016

Also, can you do the conversion you were trying before with the default reference.docx you just generated?

@mdolginin
Author

> with the pandoc MANUAL.txt. Can you try that and see if it works? To get a copy of the manual you can use pandoc --print-default-data-file MANUAL.txt > MANUAL.txt.

It works fine with MANUAL.txt.

But if I try this with an HTML file that contains a table (attached), I get an error. The HTML document was generated from an Asciidoctor file. (With pandoc 1.18 this works fine.)

dostup_test.html.zip

P.S. Thank you for dealing with this issue. I appreciate it.
Pandoc is a very useful tool 👍

@jgm
Owner

jgm commented Dec 6, 2016

OK. It's really not about the reference.docx; you can get the failure with just

pandoc dostup_test.html -o dostup_test.docx

If one does pandoc -t native dostup_test.html, it's clear what one of the problems is: pandoc's HTML reader is producing a bunch of spurious empty table rows.
I conjecture this has something to do with commit 5222572; that's the only thing that affected table parsing in the HTML reader between 1.18 and 1.19.

@jgm
Owner

jgm commented Dec 6, 2016

Oh, I see the problem. I have addEmpties where I should have map addEmpties.
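
As a rough illustration of the difference between the two forms, here is a small Python model (not pandoc's Haskell source). It assumes that addEmpties pads a short row with empty cells up to the table's column count, which the thread does not spell out; the names add_empties, rows, and cols are hypothetical. Applying the helper to the whole list of rows, instead of mapping it over each row, leaves short rows unpadded and appends empty rows, which would match the spurious empty rows described above.

```python
def add_empties(row, cols, empty=""):
    """Pad a list with `empty` items until it has `cols` entries.

    Hypothetical stand-in for addEmpties; the real helper is Haskell.
    """
    missing = cols - len(row)
    if missing > 0:
        return row + [empty] * missing
    return row


cols = 3
rows = [["a", "b", "c"], ["d"]]  # the second row is one cell long

# Intended behaviour ("map addEmpties"): pad each row on its own.
fixed = [add_empties(r, cols) for r in rows]

# Buggy behaviour ("addEmpties"): the list of rows is treated as if it
# were a single row, so an empty row is appended and ["d"] stays short.
buggy = add_empties(rows, cols, empty=[])

print(fixed)  # [['a', 'b', 'c'], ['d', '', '']]
print(buggy)  # [['a', 'b', 'c'], ['d'], []]
```

In Haskell, a list of rows and a single row are both lists with an empty value, so a polymorphic helper of this kind can be applied to either without a type error. That is one plausible reason a mistake like this could compile cleanly.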

jgm closed this as completed in 97274c9 on Dec 6, 2016

@jgm
Owner

jgm commented Dec 6, 2016

Several mistakes in that code now fixed!
Thanks for reporting this.

@mdolginin
Author

Thanks man
