docx reader bug: 1 row table might be parsed as 1 header row with empty body #3285
Comments
Perhaps @jkr can tell us more about the heuristics the docx reader uses to determine what is a table header.
Yes. If we used GADTs I suppose we could enforce this on the type level. But probably the table AST will have to change anyway to make room for colspans and such. |
Table headers are determined by whether or not the "w:firstRow" attr in "w:tblLook" is set to true. It can also be set by bitmasks, because why not. See here: http://stackoverflow.com/questions/23134215/ooxml-how-is-a-table-header-heading-encoded |
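To make the two encodings concrete, here is a rough Python sketch (not the actual Docx.hs code) of checking for the first-row flag on a w:tblLook element. The function name and the sample XML strings are made up for illustration. The 0x0020 bit for "first row" and the rule that an explicit w:firstRow attribute wins over the w:val bitmask are my reading of the OOXML convention, so treat both as assumptions.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used for the w: prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: in the legacy hex bitmask stored in w:val, 0x0020 means "first row".
FIRST_ROW_BIT = 0x0020


def has_first_row_flag(tbl_look):
    """Return True if a w:tblLook element marks the first row as a header.

    Hypothetical helper: prefers the explicit w:firstRow attribute and
    falls back to the w:val bitmask when that attribute is absent.
    """
    explicit = tbl_look.get(f"{{{W_NS}}}firstRow")
    if explicit is not None:
        return explicit in ("1", "true", "on")
    val = tbl_look.get(f"{{{W_NS}}}val")
    if val is None:
        return False
    try:
        return bool(int(val, 16) & FIRST_ROW_BIT)
    except ValueError:
        # Not a hex string, so there is nothing to test.
        return False


# Illustrative fragments, not taken from the submitted docx file.
EXPLICIT = f'<w:tblLook xmlns:w="{W_NS}" w:firstRow="1" w:val="04A0"/>'
BITMASK = f'<w:tblLook xmlns:w="{W_NS}" w:val="04A0"/>'
NO_HEADER = f'<w:tblLook xmlns:w="{W_NS}" w:val="0480"/>'
EXPLICIT_OFF = f'<w:tblLook xmlns:w="{W_NS}" w:firstRow="0" w:val="04A0"/>'

if __name__ == "__main__":
    for name, xml in [("explicit", EXPLICIT), ("bitmask", BITMASK),
                      ("no header", NO_HEADER), ("explicit off", EXPLICIT_OFF)]:
        print(name, has_first_row_flag(ET.fromstring(xml)))
```

Note that neither encoding says anything about how many rows the table has, which is why a one-row table can still come out flagged as a header.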
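In the case of the submitted docx file, w:firstRow is true, even though there's only one row. Fix seems like a one-liner: special-case the one-row table when we're figuring out the header in Docx.hs. The only argument against this would be that a header-only table is legal if that's what you really want. But I think it's worth allowing for the fact that docx forces you into doing things that no one would really want. So I'll put in the fix unless you have any objections. |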
+++ Jesse Rosenthal [Dec 07 16 23:08 ]:
In the case of the submitted docx file, w:firstRow is true, even though
there's only one row. Fix seems like a one-liner: special-case the
one-row table when we're figuring out the header in Docx.hs.
That sounds like the right solution to me.
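For readers following along, the special case could look roughly like this in Python. This is only a sketch of the idea, not the Haskell change itself; split_header and the list-of-rows representation are hypothetical.

```python
def split_header(rows, first_row_flag):
    """Split table rows into (header, body).

    Hypothetical sketch: the first row is treated as a header only when the
    flag is set AND there is more than one row, so a one-row table keeps its
    single row in the body instead of becoming a header with an empty body.
    """
    if first_row_flag and len(rows) > 1:
        return rows[0], rows[1:]
    # No usable header: return an empty header and keep every row in the body.
    return [], list(rows)


if __name__ == "__main__":
    one_row = [["Some other thing", "testing"]]
    two_rows = [["h1", "h2"], ["a", "b"]]
    print(split_header(one_row, True))
    print(split_header(two_rows, True))
```

The trade-off is the one mentioned above: a genuinely header-only table can no longer be expressed through this path.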
|
With commit 8ced8cb in pandoc 1.19.1, the output is now:
[Table [] [AlignDefault,AlignDefault] [0.0,0.0]
[]
[[[OrderedList (1,Decimal,OneParen)
[[Para [Str "Some",Space,Str "other",Space,Str "thing"]]
,[Para [Str "Something"]]]]
,[Plain [Strong [Emph [Str "testing"]]]]]]]
One minor problem: the header is now an empty list [], not a list of n empty lists [[],[]]. Normally, other readers such as the markdown reader take the latter approach. This is not only cosmetic; it may cause problems when using filters, e.g. in panflute. |
We could introduce flexibility and accept empty lists in headers, alignments and widths. The cost would be a larger potential for unforeseen bugs down the road. Let me know if anything changes on the pandoc side; if empty lists stay, then I'll just push an update that accepts them as valid input. |
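As an illustration of the filter-side option, a tolerant consumer could pad an empty header out to one empty cell per column before doing anything else. The sketch below uses plain Python lists to stand in for the AST; normalize_header is a hypothetical name and is not part of panflute's API.

```python
def normalize_header(header, n_cols):
    """Return a header with exactly one (possibly empty) cell per column.

    Hypothetical sketch: an empty header list is expanded to n_cols empty
    cells, which is the shape the other readers are said to produce.
    A non-empty header is returned unchanged.
    """
    if not header:
        # Build a fresh list per cell so later edits to one cell
        # do not leak into the others.
        return [[] for _ in range(n_cols)]
    return header


if __name__ == "__main__":
    aligns = ["AlignDefault", "AlignDefault"]
    print(normalize_header([], len(aligns)))
```

This keeps the rest of a filter free of special cases, at the price of hiding the difference between "no header" and "header with empty cells".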
@jkr we should probably have the docx reader behave like the
other readers in how it creates an empty header.
+++ ickc [Dec 12 16 21:06 ]:
… With commit 8ced8cb in pandoc 1.19.1, the output is now
[Table [] [AlignDefault,AlignDefault] [0.0,0.0]
[]
[[[OrderedList (1,Decimal,OneParen)
[[Para [Str "Some",Space,Str "other",Space,Str "thing"]]
,[Para [Str "Something"]]]]
,[Plain [Strong [Emph [Str "testing"]]]]]]]
One minor problem: the header is now an empty list [], not a
list of n empty lists [[],[]]. Normally, other readers such as the markdown
reader take the latter approach.
This is not only cosmetic; it may cause problems when using
filters, e.g. in panflute.
|
By the way, I notice that the widths output by the docx reader are always a list of 0.0s. I didn't test it thoroughly, but I worry that it could break something, e.g. I found that the LaTeX writer does not use minipage if the widths are 0.0s. |
Just pushed a fix for the header row issue. The cell widths have actually never been implemented -- I'll take a look at that later today. |
@jkr, would the docx reader emit table cells containing block elements? If not, the widths might not be critical. |
It started in pandoc-discuss, but I finally got a solid MWE for this:
With 2-table-1.docx,
Then from this native, write to markdown and read back to native,
I believe the problem is caused by the last row
[]]
in the 1st native. The pandoc docx reader somehow treated the 1-row table as a header row with an empty body.
By the way, as a general rule, is it safe to assert that the align list, the width list, the header list, and each row in the row list all have the same length?
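If that invariant does hold, a filter could check it up front rather than fail somewhere deep inside. Here is a small Python sketch of such a check, again on plain lists rather than real AST objects. check_table_shape and its allow_empty_header switch are hypothetical, and the column count is taken from the align list purely as a convention of this sketch.

```python
def check_table_shape(aligns, widths, header, rows, allow_empty_header=False):
    """Return a list of human-readable shape problems (empty if consistent).

    Hypothetical sketch: the number of columns is taken from the align list,
    and the widths, header and every body row are compared against it.
    """
    n_cols = len(aligns)
    problems = []
    if len(widths) != n_cols:
        problems.append(f"widths has {len(widths)} entries, expected {n_cols}")
    header_ok_when_empty = allow_empty_header and len(header) == 0
    if not header_ok_when_empty and len(header) != n_cols:
        problems.append(f"header has {len(header)} cells, expected {n_cols}")
    for i, row in enumerate(rows):
        if len(row) != n_cols:
            problems.append(f"row {i} has {len(row)} cells, expected {n_cols}")
    return problems


if __name__ == "__main__":
    aligns = ["AlignDefault", "AlignDefault"]
    widths = [0.0, 0.0]
    rows = [[["cell 1"], ["cell 2"]]]
    # A table with an empty header list fails the strict check.
    print(check_table_shape(aligns, widths, [], rows))
    print(check_table_shape(aligns, widths, [], rows, allow_empty_header=True))
```

Returning a list of problems instead of raising on the first mismatch makes it easier to report everything that is off about a table at once.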