Gotchas
jlward edited this page Apr 24, 2013 · 1 revision
- Text that has ever been styled bold/italic/underlined will always have the b/i/u tags, even after the styling has been turned off. You need to check whether the val is false or not.
- Hyperlinks and ins/del tags are all basically the same (except for the href). The tricky part is that link/ins/del tags have their own runs of text; make sure to collect them all when constructing the text (not sure whether this is a bug in pydocx).
- There are two types of images: drawings and picts. After some pre-processing you can treat them the same.
- Image dimensions are measured in EMUs (English Metric Units). There are 9525 EMUs per pixel at 96 DPI.
- Font sizes are hard
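- Rowspans are not the same in OOXML as they are in HTML. Where HTML puts a single rowspan attribute on one cell, OOXML marks the first cell of the span with a vMerge tag whose val is "restart" and then repeats a vMerge-tagged cell in every following row that the span covers.
- If two list items have the same numId, they are part of the same list (both a curse and a blessing).
- Headers (h1, h2, etc.) look a lot like lists. You need to check the style to see whether it is considered a header (the check is case-insensitive).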
- Images stored in the document and then resized are not resized in the ZIP file.
- Anchor hrefs and image sources can be found in ‘word/_rels/document.xml.rels’, where each one is looked up by an Id that is mapped to a Target.
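
For the b/i/u gotcha, a minimal sketch of the check might look like the following. The helper name `is_on` is hypothetical, and the set of values treated as "off" is an assumption about what Word writes, not something taken from pydocx.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: these are the val strings that mean "styling turned off".
OFF_VALUES = ("false", "0", "off", "none")


def is_on(run_properties, tag):
    """Return True if the b/i/u tag is present and not switched off."""
    element = run_properties.find("{%s}%s" % (W_NS, tag))
    if element is None:
        return False
    val = element.get("{%s}val" % W_NS)
    # A tag with no val attribute counts as switched on.
    return val is None or val.lower() not in OFF_VALUES


# Sample run properties: bold was applied and later removed.
SAMPLE_RPR = (
    '<w:rPr xmlns:w="%s">'
    '<w:b w:val="false"/><w:i/><w:u w:val="none"/>'
    '</w:rPr>' % W_NS
)
rpr = ET.fromstring(SAMPLE_RPR)
bold, italic, underline = (is_on(rpr, tag) for tag in ("b", "i", "u"))
```

In this sample only the italic tag counts as active, even though all three tags are present.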
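
The EMU conversion is plain arithmetic. A small sketch, using the 9525 figure from the list above; the function name `emus_to_pixels` and the sample values are made up for illustration.

```python
EMUS_PER_PIXEL = 9525  # figure quoted in the list above


def emus_to_pixels(emus):
    """Convert an EMU measurement (int or numeric string) to whole pixels."""
    return int(round(int(emus) / EMUS_PER_PIXEL))


# Dimensions are read from XML attributes, so they usually arrive as strings.
width_px = emus_to_pixels("1905000")
height_px = emus_to_pixels("952500")
```

Rounding rather than truncating keeps the result from drifting a pixel low when a dimension is not an exact multiple of 9525.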
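
For the rowspan gotcha, here is a rough sketch of turning OOXML vertical merges into HTML rowspans. It assumes each cell's vMerge state has already been reduced to `None`, `"restart"`, or `"continue"` for a single table column; that pre-processing step and the function name `rowspans` are hypothetical.

```python
def rowspans(column):
    """Return the HTML rowspan for each row of one table column.

    `column` lists the vertical-merge state of the cell in each row:
    None (not merged), "restart" (first cell of a merge), or
    "continue" (cell absorbed into the merge above it).
    Absorbed cells get 0, meaning "emit no <td> for this row".
    """
    spans = []
    start = None  # index of the row that opened the current merge
    for index, state in enumerate(column):
        if state == "continue" and start is not None:
            spans[start] += 1
            spans.append(0)
        else:
            spans.append(1)
            start = index if state == "restart" else None
    return spans


example = rowspans(["restart", "continue", "continue", None])
```

A stray "continue" with no merge open above it is treated as an ordinary cell rather than raising an error, which is a design choice of this sketch.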
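
The numId rule can be sketched as a simple grouping step. This assumes the list paragraphs have already been reduced to `(num_id, text)` pairs in document order; that shape and the name `group_by_num_id` are illustrative only.

```python
def group_by_num_id(items):
    """Group list-item texts by numId, keeping first-seen order."""
    lists = {}
    for num_id, text in items:
        lists.setdefault(num_id, []).append(text)
    return lists


# The last item rejoins list "1" even though list "2" sits in between.
items = [("1", "a"), ("1", "b"), ("2", "x"), ("1", "c")]
grouped = group_by_num_id(items)
```

This shows both sides of the rule: grouping is trivial, but items that are far apart in the document still end up in the same list.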
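
The Id-to-Target lookup in the rels file could be built roughly like this. The sample XML is invented for illustration, and `parse_rels` is a hypothetical helper, not part of pydocx.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def parse_rels(xml_text):
    """Map each Relationship Id to its Target."""
    root = ET.fromstring(xml_text)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.iter("{%s}Relationship" % RELS_NS)
    }


# Invented sample; a real file would be read from the .docx ZIP archive.
SAMPLE_RELS = (
    '<Relationships xmlns="%s">'
    '<Relationship Id="rId1" Type="%s/hyperlink"'
    ' Target="http://example.com/" TargetMode="External"/>'
    '<Relationship Id="rId2" Type="%s/image" Target="media/image1.png"/>'
    '</Relationships>' % (RELS_NS, REL_TYPE, REL_TYPE)
)
targets = parse_rels(SAMPLE_RELS)
```

For a real document, the same function should accept the bytes returned by `zipfile.ZipFile(path).read("word/_rels/document.xml.rels")`, where `path` is the location of the .docx file.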