xml support? #124
fails to

const fs = require("fs");
const JSZip = require("jszip");
const { parse } = require("node-html-parser");

// Path to the .docx file, passed as the first CLI argument
const docxPath = process.argv[2];

async function main() {
  // A .docx file is a zip archive; the document body is in word/document.xml
  const data = fs.readFileSync(docxPath);
  const zip = await JSZip.loadAsync(data);
  const xml = await zip.files["word/document.xml"].async("text");
  const doc = parse(xml);
  //console.dir(doc.querySelectorAll('w:t')); // Error: unmatched pseudo-class :t
  console.dir(doc.querySelectorAll('w\\:t')); // == [] (empty result)
}

main();

alternatives: xml2js, ...
I believe the exception is thrown because we cannot select a node whose tag name contains a colon.
yes, sorry ... it's a parser bug in fb55/css-what#512
Not a parser bug, but CSS requires the colon to be escaped here.
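To make the escaping point concrete, here is a minimal sketch. `escapeTagName` is a hypothetical helper written for this thread, not part of node-html-parser or css-what. It only backslash-escapes colons, so a namespaced tag name such as `w:t` is read as a tag name rather than as `w` followed by a pseudo-class `:t`.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API):
// prefix every colon with a backslash so CSS treats it as part of the name.
function escapeTagName(name) {
  // "$&" in the replacement string stands for the matched text (the colon)
  return name.replace(/:/g, "\\$&");
}

const selector = escapeTagName("w:t");
console.log(selector); // the selector text with the colon escaped
```

Note that in JavaScript source the backslash itself has to be doubled inside a string literal, which is why the sample code above writes `'w\\:t'`.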
aah, thanks! fixed my sample code, now
{ rules: [ { type: 'tag', name: 'w:t', namespace: null } ] }
and new problem seems to be in sample input docx, generated by libreoffice writer
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document
  xmlns:o="urn:schemas-microsoft-com:office:office"
  xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
  xmlns:v="urn:schemas-microsoft-com:vml"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  xmlns:w10="urn:schemas-microsoft-com:office:word"
  xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
  xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
  xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
  xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
  xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
  xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
  mc:Ignorable="w14 wp14"
>
  <w:body>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Normal"/>
        <w:bidi w:val="0"/>
        <w:jc w:val="left"/>
        <w:rPr>
          <w:rFonts w:ascii="Liberation Sans" w:hAnsi="Liberation Sans"/>
          <w:sz w:val="20"/>
          <w:szCs w:val="20"/>
        </w:rPr>
      </w:pPr>
      <w:r>
        <w:rPr>
          <w:rFonts w:ascii="Liberation Sans" w:hAnsi="Liberation Sans"/>
          <w:sz w:val="20"/>
          <w:szCs w:val="20"/>
        </w:rPr>
        <w:t>hello</w:t>
        <w:tab/>
        <w:tab/>
the textnode starts at
It also won't parse
I had a look into this. In HTML5 spec,
@milahu I actually think you've run into the same issue as this: #156
I believe it matched
A temporary workaround is to use the following config:
I'm going to go ahead and close this issue for housekeeping, but you can track the applicable bug here:
taoqf#124) Tags 'premises' is matched as 'pre', 'pstyle' as 'style', etc.
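The mismatch described in that commit message ('premises' treated as 'pre', 'pstyle' as 'style') can be illustrated with a small sketch. This is not node-html-parser's actual source. The patterns and function names are hypothetical, and the assumption is only that block-text tags such as `pre` and `style` were being detected with a pattern that was not anchored to the whole tag name.

```javascript
// Hypothetical patterns for illustration only (not the library's code).
// Unanchored: matches anywhere inside the tag name.
const loosePattern = /script|noscript|style|pre/i;
// Anchored: the whole tag name must be one of the block-text tags.
const exactPattern = /^(?:script|noscript|style|pre)$/i;

function matchesLoosely(tagName) {
  return loosePattern.test(tagName);
}

function matchesExactly(tagName) {
  return exactPattern.test(tagName);
}

// Tag names taken from the thread, plus the namespaced w:pStyle from the sample XML
for (const tag of ["pre", "premises", "pstyle", "w:pStyle"]) {
  console.log(tag, matchesLoosely(tag), matchesExactly(tag));
}
```

With the unanchored pattern, a tag like `w:pStyle` from the sample document would be treated as a `style` element, so its contents would be kept as raw text instead of being parsed, which is consistent with the empty query result reported earlier in the thread.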
Worth mentioning in the README.
Tested on Atom feed - working fine.