While working on #135, I realized the idea is solid. This issue briefly describes what I plan to do; the milestones will need to change a little, though.
The idea in short: to support DOCX files, I plan to implement an ODT parser and an ODT-to-Coradoc converter. This will not remove the LibreOffice dependency (unless the user generates the ODT file themselves). In my experience, ODT is very close to HTML, yet it preserves a lot more semantics than LibreOffice HTML, so this should be fairly easy to do, at least compared to DOCX. I would describe the difference as follows: the ODT format was designed for document interchange, whereas the DOCX format was designed to represent internal MS Word structures serialized to XML (and, as @opoudjis noted, it isn't even well documented).
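To illustrate where the LibreOffice dependency would sit, here is a minimal sketch of the DOCX-to-ODT step. It is written in Python with only the standard library, purely for illustration; it is not Coradoc code. It assumes LibreOffice's `soffice` binary is on `PATH`, and the helper names `build_convert_command` and `docx_to_odt` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Return the argv for a headless LibreOffice DOCX -> ODT conversion."""
    return [
        soffice,
        "--headless",
        "--convert-to", "odt",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def docx_to_odt(docx_path, out_dir):
    """Run the conversion and return the path where the ODT is expected.

    Assumption: LibreOffice names the output after the input file's stem.
    """
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) not found on PATH")
    subprocess.run(build_convert_command(docx_path, out_dir), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".odt")
```

A user who already has an ODT file would skip this step entirely, which is the case where the LibreOffice dependency disappears.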
- generalize the plugin system and create an ISO Simple Template plugin (I assume this will be needed)
- ensure the implementation works with MS Word-generated ODT files
  - optional, but would require me to buy an MS Word license
  - rationale:
    - would allow users to export ODT directly from MS Word
    - in the future, we could perhaps script an option to export ODT using the MS Word executable
- switch the default for DOCX from the current Coradoc::Input::Docx to Coradoc::Input::Odt
  - I think even at this point we should keep the old implementation, so that users can choose the other one if the first breaks (the implementations could be given descriptive names: DocxViaHtml and DocxViaOdt).
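The last item, keeping both implementations selectable, could look roughly like the sketch below. It is Python used for illustration only, not the Coradoc API: the registry, the `register` decorator, and `convert` are hypothetical, and the two converter bodies are stand-ins that only return a label.

```python
# Hypothetical registry of DOCX input implementations, keyed by descriptive name.
CONVERTERS = {}


def register(name):
    """Decorator that records a converter function under the given name."""
    def decorator(func):
        CONVERTERS[name] = func
        return func
    return decorator


@register("DocxViaOdt")
def docx_via_odt(path):
    # Stand-in for the proposed route: DOCX -> ODT -> Coradoc.
    return f"[ODT route] {path}"


@register("DocxViaHtml")
def docx_via_html(path):
    # Stand-in for the existing route: DOCX -> HTML -> Coradoc.
    return f"[HTML route] {path}"


DEFAULT_CONVERTER = "DocxViaOdt"  # the proposed new default


def convert(path, via=DEFAULT_CONVERTER):
    """Convert using the named implementation, defaulting to the ODT route."""
    try:
        converter = CONVERTERS[via]
    except KeyError:
        raise ValueError(
            f"unknown converter {via!r}; available: {sorted(CONVERTERS)}"
        ) from None
    return converter(path)
```

With this shape, switching the default is a one-line change, and a user whose document breaks on one route can pass `via="DocxViaHtml"` to try the other.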
@hmdne I think this is doable, but I don't want to spend too many resources on this, given we have other priorities.
> create a gem, that will map ODT format
Technically this means we create an ODT gem that can read (and possibly, write) ODT, using lutaml-model and rubyzip. This is reasonable and contained as a task (and allows contained testing).
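As a rough illustration of what "reading ODT" involves: an ODT file is a ZIP archive whose document body lives in `content.xml`. The sketch below uses Python's standard library rather than the lutaml-model and rubyzip stack proposed above, so it shows the shape of the task, not the gem's design. The sample document is a hand-built minimal archive (no manifest, no styles), and `read_blocks` and `make_sample_odt` are hypothetical names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used in content.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
TEXT = "{" + NS["text"] + "}"

SAMPLE_CONTENT = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<office:body><office:text>'
    '<text:h text:outline-level="1">Scope</text:h>'
    '<text:p>Plain <text:span>styled</text:span> text.</text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
)


def make_sample_odt():
    """Build a minimal ODT-like archive in memory (for demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        archive.writestr("content.xml", SAMPLE_CONTENT)
    return buffer.getvalue()


def read_blocks(odt_bytes):
    """Return (kind, level, text) for top-level headings and paragraphs."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    body = root.find("office:body/office:text", NS)
    blocks = []
    for node in body:
        text = "".join(node.itertext())  # flattens inline spans
        if node.tag == TEXT + "h":
            level = int(node.get(TEXT + "outline-level", "1"))
            blocks.append(("heading", level, text))
        elif node.tag == TEXT + "p":
            blocks.append(("paragraph", None, text))
    return blocks
```

Because the input is just bytes, a parser like this can be tested against small fixture files without LibreOffice or MS Word installed, which is the "contained testing" benefit mentioned above.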
> the DOCX format was designed to represent internal MS Word structures serialized to XML
Nonetheless, the ultimate goal remains supporting DOCX as an input format. At the moment I would consider ODT the "lesser of two evils": an intermediary step between Coradoc and DOCX. I really think DOCX is within reach.
The current html2doc mechanism (MHT) already prevents users of Word on Windows from directly loading files generated by Metanorma. Microsoft has removed MHT functionality from Word on Windows, so we must switch to generating DOCX in the future.
The plan is as follows:
Any opinions on that plan?
@ronaldtse @ReesePlews @opoudjis @webdev778 @xyz65535