HTML document work #61
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main metanorma/reverse_adoc#61 +/- ##
==========================================
+ Coverage 96.67% 98.46% +1.78%
==========================================
Files 42 46 +4
Lines 1054 1306 +252
==========================================
+ Hits 1019 1286 +267
+ Misses 35 20 -15
I use AsciiDoctor to round-trip a document. One of the first issues I found turned out to be an issue with AsciiDoctor itself (unless I am mistaken and this is not possible in AsciiDoc): Anyway, the document now round-trips successfully, though many issues remain.
That's fine. We will need to ensure we test Coradoc against AsciiDoctor behavior. Coradoc is meant to be a replacement for AsciiDoctor:
A normal AsciiDoctor table cell is plain text only. To allow an image in a table cell, you need to mark the cell as an "AsciiDoc table cell" with the a style:
[cols="1,1"]
|===
|cell1
a|image::images/004.webp["",200,100]
|===
I just realized this was a bogus issue report; the issue is actually on our side.
Let's gather questions within Coradoc first; the team will answer them there, so we don't affect others' repositories.
6c4a059 makes tables compute correctly (mostly; still in testing). This makes the following fragment:
Being round-tripped into:
One apparent difference is in the column widths: I add a cols attribute to each table (cols="3*", for instance), which makes the resulting HTML carry predefined column widths, whereas the original document relies on the web browser to deduce them. I have found no way to disable this behavior.
Another difference is the missing BGCOLOR. Should I pass this attribute along, perhaps only when some setting is enabled?
After this commit, the document is mostly readable in my opinion. I can still see some crucial issues, but the document is now, let's say, testable. Note: I still haven't implemented
Below is an archive that contains an adoc file created using this branch and an HTML file that results from AsciiDoctor processing that file:
Thanks @hmdne, this is respectable progress! The only thing is that the document is to be tested using Metanorma, not AsciiDoctor. The sample document for that is in the mn-samples-plateau repository (001-v3 is the v3 of this document; the new HTML version is 001-v4). This HTML document was developed to adhere to Metanorma styling.
By this, we mean: if a link is preceded by a space, or sits at the beginning of a block, we don't need to add another space. In fact, we shouldn't, because for code like <div><a href="test">test</a></div>, adding a space before the link opens a code block, so we get source code instead of a link.
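A minimal Ruby sketch of the spacing rule described above. The helper names `link_prefix` and `append_link`, and the idea of passing the already generated text as a string, are assumptions for illustration; this is not the actual reverse_adoc code.

```ruby
# Hypothetical helper: decide what to emit before an inline link.
# `preceding` is the AsciiDoc text already generated for the current block.
def link_prefix(preceding)
  # At the start of a block, a leading space would begin a literal (code) block.
  return "" if preceding.empty?
  # If the text already ends in whitespace, a second space is redundant.
  return "" if preceding.match?(/\s\z/)

  " "
end

# Hypothetical helper: append a link macro to the text generated so far.
def append_link(preceding, url, text)
  "#{preceding}#{link_prefix(preceding)}link:#{url}[#{text}]"
end

puts append_link("", "test", "test")     # link:test[test]
puts append_link("See ", "test", "test") # See link:test[test]
puts append_link("See", "test", "test")  # See link:test[test]
```

The first call corresponds to the <div><a href="test">test</a></div> case: the link is at the beginning of the block, so no space is emitted.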
In particular, I was curious what caused a performance problem on a large document I'm working on. It turned out to be the remove_inner_whitespace procedure in Cleaner. With a simple fix I managed to make it finish in 1 second instead of 170 seconds. All the rest of the processing combined takes 10 seconds, so we will be able to progress much faster on the next issues.
Happened to me once, but could happen at any time in production.
The idea here is that HTML content generators often introduce a lot of unnecessary markup that only makes sense in the HTML+CSS context. Certain cases can be simplified so that the result is equivalent but much simpler, allowing us to generate nicer AsciiDoc syntax for those cases.
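As an illustration of this kind of simplification, here is a small Ruby sketch. The `Node` struct and the specific rule (unwrapping `span` elements that carry no attributes) are assumptions chosen for the example; the cases actually handled in this PR are not listed in the discussion.

```ruby
# Hypothetical minimal node type; the real converter works on parsed HTML.
Node = Struct.new(:name, :attrs, :children)

# Assumed rule for illustration: a <span> without attributes carries no
# meaning outside HTML+CSS, so it can be replaced by its children.
def simplify(node)
  return node if node.is_a?(String) # text nodes are kept as-is

  children = node.children.flat_map do |child|
    simplified = simplify(child)
    if simplified.is_a?(Node) && simplified.name == "span" && simplified.attrs.empty?
      simplified.children # unwrap the redundant wrapper
    else
      [simplified]
    end
  end
  Node.new(node.name, node.attrs, children)
end

# <p><span><span>Hello </span><b>world</b></span></p>
tree = Node.new("p", {}, [
  Node.new("span", {}, [Node.new("span", {}, ["Hello "]), Node.new("b", {}, ["world"])])
])
simplified = simplify(tree)
# simplified now corresponds to <p>Hello <b>world</b></p>
```

The simplified tree renders to the same content, but a converter walking it no longer has to deal with the wrapper elements.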
@ronaldtse Thanks for the clarification. I will take a deeper look at how they compare. For now, I need to work a little more on tables, so that we reliably produce correct AsciiDoc output.
@ronaldtse A question: this document is not necessarily semantic HTML; it sometimes relies on styling. For instance: Instead of
Creating a proper document won't be possible with that in mind. We can't add exceptions like this to the reverse_adoc logic, since they are internal to just this document and its styling (or should we? I think the purpose of reverse_adoc is to be agnostic to formats). Otherwise, we will need to add a script to preprocess the document, and perhaps even postprocess it if Metanorma-compatible content is desired. Can you give us some hints on that? (As in, is it in the scope of this task, in which repo should such pre/postprocessors land, etc.)
Let's move the logic of delimiting tables to Coradoc, as I think it makes more sense there. This changes the semantics a little: one-line rows are now generated if there are any AsciiDoc cells. Previously, each Cell decided whether it wanted to be generated multiline or not. This results in nicer tables.
@ronaldtse Handling lists was very tricky, but it's ready now. I have also uncovered something like a definition list in 7.2.4, but since their use of markup (
What I can see as remaining tasks to be done in this PR:
To make things easier, I'm uploading the current version of the generated document:
I plan to continue development tomorrow (Sunday) at 4-6 AM GMT+2.
We have generated a section tree at this point, so we may split sections into individual files. I am not entirely sure this approach will translate correctly to all documents, not only the one we are working on.
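A rough Ruby sketch of how a section tree could be split into individual files. The `Section` struct, the `render` function, the level numbering (0 for the document title), and the `sections/NN.adoc` naming are all assumptions for illustration; they do not describe the actual implementation or the exact meaning of the split level in the command-line tool.

```ruby
# Hypothetical section node; names and fields are illustrative only.
Section = Struct.new(:title, :level, :body, :children)

# Render a section tree to AsciiDoc. Sections at `split_level` are stored in
# `files` (filename => content) and replaced by an include:: directive.
def render(section, split_level = nil, files = {})
  if split_level && section.level == split_level
    name = format("sections/%02d.adoc", files.size + 1)
    files[name] = render(section) # whole subtree, no further splitting
    return "include::#{name}[]\n\n"
  end

  heading = "#{'=' * (section.level + 1)} #{section.title}\n\n"
  heading + section.body + section.children.map { |c| render(c, split_level, files) }.join
end

doc = Section.new("Doc", 0, "", [
  Section.new("Scope", 1, "Text A.\n\n", [Section.new("Detail", 2, "Text B.\n\n", [])]),
  Section.new("Terms", 1, "Text C.\n\n", [])
])

files = {}
main = render(doc, 1, files)
puts main
files.each { |name, content| puts "--- #{name}", content }
```

Here the main document keeps only the title and two include directives, while each level 1 section, together with its subsections, lands in its own file.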
Thanks to a suggestion from @xyz65535 I have handled indentation in the document with
In addition, I finalized the plugin implementation. It is now possible to plug in at any meaningful stage of AsciiDoc generation. I suppose this could be used to add something like a Metanorma plugin that would, for instance, try to extract and produce data that is meaningful to Metanorma but not necessarily part of the AsciiDoc standard. The plugin architecture should support using multiple plugins for any conversion.
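To make the plugin idea concrete, here is a minimal Ruby sketch of a hook registry that lets several plugins act at a named stage of the conversion. The class names, the `postprocess_asciidoc` hook, and the `run` signature are hypothetical and are not taken from the actual plugin API in this PR.

```ruby
# Hypothetical plugin registry; hook names and signatures are illustrative.
class PluginRegistry
  def initialize
    @plugins = []
  end

  def register(plugin)
    @plugins << plugin
    self
  end

  # Run `hook` on every plugin that implements it, passing each plugin the
  # result of the previous one. Plugins without the hook are skipped.
  def run(hook, value)
    @plugins.reduce(value) do |acc, plugin|
      plugin.respond_to?(hook) ? plugin.public_send(hook, acc) : acc
    end
  end
end

# Example plugin: strips trailing spaces and tabs from every line.
class StripTrailingSpaces
  def postprocess_asciidoc(text)
    text.gsub(/[ \t]+$/, "")
  end
end

# Example plugin: prepends an AsciiDoc comment line.
class AddHeaderComment
  def postprocess_asciidoc(text)
    "// converted from HTML\n#{text}"
  end
end

registry = PluginRegistry.new
registry.register(StripTrailingSpaces.new).register(AddHeaderComment.new)
result = registry.run(:postprocess_asciidoc, "== Title  \n\nBody \n")
puts result
```

Because each plugin only implements the hooks it cares about, several plugins can be combined for one conversion without knowing about each other.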
@hmdne the ideal AsciiDoc encoding:
==== 変換規則
===== スキーマ変換規則
* スキーマ変換規則は、i-UR3.0及びCityGML2.0に従う。
* なお、標準製品仕様書は、応用スキーマクラス図及びこれに対応するXMLSchemaを新規に作成するのではなく、i-UR3.0及びCityGML2.0から必要な部分のみを選択し、使用している。
* 応用スキーマクラス図に示す、クラス名、属性名及び関連役割名は、i-UR3.0及びCityGML2.0において定義されたタグに一致させている。
* また、複数の名前空間から選択しているため、全てのクラス名に、i-UR3.0又はCityGML2.0名前空間の接頭辞を付ける。
===== インスタンス変換規則
GMLに準拠する。
* オブジェクト識別子(gml:id)
+
--
データ製品に含まれる全ての地物には、gml:idによる識別可能な値を与えることとし、その値には[接頭辞]_[UUID]を使用する。
[接頭辞]は、CityGML及びi-URの各パッケージに与えられた接頭辞(表7-4)を使用する。
[UUID]は、Universally Unique Identifier(UUID)[2]とする。UUIDとは、ソフトウェア上でオブジェクトを一意に識別するための識別子であり、128ビット(16バイト)の値で表す。先頭から4ビットごとに16進数の値(0~f)に変換し、8桁-4桁-4桁-4桁-12桁に切って表現する。
--
* 集成の実装
+
--
応用スキーマに示された地物間の集成は、部品となるオブジェクトを、全体となるオブジェクトの子要素として記述する。
この時、部品となるオブジェクトの識別子(gml:id)を、全体となるオブジェクト以外のオブジェクトが参照してもよい。
--
* 空間参照系の識別
+
--
幾何オブジェクトに適用される空間参照系は、都市モデル(core:CityModel)に挿入されるEnvelope要素の属性srsNameにおいて、以下のEPSGコードを挿入することにより識別する。
[cols="9,4"]
|===
| 空間参照系の名称 | srsNameに挿入する値
| 日本測地系2011における経緯度座標系と東京湾平均海面を基準とする標高の複合座標参照系
| http://www.opengis.net/def/crs/EPSG/0/6697
|===
--
* schemaLocationの指定
+
i-URの符号化仕様は、3D都市モデル内のschemasフォルダ(7.2.4)に格納したXMLSchemaファイルへの相対パスによりschemaLocationを指定する。

The interesting thing about the PLATEAU documents is that they use a clause scheme like this:
So Level 4 and Level 5 are actually not lists; they are clauses (sections).
The last clause level is not something we can extract programmatically, as the only class we have available is "text2data". All we can deduce from it is that the author intended a "level 2 indentation". This class is used a lot in the document; for instance, the underlined parts are also "text2data":
We handle this example specially, as per your request, so it's compiled into a numbered list. In other parts of the document, though, those are also "text2data":
I see no way to interpret "text2data" programmatically as anything other than "level 2 indentation", and that's what I try to accomplish with lists.
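The limitation can be sketched in Ruby as follows. The only information available is the class name, so the most a converter can derive is a nesting depth. The pattern `text<N>data` is an assumption generalized from the single attested class "text2data", and both helper names are hypothetical.

```ruby
# Assumption: class names follow the pattern "text<N>data", where N is the
# indentation level. Only "text2data" is attested in the discussion above.
def indentation_level(class_attr)
  class_attr.to_s.split.each do |name|
    m = name.match(/\Atext(\d+)data\z/)
    return m[1].to_i if m
  end
  nil
end

# Hypothetical mapping from indentation level to an AsciiDoc list marker.
def list_marker(class_attr)
  level = indentation_level(class_attr)
  level ? "*" * level : nil
end

puts indentation_level("text2data")   # 2
puts list_marker("text2data")         # **
puts list_marker("note").inspect      # nil
```

Nothing in the class name distinguishes a clause from a list item or an indented paragraph, which is why the result needs manual review.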
@hmdne there is always a balance between automated and manual processing, and I agree that there are some portions we have to fix up manually after automated processing. As long as we know what work remains (ping @metanorma/editors), that's fine.
I have completed the last task on this issue. This will still need some testing, but other than that, I don't see any remaining problems with the conversion. Below is the (hopefully) final version of the document, ready for review:
@ronaldtse The test suite uncovered a minor issue, which is now fixed, but it doesn't affect the document. I think this PR is ready.
@hmdne can you let me know how you've tested the feature? This is what I used:
$ bundle exec reverse_adoc -rcoradoc/reverse_adoc/plugins/plateau --split-sections 2 --external-images -o plateau/index.adoc index.html
I have additional issues that I will file separately now.
Thank you @hmdne!
This branch aims to convert an HTML file from metanorma/reverse_adoc#90.
Metanorma PR checklist