Merge pull request #4312 from inception-project/feature/4308-Support-…
…for-custom-XML-formats

#4308 - Support for custom XML formats
reckart authored Nov 18, 2023
2 parents 9ac91dd + 516e4e6 commit 4b4d92d
Showing 6 changed files with 162 additions and 0 deletions.
@@ -290,6 +290,8 @@ include::{include-dir}formats-webannotsv3.adoc[leveloffset=+2]

include::{include-dir}formats-xml.adoc[leveloffset=+2]

include::{include-dir}formats-xml-custom.adoc[leveloffset=+2]

<<<

[appendix]
@@ -50,6 +50,7 @@ for any type of editor plugin:
]
}
----

== Document-rendering editors

A document-rendering editor loads the document and annotation data from the backend and then renders
@@ -38,10 +38,12 @@
import org.springframework.beans.factory.support.BeanDefinitionBuilder;
import org.springframework.beans.factory.support.BeanDefinitionRegistry;
import org.springframework.beans.factory.support.BeanDefinitionRegistryPostProcessor;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Configuration;

import de.tudarmstadt.ukp.clarin.webanno.support.SettingsUtil;

@ConditionalOnProperty(prefix = "format.custom-xml", name = "enabled", havingValue = "true", matchIfMissing = false)
@Configuration
public class CustomXmlFormatLoader
    implements BeanDefinitionRegistryPostProcessor
@@ -0,0 +1,131 @@
// Licensed to the Technische Universität Darmstadt under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The Technische Universität Darmstadt
// licenses this file to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

[[sect_formats_xml_custom]]
= XML (custom)

====
CAUTION: Experimental feature. To use this functionality, you need to enable it first by adding `format.custom-xml.enabled=true` to the `settings.properties` file.
====

Custom XML document support lets you define your own XML formats so that documents in these formats can be displayed as formatted documents in HTML-based editors (e.g. the Apache Annotator editor or the RecogitoJS editor).

The goal of the custom XML document support is to format and render XML documents suitably in the browser. It does **not** aim to extract potential annotations from the XML document or to make them accessible and editable as annotations within {application-name}. It only supports **importing** custom XML documents, not exporting them. To **export** the annotated document, another format such as <<sect_formats_uimaxmi>> has to be used.

Custom XML formats are based on the <<sect_formats_xml>> format support. To define them, create a sub-folder `xml-formats` in the application home directory. Within that folder, create one folder for each custom XML format. The name of this folder is used as part of the format identifier. Within this per-format folder, a file called `plugin.json` needs to be created with the following content:

.Example `plugin.json` for custom XML format
[source,json]
----
{
  "name": "TTML format (external)",
  "stylesheets": [
    "styles.css"
  ]
}
----

The `plugin.json` file should define one or more CSS stylesheets that define how elements of the custom XML format should be rendered on screen.

.Example `styles.css` for custom XML format
[source,css]
----
@namespace tt url('http://www.w3.org/ns/ttml');

tt|p {
  display: block;
  border-color: gray;
  border-style: solid;
  border-width: 1px;
  border-radius: 0.5em;
  margin-top: 0.25em;
  margin-bottom: 0.25em;

  &::before {
    border-radius: 0.5em 0em 0em 0.5em;
    display: inline-block;
    padding-left: 0.5em;
    padding-right: 0.5em;
    margin-right: 0.5em;
    background-color: lightgray;
    min-width: 10em;
    content: attr(agent) '\a0';
  }
}
----

Additionally, a `policy.yaml` file should be present in the format folder. It defines how the elements of the XML should be handled when rendering the documents for display in the browser.


.Example `policy.yaml` for custom XML format
[source,yaml]
----
name: TTML Content Policies
version: 1.0
policies:
  - elements: [
      "{http://www.w3.org/ns/ttml}tt",
      "{http://www.w3.org/ns/ttml}body",
      "{http://www.w3.org/ns/ttml}div",
      "{http://www.w3.org/ns/ttml}p" ]
    action: "PASS"
  - attributes: ["{http://www.w3.org/ns/ttml#metadata}agent"]
    action: "PASS_NO_NS"
----

An example XML file that could be imported with such a format would look like this:

.Example `dialog.xml` file
[source,xml]
----
<tt xmlns="http://www.w3.org/ns/ttml" xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xml:lang="en">
  <head>
    <metadata>
      <ttm:agent xml:id="speaker1">Speaker 1</ttm:agent>
      <ttm:agent xml:id="speaker2">Speaker 2</ttm:agent>
    </metadata>
  </head>
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:05.000" ttm:agent="speaker1">
        Hello, this is the first speaker.
      </p>
      <p begin="00:00:06.000" end="00:00:10.000" ttm:agent="speaker2">
        And this is the second speaker.
      </p>
    </div>
  </body>
</tt>
----

NOTE: When you export a project that contains documents using a custom XML format and import
it into another {application-name} instance in which the format has not been declared, the custom
XML documents will not be usable. You also have to copy the custom format declaration over
to the new instance. If you use custom XML formats, make sure you keep backups of them
along with the projects that use them. Also try to use format names that are unlikely to
clash with others. For example, `tei` may not be the best name for a custom TEI format -
`project-theater-2000-tei` may be a better name.

[cols="2,1,1,1,3"]
|====
| Format | Read | Write | Custom Layers | Description

| XML (custom) (`custom-xml-format-FOLDERNAME`)
| yes
| no
| no
|
|====
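
The relation between the per-format folder and the format identifier shown in the table can be sketched as follows. This is only an illustration and not the loader shipped with {application-name}: the class `CustomXmlFormatScanSketch` and its methods are hypothetical, and it assumes that a folder counts as a format declaration only if it contains a `plugin.json` file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the application code base.
class CustomXmlFormatScanSketch
{
    // Identifier prefix taken from the format table above.
    static final String PREFIX = "custom-xml-format-";

    // Builds the format identifier from the name of the per-format folder.
    static String formatId(String aFolderName)
    {
        return PREFIX + aFolderName;
    }

    // Lists identifiers for all folders below <appHome>/xml-formats that
    // contain a plugin.json file (assumption: other folders are ignored).
    static List<String> discover(Path aAppHome)
    {
        Path formatsDir = aAppHome.resolve("xml-formats");
        if (!Files.isDirectory(formatsDir)) {
            return List.of();
        }

        try (Stream<Path> children = Files.list(formatsDir)) {
            return children //
                    .filter(Files::isDirectory) //
                    .filter(dir -> Files.isRegularFile(dir.resolve("plugin.json"))) //
                    .map(dir -> formatId(dir.getFileName().toString())) //
                    .sorted() //
                    .collect(Collectors.toList());
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] aArgs)
    {
        // The application home is passed as the first argument in this sketch.
        Path appHome = Paths.get(aArgs.length > 0 ? aArgs[0] : ".");
        discover(appHome).forEach(System.out::println);
    }
}
```

With a folder `xml-formats/project-theater-2000-tei` in place, such a scan would report an identifier ending in the folder name, which is why unique folder names help avoid clashes between instances.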
@@ -19,6 +19,28 @@

public enum AttributeAction
{
    /**
     * Pass attribute as-is.
     */
    PASS, //

    /**
     * Pass attribute but remove the namespace.
     * <p>
     * The CSS {@code content: attr(XXX)} construct is unable to access attributes that are not in
     * the default namespace. Support for accessing namespaced attributes appears to have
     * been present in early proposals of the
     * <a href="https://www.w3.org/1999/06/25/WD-css3-namespace-19990625/#attr-function">CSS3
     * namespace enhancements</a> but appears to have been dropped for the final recommendation.
     * Also, browsers do not appear (yet) to have implemented support for this on their own.
     * <p>
     * Thus, if the attribute contains data that needs to be accessed using
     * {@code content: attr(XXX)}, then use this.
     */
    PASS_NO_NS, //

    /**
     * Attribute is not passed on - it is dropped.
     */
    DROP;
}
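
To make the difference between `PASS` and `PASS_NO_NS` concrete, here is a minimal, self-contained sketch. It is not the sanitizer from this commit: the nested `Action` enum and the `apply` and `sanitize` helpers are hypothetical stand-ins, and only `org.xml.sax.helpers.AttributesImpl` is actual JDK API. The attribute used is the `ttm:agent` attribute from the documentation example.

```java
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class, not part of the commit.
class AttributeActionSketch
{
    // Stand-in for the AttributeAction enum shown above.
    enum Action
    {
        PASS, PASS_NO_NS, DROP
    }

    // Copies one attribute into aTarget according to the given action.
    static void apply(AttributesImpl aTarget, Action aAction, String aUri, String aLocalName,
            String aQName, String aValue)
    {
        switch (aAction) {
        case PASS:
            // Keep namespace URI and qualified name untouched
            aTarget.addAttribute(aUri, aLocalName, aQName, "CDATA", aValue);
            break;
        case PASS_NO_NS:
            // Move the attribute into no namespace so CSS attr() can read it
            aTarget.addAttribute("", aLocalName, aLocalName, "CDATA", aValue);
            break;
        case DROP:
            // Nothing is added
            break;
        }
    }

    // Applies the action to the ttm:agent attribute of the TTML example.
    static AttributesImpl sanitize(Action aAction)
    {
        AttributesImpl attributes = new AttributesImpl();
        apply(attributes, aAction, "http://www.w3.org/ns/ttml#metadata", "agent", "ttm:agent",
                "speaker1");
        return attributes;
    }

    public static void main(String[] aArgs)
    {
        for (Action action : Action.values()) {
            AttributesImpl attributes = sanitize(action);
            if (attributes.getLength() == 0) {
                System.out.println(action + ": attribute dropped");
            }
            else {
                System.out.println(action + ": qName=" + attributes.getQName(0) + " uri="
                        + attributes.getURI(0));
            }
        }
    }
}
```

With `PASS_NO_NS`, the attribute ends up with the plain name `agent`, which is what the `content: attr(agent)` rule in the example stylesheet relies on.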
@@ -252,6 +252,10 @@ private void sanitizeAttribute(AttributesImpl aSanitizedAttributes, QName aEleme
        case PASS:
            aSanitizedAttributes.addAttribute(uri, localName, qName, type, value);
            break;
        case PASS_NO_NS:
            aSanitizedAttributes.addAttribute("", attribute.getLocalPart(),
                    attribute.getLocalPart(), type, value);
            break;
        case DROP:
            if (policies.isDebug()) {
                attribute = maskAttribute(aElement, attribute);