Refactor transform to be isomorphic (Node/web compatibility) #171

eyelidlessness · 2023-02-21T22:22:58Z

This replaces #164, with many of the same goals and benefits, but taking a different approach. Rather than running the transform in a headless browser, this change introduces a partial DOM compatibility layer on top of libxmljs when building for Node. Most of the changes in transformer.ts are quite similar to #164, probably a bit safer as none of the XPath expressions have been converted to CSS query selectors. I also caught and fixed a couple of minor implementation mistakes.

As in #164, a demo app is included to demonstrate the web transform.

Performance

As discussed in #165, this change of approach seems more reasonable, because that PR brought such great performance improvements that the complexity of running a headless browser on the server no longer makes much sense.

This change may incur a small performance hit on Node, simply because the DOM compatibility layer introduces a bit of runtime indirection. However, the impact will depend on the specific XForm being transformed, and may even perform better for certain forms, i.e. in cases where parts of the XSLT now reimplemented as DOM/libxmljs might perform better than the equivalent libxslt behavior they replace.

Performance will also vary by environment. The current benchmark suite performs about the same in Chrome and Node, slightly faster in WebKit, and faster still in Firefox. The improved performance in Firefox also results in the VA WHO form being slightly faster than the SOAR form, which may provide good hints for where to look new optimization targets.

My hope is that the benchmark summaries will be reported in comments below once the current CI runs complete. If that's not the case I may add them manually.

Development and maintenance improvements

This change will have many of the same dev/maintenance benefits as #164, while adding less complexity to the runtime environment. And while this change does add some complexity to the build, it adds a great deal less complexity than #164 would have.

How this change has been tested and validated

As with #164 and #165, my goal making this change is to retain 100% feature/bug compatibility. The same snapshot tests added in #164 (merged in #165) continue to provide strong assurance to that end.

As with #164, for the portions of code converted from XSLT to DOM, the implementations are written to mirror the original XSLT as closely as possible, from naming parity to logical flow. As noted above, this change goes even further in that regard by preserving existing XPath expressions rather than converting them to DOM equivalents. We may want to go further in that direction in the future, but I believe for this change it's best to stick as closely as possible to existing semantics.

Unlike #164, this change runs all tests and benchmarks in CI in the following environments:

Node 14
Node 16
Firefox
Chromium
WebKit

The project has again been installed locally in enketo-core and enketo-express, to verify that all tests pass, and that flows known to depend on Transformer work as expected.

And again, the browser demo also serves as a validation that the new build setup works in a web project.

Known issues

I neglected to mention this in #164, but it's worth calling out now. The web build is currently fairly large: a little over 1.6 MB (just over 200 KB gzipped). The vast majority of that size comes from the language-tags dependency, which must be bundled to be used in a browser.

One low hanging fruit improvement for this might be to import the dependency dynamically, only for forms which actually need it (and I'd be happy to make that change in this PR if preferred). In the future we should consider:

Can we optimize our use of language-tags to reduce the overall size?
Can language-tags itself be optimized to improve downstream bundle size?
Is there a more suitable alternative to language-tags (whether another package, or possibly equivalent native browser functionality)?

Includes tests around API stability, as these expectations were all broken in a previous naive attempt at the same change

1. Introduces an abstract subset of DOM APIs corresponding to previous usage of semantically equivalent `libxmljs` APIs. It would have been nicer to import these types and export the subset we care about, but this is inconsistent with how TypeScript `lib` works (it always augments the global scope). 2. Provides a Node implementation of those DOM APIs by extending `libxmljs` prototypes. This kind of extension isn't ideal, but it's the most reasonable way to achieve such a compatibility layer without sacrificing performance. An ealirer alternative approach used `WeakMap`s to cross reference the DOM/`libxmljs` interfaces, but this had a significant impact on perf. This implementation is type checked by default. 3. Provides a web implementation by... just exporting the relevant globals. This implementation is type checked by `tsc` using the `tsconfig.web.json` project file. This ensures that the abstract DOM interfaces are actully consistent with the built in DOM `lib` types. 4. Refactors transformer.ts to use the abstract DOM APIs. This uses the Node DOM compatibility implementation by default, and the native web DOM implementation when the environment variable `ENV` is set to "web". 5. Also when `ENV` is web, tests and benchmarks call `transform` in the specified `BROWSER` environment variable (defaulting to "fiefox") via a simple `playwright` bridge. In CI, all tests and benchmarks are run in: Node 14, Node 16, Firefox, Chromium, Webkit. 6. Build is updated to produce both Node and web targets. The build config itself is fairly complex, but it's been consolidated in `vite.config.ts`. This also includes a corresponding change to build `app.ts` rather than the previous, much more complex `app.js` using a Vite dev server (and roughly restores its implementation to what it had been prior to the initial TypeScript migration). 7. There are several known issues at the point of this commit. These either correspond to XSLT extensions not supported by browser targets, or to differences in behavior between DOM/`libxmljs`, all of which will be addressed in separate commits discussing each in greater detail.

This addresses a slight difference between Node's and browsers' respective implementations of `URL`, namely that browsers do not implicitly escape special path characters for unknown schemes (i.e. `jr`). This change is a workaround, temporarily substituting the `jr` scheme to consistently take advantage of built-in escaping behavior, then restoring it for the final return value.

This addresses: - an inconsistency between how `libxslt` and the native DOM `XSLTProcessor` handle XSLT transforms to HTML.`libxslt` (correctly, IMO) treats the XSLT document's expressed root element as the resulting document element, whereas browsers produce a document with an implicit `html > body > root` structure. - a slight change in `renderMarkdown` which parses to a document rather than a fragment, for the purpose of a simpler DOM compatibiity API in this case. (That call to `correctHTMLDocHierarchy` was mistakenly included in an earlier commit, but I'm hoping this subsequent commit will be clear enough and we'll likely squash these commits anyhow.)

The functionality depending on them will be reimplemented with DOM APIs

It's possible there are other attribute cases I haven't identified, but these are the ones I was able to find in test failures, snapshot mismatches, and the spec.

These dynamic template calls are now injected into the XSL, allowing us to eliminate the use of `str:replace` and `dyn:evaluate` extensions.

This eliminates the need to use `str:tokenize` extension

Faster Firefox startup, improved logging

Also bumps actions versions which have been warning about Node 12 for no apparent reason

- Update README - Clarify change to URL escaping implementation per initial review feedback

…lates into XSLT

- Add demo dev mode (very useful for manual testing during dev) - Actually display errors - Don't show Invalid state on errors

lognaturel · 2023-03-07T19:16:16Z

README.md

+
+-   The `openclinica` flag. This functionality is used by OpenClinica's fork of Enketo Express.
+
+-   The deprecated `preprocess` option. This functionality _may_ be used to update XForms with deprecated content, but its use is discouraged as users can achieve the same thing by preprocessing their XForms before calling `transform`.


"with deprecated content" -> "with preprocessed content"?

custom content? I don't think "deprecated" is right but I'm not entirely sure of the intent!

This relates to test that use preprocess and show an example of continuing to support deprecated form features. Remove the "what it's used for" part and just recommend doing any XForms preprocessing externally.

lognaturel · 2023-03-07T20:27:34Z

src/transformer.ts

+declare const document: DOM.Document | undefined;
+
+/**
+ * This addresses a difference in how Firefox treats XML namespace declarations


Enketo Core relies on namespaces being being copied onto the first child of the main instance.

lognaturel

Stating the obvious, there's a lot here.

@eyelidlessness has really carefully considered the process and I trust that the snapshot tests provide a lot of safety. I think the biggest risk is that the indirection adds complexity but I think it's separated well enough that it can be followed, especially with the brief narrative in the readme.

I've marked a couple of areas that I want to dig deeper into but I expect that this will be merged today.

src/transformer.ts

lognaturel · 2023-03-14T17:31:02Z

src/xsl/openrosa2html5form.xsl

-                    </xsl:if>
-                </xsl:if>
-            </xsl:for-each>
+        <xsl:variable name="rows" select="./@rows" />


I haven't fully traced the rows logic. I'm counting on the test at

enketo-transformer/test/transformer.spec.ts

Line 665 in a231a5f

it('and outputs a textarea for rows="x" attribute on text input, with a rows appearance', async () => {

to be enough to support the refactor.

In general, this appearance logic feels like an area of risk because I assume there aren't a ton of test forms with appearances (I could be wrong). I read through it enough to convince myself it was likely correct but if we agree it's risky I can spend more time with it.

src/transformer.ts

lognaturel

I have gone through commit by commit looking for areas of risk. I've asked questions and looked for tests related to the biggest areas of risk. I'm satisfied that this is largely a safe change, especially given its scope!

eyelidlessness added 15 commits February 18, 2023 09:50

Convert transform to async/await

e02efa9

Includes tests around API stability, as these expectations were all broken in a previous naive attempt at the same change

Fix: namespaces on instance root nodes

cda915a

Remove fatal errors from XSL from unsupported extensions

b46e7b5

The functionality depending on them will be reimplemented with DOM APIs

Fix: Firefox does not apply templates to attributes

29a733f

It's possible there are other attribute cases I haven't identified, but these are the ones I was able to find in test failures, snapshot mismatches, and the spec.

Fix: dynamic itemset itext translations

4227fe4

These dynamic template calls are now injected into the XSL, allowing us to eliminate the use of `str:replace` and `dyn:evaluate` extensions.

Fix: transform appearance classes in transformer.ts

3d93e1b

This eliminates the need to use `str:tokenize` extension

Add web demo

538bcfe

Web test/benchmark setup refinements

d9ae798

Faster Firefox startup, improved logging

Update README with web details

97c1513

Fix build/test in CI, improve types somewhat

9a4a1a1

Fix: bundle for web, preserve modules for Node

5c6f898

Move redundant config stuff to shared module again

fdb904c

eyelidlessness force-pushed the refactor/dom-compat branch 4 times, most recently from a720aee to 739faa8 Compare February 22, 2023 20:46

Attempt to report summaries to PR

77eb952

Also bumps actions versions which have been warning about Node 12 for no apparent reason

eyelidlessness force-pushed the refactor/dom-compat branch from 739faa8 to 77eb952 Compare February 22, 2023 21:15

GitHub does not allow forks to write PR comments at all

51595b4

eyelidlessness mentioned this pull request Feb 27, 2023

fixed: The app UI strings in newly created repeats are not translated… enketo/enketo-express#527

Merged

eyelidlessness added 4 commits March 1, 2023 09:28

Fix build paths (hopefully for good this time!)

69109d0

Don't show "Transforming..." in demo when not transforming

50d2f76

Address initial review feedback

13b1e9c

- Update README - Clarify change to URL escaping implementation per initial review feedback

Fix: isolation of global DOM types

885eac7

eyelidlessness force-pushed the refactor/dom-compat branch from 7a1e447 to 885eac7 Compare March 2, 2023 22:22

eyelidlessness added 2 commits March 2, 2023 15:28

Process itemset itext labels in XForm doc, rather than injecting temp…

e480bfe

…lates into XSLT

Use single node XPath where appropriate

671d5c6

Demo tweaks

6efd980

- Add demo dev mode (very useful for manual testing during dev) - Actually display errors - Don't show Invalid state on errors

lognaturel reviewed Mar 7, 2023

View reviewed changes

lognaturel reviewed Mar 14, 2023

View reviewed changes

lognaturel approved these changes Mar 15, 2023

View reviewed changes

lognaturel merged commit ef48622 into enketo:master Mar 15, 2023

eyelidlessness mentioned this pull request Mar 22, 2023

Use client-side transform for preview-by-URL enketo/enketo-express#540

Merged

7 tasks

lognaturel mentioned this pull request Apr 27, 2023

see if browser-XSLT-friendly tweaks can be made enketo/enketo#831

Closed

eyelidlessness mentioned this pull request Jun 28, 2023

Move libxslt to (optional) peerDependencies, devDependencies #181

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor transform to be isomorphic (Node/web compatibility) #171

Refactor transform to be isomorphic (Node/web compatibility) #171

eyelidlessness commented Feb 21, 2023 •

edited

Loading

lognaturel Mar 7, 2023

lognaturel Mar 7, 2023

lognaturel Mar 7, 2023 •

edited

Loading

lognaturel Mar 7, 2023

lognaturel left a comment

lognaturel Mar 14, 2023

lognaturel left a comment


		- The `openclinica` flag. This functionality is used by OpenClinica's fork of Enketo Express.

		- The deprecated `preprocess` option. This functionality _may_ be used to update XForms with deprecated content, but its use is discouraged as users can achieve the same thing by preprocessing their XForms before calling `transform`.

Refactor transform to be isomorphic (Node/web compatibility) #171

Refactor transform to be isomorphic (Node/web compatibility) #171

Conversation

eyelidlessness commented Feb 21, 2023 • edited Loading

Performance

Development and maintenance improvements

How this change has been tested and validated

Known issues

lognaturel Mar 7, 2023

Choose a reason for hiding this comment

lognaturel Mar 7, 2023

Choose a reason for hiding this comment

lognaturel Mar 7, 2023 • edited Loading

Choose a reason for hiding this comment

lognaturel Mar 7, 2023

Choose a reason for hiding this comment

lognaturel left a comment

Choose a reason for hiding this comment

lognaturel Mar 14, 2023

Choose a reason for hiding this comment

lognaturel left a comment

Choose a reason for hiding this comment

eyelidlessness commented Feb 21, 2023 •

edited

Loading

lognaturel Mar 7, 2023 •

edited

Loading