Provide DOMParser, XMLSerializer, and XSLTProcessor DOM APIs for working with XML #3648

Closed
MarkTiedemann opened this issue Jan 10, 2020 · 19 comments
Labels
feat new feature (which has been agreed to/accepted)

Comments

@MarkTiedemann
Contributor

It would be pretty awesome if I could use Deno for parsing, transforming, and serializing XML without a third-party module, just like I can in the browser:

@kitsonk
Contributor

kitsonk commented Jan 13, 2020

I think the best path for these heavyweight things is to get jsdom supportable under Deno.

@MarkTiedemann
Contributor Author

@kitsonk You are right, DOMParser and XMLSerializer are working in JSDOM (they are implemented using https://github.com/jsdom/w3c-xmlserializer and https://github.com/lddubeau/saxes).

XSLTProcessor isn't working, but although it's supported in all browsers (except IE), it's also non-standard, so I guess it's optional anyway.

@MarkTiedemann
Contributor Author

Just ran into another use case where solid XML parsing would be awesome: Dynamically listing all currently registered media types (by parsing https://www.iana.org/assignments/media-types/media-types.xml).

async function listMediaTypes() {
  let response = await fetch("https://www.iana.org/assignments/media-types/media-types.xml");
  let xml = await response.text();
  let document = new DOMParser().parseFromString(xml, "application/xml");
  let types = [];
  for (let registry of document.querySelectorAll("registry registry")) {
    for (let record of registry.querySelectorAll("record")) {
      let file = record.querySelector("file");
      if (file !== null) {
        types.push(file.textContent);
      } else {
        types.push(`${registry.querySelector("title").textContent}/${record.querySelector("name").textContent}`);
      }
    }
  }
  return types.sort((a, b) => a.toLocaleLowerCase("en").localeCompare(b.toLocaleLowerCase("en")));
}

await listMediaTypes();

This works fine in the browser console, but not in Deno.
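A quick way to see which of the three APIs from the issue title a given runtime provides is a plain feature check. This is only a minimal sketch: the helper name `missingXmlApis` and the `scope` parameter are made up for illustration and are not part of Deno or any browser.

```javascript
// The three browser globals requested in this issue.
const XML_DOM_APIS = ["DOMParser", "XMLSerializer", "XSLTProcessor"];

// Hypothetical helper: returns the names from XML_DOM_APIS that are not
// available as constructors on `scope`. `scope` defaults to the runtime's
// global object; passing a plain object makes the check easy to try out.
function missingXmlApis(scope = globalThis) {
  return XML_DOM_APIS.filter((name) => typeof scope[name] !== "function");
}

const missing = missingXmlApis();
if (missing.length > 0) {
  console.log(`Missing XML DOM APIs: ${missing.join(", ")}`);
}
```

In a browser console the list should come back empty (or contain only XSLTProcessor in IE); in a runtime without these globals, all three names are reported.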

@danilaplee

There is a great library for parsing HTML in Rust (https://docs.rs/select/0.4.3/select/),
so I'm up for implementing this issue as a native DOMParser as per the W3C Web API standard: https://developer.mozilla.org/en-US/docs/Web/API/DOMParser

@MarkTiedemann
Contributor Author

utkarshkukreti/select.rs seems to be unstable.

> Note: All the API is currently unstable

I think what you are looking for is the parser that they are using internally: servo/html5ever. AFAIK, that's the parser used in Firefox, so it should be fairly safe to use. :)

@danilaplee

@MarkTiedemann I've used select before, had no major issues, but html5ever seems more stable 👍 I'll get down to implementation

@timmak

timmak commented Apr 8, 2020

I don't know if anyone has done anything on this, but would this be a plugin, or would Deno ship with servo/html5ever? I would be interested in possibly working on this.

@SRNV

SRNV commented May 6, 2020

Is someone working on it?
That would be great.

@max-pub

max-pub commented May 18, 2020

Given the stated goal of being compatible with WebAPIs,
support of DOMParser, XMLSerializer and XSLTProcessor would be very desirable!

Maybe this is possible via WASM compilation of the respective Firefox/Chrome modules?

@ry added the feat new feature (which has been agreed to/accepted) label May 18, 2020

@kitsonk
Contributor

kitsonk commented May 19, 2020

To be clear, the goal is to use web-compatible APIs to provide features where possible, not to support every web feature.

Also, there aren't "respective Firefox/Chrome modules" for these types of features, and any WASM code has to interface with JavaScript in an asynchronous way, which would be totally unsuitable for implementing these APIs.

As stated above, the best path forward on these would be to look at getting https://github.com/jsdom/w3c-xmlserializer and https://github.com/lddubeau/saxes to work under Deno without needing the whole of jsdom.

@MarkTiedemann
Contributor Author

MarkTiedemann commented May 19, 2020

@kitsonk I have used jsdom/w3c-xmlserializer for working with XML in Deno. That's not a problem (though it does require custom build steps).

I think the first question is: Should DOMParser and XMLSerializer be part of Deno core or should those be userland modules?

The second question is: Should we implement them in JS or Rust?

Currently, I'm using JS userland modules. Ideally, I'd like to see this implemented in Rust in Deno core.

@max-pub

max-pub commented May 19, 2020

I'm certainly no WASM expert, but wasm-functions can be called synchronously from JS, as far as I understood.
I (maybe naively) assumed that compiling something like https://searchfox.org/mozilla-central/source/dom/base/DOMParser.cpp might fast-track inclusion of standard-compliant XML-handling into Deno. At least until a Rust-implementation is ready.

Secondly, I think including as many browser APIs as possible in Deno core would massively ease development (for example, a browser module working with XML data cannot be used in Deno right now). Being able to use the same modules in frontend and backend development would IMHO be a strong incentive to switch from Node to Deno.

@kitsonk
Contributor

kitsonk commented May 19, 2020

Calls can be, but fetching and instantiating WebAssembly is inherently async.
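To make the distinction concrete, here is a minimal sketch that uses the Promise-based `WebAssembly.instantiate` and then calls the resulting export synchronously. The byte array is a tiny hand-assembled module written for this illustration (it only exports an `add` function), and `loadAdd` is a made-up helper name; neither has anything to do with an actual DOMParser build.

```javascript
// Hand-assembled WebAssembly module that exports add(a, b) on two i32 values.
const ADD_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function, using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0, local.get 1, i32.add
]);

// Instantiation through this API returns a Promise...
async function loadAdd() {
  const { instance } = await WebAssembly.instantiate(ADD_WASM);
  return instance.exports.add;
}

// ...but once it has resolved, the export is an ordinary synchronous function.
loadAdd().then((add) => {
  console.log(add(2, 3)); // 5
});
```

The consequence for an API like `new DOMParser().parseFromString(...)`, which callers expect to use synchronously, is that any WASM-backed implementation would need its module instantiated before the first call.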

WebAssembly isn't a magic solution to compiling code. WebAssembly lives and runs in a sandbox, one that has very limited interaction with the outside world, just like JavaScript. Most code that isn't written with WebAssembly in mind simply won't work without a lot of work rewriting it. The module you reference is far from a standalone module. It effectively exposes an API that is built on top of the whole DOM implementation of Firefox, and that implementation expects a lot more coupling to the host than is available in WebAssembly.

What works well in WebAssembly is code that is designed for WebAssembly, that has discrete functionality, that expects to run in a sandbox.

Including browser APIs comes at the cost of maintaining those APIs. There are a lot of APIs out there too that are part of the browser spec that are pretty ugly, and so it makes a lot of sense to take a far more metered approach, rather than just throwing everything in.
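The sandbox point can be illustrated with a small sketch: a module can reach the host only through functions that are explicitly handed to it at instantiation time. The bytes below are another hand-assembled toy module written for this illustration; the import name `env.log`, the export `run`, and the helper `runWithHost` are all assumptions made up for the example.

```javascript
// Hand-assembled module: imports env.log(i32) and exports run(), which calls log(42).
const LOG_WASM = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32) -> (), () -> ()
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76, 0x03, 0x6c, 0x6f, 0x67, 0x00, 0x00, // import "env"."log", type 0
  0x03, 0x02, 0x01, 0x01, // one local function, using type 1
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, 0x41, 0x2a, 0x10, 0x00, 0x0b, // i32.const 42, call 0
]);

// Hypothetical helper: instantiates the module with a host-provided `log`
// callback and runs it. The import object is the module's only door to the host.
async function runWithHost(log) {
  const { instance } = await WebAssembly.instantiate(LOG_WASM, { env: { log } });
  instance.exports.run();
}

runWithHost((value) => console.log("wasm called the host with", value));
```

Code that assumes direct access to a DOM tree, the file system, or other host facilities would need every such dependency threaded through imports like this, which is the rewriting effort described above.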

@MarkTiedemann
Contributor Author

> I (maybe naively) assumed that compiling something like https://searchfox.org/mozilla-central/source/dom/base/DOMParser.cpp might fast-track inclusion of standard-compliant XML-handling into Deno. At least until a Rust-implementation is ready.

As mentioned earlier in this thread, there is a standard compliant solution in Rust by Mozilla: https://github.com/servo/html5ever

> There are a lot of APIs out there too that are part of the browser spec that are pretty ugly, and so it makes a lot of sense to take a far more metered approach.

I'm not sure "ugly" is a good argument. I'd rather have Deno support ugly, old, known standards such as DOMParser (which, by the way, has literally been available in Chrome since version 1) than the beautiful, latest, experimental, unstable APIs...

@kitsonk
Contributor

kitsonk commented May 19, 2020

@MarkTiedemann I am not saying no. I was responding to "it is a web standard so Deno should have it" and that the intent is not to have every web standard as part of Deno.

At the moment, it isn't a priority for anyone. The servo implementation would go quite a long way, but it is a decent amount of work to make it work in JavaScript in compliance with the APIs and to ensure it is well tested. If someone were to work on it, I am sure the contribution would be welcome.

@MarkTiedemann
Contributor Author

You are right, @kitsonk, this is indeed not a priority and a lot of work. After further consideration, I also think that this idea should probably be tested outside of Deno core in a userland Rust plugin first.

Closing for now. :)

PS: If anyone is interested in working on this, please tag me. I have little Rust experience so far, but I'd like to join.

@timreichen
Contributor

Maybe of some interest? https://deno.land/x/deno_dom

@yacinehmito
Contributor

@MarkTiedemann By closing, did you mean that you will work on it or that it isn't worth working on it?
If neither, I think it is clearer to reopen it.

@iugo
Contributor

iugo commented Nov 21, 2023

deno doc about DOM: https://docs.deno.com/runtime/manual/advanced/jsx_dom/overview

10 participants