I am looking at ways to build an HTML5 sanitizer that can run in browser, NodeJS and Java environments, with Java being the lowest priority at the moment. The most important requirement is that it must not rely on a DOM, so that it can operate in all of these environments. I stumbled upon html5ever and it looks like the perfect tool for my scenario, with the added benefit that it's part of the Servo project.
For browser and NodeJS environments I would have to produce WASM artifacts. In NodeJS, WASM keeps multi-platform distribution simple, and it also covers environments where I may not be able to load native NodeJS binary plugins. For browsers and mobile WebViews there is no option other than a WASM artifact. These are the restrictions on the distribution process, and I am fine with them.
I am using Rust to build the sanitizer, which keeps things easy to manage because I can stay in the same programming language throughout development.
Currently, when compiling html5ever to WASM, I get an output of 450kb, even after running it through wasm-opt with very aggressive size optimizations. Unfortunately, that is far too big a file for the Web. If it could be around 50kb, html5ever would be a much more attractive alternative to existing JavaScript sanitizers for the browser.
I would like to ask whether there is a way to compile html5ever to WASM so that I can reach my target size or, alternatively, to use only the parser features I currently need, in the hope that this shaves off a considerable amount of code.
My main scenario is the following: given a string containing HTML, produce a DOM tree that can be traversed to identify the tags, attributes and attribute values that should be eliminated, then return the sanitized result as a string.
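The traverse-and-serialize part of this scenario could look roughly like the sketch below. It is only an illustration under stated assumptions: `Node`, `Policy`, `sanitize` and `sanitize_to_string` are hypothetical names and are not html5ever's API, the parsing step (string to tree) is left out because that is the part html5ever would provide, and the `javascript:` check is a deliberately simplified stand-in for real attribute value filtering.

```rust
use std::collections::HashSet;

/// Hypothetical minimal tree for illustration; this is NOT html5ever's node type.
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Element {
        tag: String,
        attrs: Vec<(String, String)>,
        children: Vec<Node>,
    },
    Text(String),
}

/// Hypothetical allowlist policy: anything not listed is removed.
pub struct Policy {
    pub allowed_tags: HashSet<&'static str>,
    pub allowed_attrs: HashSet<&'static str>,
}

/// Simplified value check (an assumption, far from a complete URL policy).
fn is_dangerous_value(value: &str) -> bool {
    value
        .trim_start()
        .to_ascii_lowercase()
        .starts_with("javascript:")
}

/// Returns a sanitized copy of `node`, or `None` if the node itself is disallowed.
/// A disallowed element is dropped together with its whole subtree.
pub fn sanitize(node: &Node, policy: &Policy) -> Option<Node> {
    match node {
        Node::Text(text) => Some(Node::Text(text.clone())),
        Node::Element { tag, attrs, children } => {
            if !policy.allowed_tags.contains(tag.as_str()) {
                return None;
            }
            let attrs: Vec<(String, String)> = attrs
                .iter()
                .filter(|(name, value)| {
                    policy.allowed_attrs.contains(name.as_str()) && !is_dangerous_value(value)
                })
                .cloned()
                .collect();
            let children: Vec<Node> = children
                .iter()
                .filter_map(|child| sanitize(child, policy))
                .collect();
            Some(Node::Element { tag: tag.clone(), attrs, children })
        }
    }
}

/// Escapes the characters that are significant in text and quoted attribute values.
fn escape(input: &str, out: &mut String) {
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
}

/// Writes the tree back out as a string. Void elements such as <br> are not
/// special-cased here; every element gets an explicit closing tag.
pub fn serialize(node: &Node, out: &mut String) {
    match node {
        Node::Text(text) => escape(text, out),
        Node::Element { tag, attrs, children } => {
            out.push('<');
            out.push_str(tag);
            for (name, value) in attrs {
                out.push(' ');
                out.push_str(name);
                out.push_str("=\"");
                escape(value, out);
                out.push('"');
            }
            out.push('>');
            for child in children {
                serialize(child, out);
            }
            out.push_str("</");
            out.push_str(tag);
            out.push('>');
        }
    }
}

/// Convenience wrapper: sanitize a tree and return the resulting string.
pub fn sanitize_to_string(root: &Node, policy: &Policy) -> String {
    let mut out = String::new();
    if let Some(clean) = sanitize(root, policy) {
        serialize(&clean, &mut out);
    }
    out
}
```

An allowlist is used here rather than a blocklist, so unknown tags and attributes are removed by default. In a real build the tree would come from the parser's output rather than being constructed by hand.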
Thank you for taking the time to read this issue. Hopefully, with your help, I'll be able to use html5ever to achieve my goals.
I have no experience with attempting to minimize wasm builds, so I can't provide any assistance there. Html5ever is designed to follow a specific parsing algorithm that is web-compatible, and I'm unaware of any optional features that can be disabled as a result.
From my limited experience with minimizing wasm build size, at least as of Rust v1.69, enabling the lto option reduces the size more aggressively than post-processing with wasm-opt does. You can use either full (fat) LTO or ThinLTO; either is fine.
@tetsuharuohzeki I was using nightly for this one, with LTO enabled. 450kb seems to be the best I could do after a bit of fiddling with the optimization settings.
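For reference, size-oriented settings like the ones discussed in this thread are usually set in the release profile of Cargo.toml. The fragment below is a sketch of commonly used options, not the exact configuration anyone in this thread used (that is an assumption), and it makes no claim about how far it would bring html5ever's output below 450kb.

```toml
# Cargo.toml: release profile tuned for small output (illustrative values)
[profile.release]
opt-level = "z"     # optimize for size; "s" is the less aggressive alternative
lto = true          # full ("fat") LTO; use "thin" for ThinLTO
codegen-units = 1   # fewer codegen units allows more cross-function optimization
panic = "abort"     # drop unwinding machinery
strip = true        # strip symbols from the final artifact
```

A post-processing pass such as wasm-opt -Oz can still be run on the resulting .wasm file after building with this profile.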