html_to_vdom breaks on whitespaces and null HTML tags #777
Comments
@rmorshea Would like you to chime in on this one too, while you still have availability.
I'll take a look at this over the weekend.
The issue I categorized as a "script tag" issue appears to be related to HTML elements that don't have end tags (such as
Googling and testing shows me there doesn't appear to be a foolproof way of resolving this with Python's built-in In the interest of maintainability, I'm going to switch us to using
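To illustrate the kind of pitfall described above, here is a minimal sketch. It assumes the built-in parser in question is the standard library's html.parser (the comment above does not name it), and NaiveTagStack and unclosed_tags are hypothetical names introduced only for this example. An event-based parser reports a start tag for <hr> but never a matching end tag, so any logic that pairs start and end events loses track of nesting.

```python
from html.parser import HTMLParser


class NaiveTagStack(HTMLParser):
    """Hypothetical tracker that pairs start-tag and end-tag events."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        # Every opening tag is pushed, including void tags such as <hr>.
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open tag.
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def unclosed_tags(html):
    """Return the tags this naive tracker still considers open."""
    tracker = NaiveTagStack()
    tracker.feed(html)
    tracker.close()
    return tracker.stack


print(unclosed_tags("<div><p>text</p></div>"))
print(unclosed_tags("<div><hr></div>"))
```

In the second call the <hr> start tag is never closed, so the tracker also fails to close the surrounding <div>. A self-closing <hr/> does not trigger this, because html.parser reports it as a start tag followed by an end tag.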
That target parser interface of
Using that interface would likely make us fall into the same pitfalls, since it doesn't provide access to the DOM tree. I've been testing out performing these transformations recursively. It seems to work on my rudimentary test, and is a lot more readable.

from io import StringIO
from pprint import pprint
from typing import Union

from lxml import etree

my_html = """
<div id="view_to_component_middleware">
view_to_component_middleware: Success
<div class="inner"> this is text </div>
</div>\n
<hr>
<script>
var dog = 'memes';
</script>"""


def _set_if_val_exists(dictionary, key, value):
    """Sets a key on a dictionary if the value's length is greater than 0."""
    if len(value):
        dictionary[key] = value


def html_to_vdom(html: Union[str, etree._Element], *transforms):
    """Convert an lxml.etree node tree into a VDOM dict."""
    # If the user provided a string, convert it to an lxml.etree node.
    if isinstance(html, str):
        parser = etree.HTMLParser()
        tree = etree.parse(StringIO(html), parser)
        node = tree.getroot()
    elif isinstance(html, etree._Element):
        node = html
    else:
        raise TypeError("html_to_vdom expects a string or lxml.etree._Element")

    # Convert the lxml.etree node to a VDOM dict.
    vdom = {"tagName": node.tag}
    _set_if_val_exists(vdom, "attributes", dict(node.items()))
    _set_if_val_exists(
        vdom, "children", [html_to_vdom(child) for child in node.iterchildren(None)]
    )

    # Apply any transforms.
    for transform in transforms:
        vdom = transform(vdom)

    return vdom


pprint(html_to_vdom(my_html))
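The draft above treats each transform as a callable that takes a VDOM dict and returns a VDOM dict. As a rough sketch of what one might look like, the hypothetical rename_class_attribute below renames a "class" attribute to "className". The function name and the renaming rule are assumptions made for illustration, not something taken from the thread.

```python
def rename_class_attribute(vdom):
    """Hypothetical transform: copy the node, renaming "class" to "className"."""
    # Work on copies so the caller's dict is left untouched.
    attributes = dict(vdom.get("attributes", {}))
    if "class" in attributes:
        attributes["className"] = attributes.pop("class")

    result = dict(vdom)
    if attributes:
        result["attributes"] = attributes
    return result


node = {"tagName": "div", "attributes": {"class": "inner"}}
print(rename_class_attribute(node))
```

It could be passed as html_to_vdom(my_html, rename_class_attribute). Note that the recursive call in the draft does not forward transforms, so as written only the top-level node would be transformed.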
Current Situation
If an HTML string has ... <hr> within the document... then html_to_vdom will silently fail without raising an exception.
See my Django IDOM PR for how I caused the script tag issue to occur at this LOC.
Proposed Actions
Whitespace issue can be fixed by simple regex and/or using str.strip().
Script tag issue needs further investigation. This might open up a can of worms to test html_to_vdom with all idom.html types.
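The whitespace proposal could be sketched roughly as follows. The helper name normalize_html_whitespace is hypothetical, and the regex is only one possible reading of "simple regex": it trims the string and removes whitespace-only gaps between adjacent tags.

```python
import re


def normalize_html_whitespace(html: str) -> str:
    """Hypothetical helper: trim the string and drop whitespace-only gaps between tags."""
    # str.strip() removes leading and trailing whitespace; the regex
    # collapses runs of whitespace that sit directly between two tags.
    return re.sub(r">\s+<", "><", html.strip())


print(normalize_html_whitespace("  <div>\n  <p>hi</p>\n</div>\n"))
```

This is lossy: it would also remove meaningful whitespace between inline elements or inside <pre>, so it is probably better seen as a stopgap than as a complete fix.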