Use LXML for html_to_vdom #795

Merged: 54 commits, merged on Aug 14, 2022. The diff below shows changes from 37 commits.

Commits
43dec3b
LXML based html_to_vdom
Archmonger Aug 4, 2022
f0a3220
better interface for html_to_vdom
Archmonger Aug 4, 2022
c6ad8bf
cleanup typehints and exceptions
Archmonger Aug 4, 2022
a2d995a
variable and function name cleanup
Archmonger Aug 4, 2022
4dfde93
fix tests
Archmonger Aug 4, 2022
e5fdfc3
fix more tests
Archmonger Aug 4, 2022
f3925cb
fix transform logic
Archmonger Aug 4, 2022
8156e28
fix test warnings
Archmonger Aug 4, 2022
4600af4
rename to _prune_vdom_fields
Archmonger Aug 4, 2022
88c14c5
make safe assumption in _vdom_mutations
Archmonger Aug 4, 2022
32c5db7
docstrings
Archmonger Aug 4, 2022
55d5d13
switch back to function based approach
Archmonger Aug 4, 2022
13c7f8a
better function names
Archmonger Aug 4, 2022
e56523c
perform _generate_vdom_children in a single pass
Archmonger Aug 5, 2022
0924c5f
add changelog entry
Archmonger Aug 5, 2022
f9f169a
add null tag test
Archmonger Aug 5, 2022
4c40afc
more robust style string parser
Archmonger Aug 5, 2022
e5ca858
fix tests
Archmonger Aug 5, 2022
37b2019
remove uneeded strip
Archmonger Aug 5, 2022
64c6515
root_node position cleanup
Archmonger Aug 5, 2022
c7bc8ed
user API only accepts str
Archmonger Aug 5, 2022
4d0c03c
etree_to_vdom
Archmonger Aug 6, 2022
3f7b78e
Try to use existing root node
Archmonger Aug 7, 2022
d448102
test_html_to_vdom_with_no_parent_node
Archmonger Aug 7, 2022
2813de1
more compact formatting for tests
Archmonger Aug 7, 2022
bedab56
remove non-html from VDOM tree
Archmonger Aug 7, 2022
dcb3789
Allow for customizing whether the parser is strict
Archmonger Aug 7, 2022
e464277
make etree to vdom private
Archmonger Aug 8, 2022
0ddf937
Remove recover parameter
Archmonger Aug 8, 2022
f70cfc5
Update src/idom/utils.py
Archmonger Aug 9, 2022
e46e1fc
Update src/idom/utils.py
Archmonger Aug 9, 2022
2396169
Update src/idom/utils.py
Archmonger Aug 9, 2022
873a048
hasattr(idom.html, tagName)
Archmonger Aug 9, 2022
3bb5df0
fix tests
Archmonger Aug 9, 2022
9c69569
avoid unneeded list unpacking
Archmonger Aug 9, 2022
63acce6
more comments
Archmonger Aug 13, 2022
86fd44d
better _hypen_to_camel_case
Archmonger Aug 13, 2022
43998bd
Revert "better _hypen_to_camel_case"
Archmonger Aug 13, 2022
501e14c
use `TEMP` instead of `div` for root node
Archmonger Aug 13, 2022
40e073e
ignore coverage on type checks
Archmonger Aug 13, 2022
bde14c1
Merge branch 'fix-html-to-vdom' of https://github.com/Archmonger/idom…
Archmonger Aug 13, 2022
dd23e88
Update src/idom/utils.py
Archmonger Aug 13, 2022
5aefd37
remove prune vdom fields
Archmonger Aug 13, 2022
bf37464
type hints
Archmonger Aug 13, 2022
40116c7
_hypen_to_camel_case using string partition
Archmonger Aug 13, 2022
3999aad
fix _ModelTransform def
Archmonger Aug 13, 2022
c52791a
TypeError string update
Archmonger Aug 13, 2022
4ca1f11
Convince type checker that it's safe to mutate attributes
Archmonger Aug 14, 2022
cf532f9
test non html element behavior
rmorshea Aug 14, 2022
9262c15
remove recover=False
Archmonger Aug 14, 2022
1a23a01
Add strict parameter
Archmonger Aug 14, 2022
07b3470
add type hint
rmorshea Aug 14, 2022
fef1844
Update src/idom/utils.py
Archmonger Aug 14, 2022
66846ec
clearer verbiage
Archmonger Aug 14, 2022
2 changes: 2 additions & 0 deletions docs/source/about/changelog.rst
@@ -25,6 +25,7 @@ Unreleased

**Fixed**

- :issue:`777` - Fix edge cases where ``html_to_vdom`` can fail to convert HTML
- :issue:`789` - Conditionally rendered components cannot use contexts
- :issue:`773` - Use strict equality check for text, numeric, and binary types in hooks
- :issue:`801` - Accidental mutation of old model causes invalid JSON Patch
@@ -38,6 +39,7 @@ Unreleased
**Added**

- :pull:`123` - ``asgiref`` as a dependency
- :pull:`795` - ``lxml`` as a dependency


v0.39.0
1 change: 1 addition & 0 deletions requirements/pkg-deps.txt
@@ -6,3 +6,4 @@ fastjsonschema >=2.14.5
requests >=2
colorlog >=6
asgiref >=3
lxml >=4
2 changes: 1 addition & 1 deletion src/idom/backend/utils.py
@@ -35,7 +35,7 @@ def run(
implementation: BackendImplementation[Any] | None = None,
) -> None:
"""Run a component with a development server"""
logger.warn(
logger.warning(
"You are running a development server. "
"Change this before deploying in production!"
)
229 changes: 147 additions & 82 deletions src/idom/utils.py
@@ -1,8 +1,14 @@
from html.parser import HTMLParser as _HTMLParser
from typing import Any, Callable, Dict, Generic, List, Optional, Tuple, TypeVar
from itertools import chain
from typing import Any, Callable, Dict, Generic, Iterable, List, TypeVar, Union

from lxml import etree
from lxml.html import fragments_fromstring

import idom


_RefValue = TypeVar("_RefValue")
_ModelTransform = Callable[[Dict[str, Any]], Any]
_UNDEFINED: Any = object()


@@ -49,11 +55,9 @@ def __repr__(self) -> str:
return f"{type(self).__name__}({current})"


_ModelTransform = Callable[[Dict[str, Any]], Any]


def html_to_vdom(source: str, *transforms: _ModelTransform) -> Dict[str, Any]:
"""Transform HTML into a DOM model
def html_to_vdom(html: str, *transforms: _ModelTransform) -> Dict:
"""Transform HTML into a DOM model. Unique keys can be provided to HTML elements
using a ``key=...`` attribute within your HTML tag.

Parameters:
source:
@@ -63,80 +67,141 @@ def html_to_vdom(source: str, *transforms: _ModelTransform) -> Dict[str, Any]:
dictionary which will be replaced by ``new``. For example, you could use a
transform function to add highlighting to a ``<code/>`` block.
"""
parser = HtmlParser()
parser.feed(source)
root = parser.model()
to_visit = [root]
while to_visit:
node = to_visit.pop(0)
if isinstance(node, dict) and "children" in node:
transformed = []
for child in node["children"]:
if isinstance(child, dict):
for t in transforms:
child = t(child)
if child is not None:
transformed.append(child)
to_visit.append(child)
node["children"] = transformed
if "attributes" in node and not node["attributes"]:
del node["attributes"]
if "children" in node and not node["children"]:
del node["children"]
return root


class HtmlParser(_HTMLParser):
"""HTML to VDOM parser

Example:

.. code-block::

parser = HtmlParser()

parser.feed(an_html_string)
parser.feed(another_html_string)
...

vdom = parser.model()
if not isinstance(html, str):
raise TypeError(f"Encountered unsupported type {type(html)} from {html}")

# Parse the HTML string into a list of lxml.etree nodes
parser = etree.HTMLParser(
remove_comments=True,
remove_pis=True,
remove_blank_text=True,
recover=False,
)
nodes: List = fragments_fromstring(html, no_leading_text=True, parser=parser)
has_root_node = len(nodes) == 1

# Find or create a root node
if has_root_node:
root_node = nodes[0]
else:
root_node = etree.Element("div", None, None)
for child in nodes:
root_node.append(child)

# Convert the lxml node to a VDOM dict
vdom = _etree_to_vdom(root_node, transforms)

# Change the artificially created root node to a React Fragment, instead of a div
if not has_root_node:
vdom["tagName"] = ""

return vdom
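The root-node logic above keeps a lone top-level element as the root and otherwise wraps every top-level fragment in a temporary element that is then relabelled as a fragment (an empty ``tagName``). A minimal stdlib-only sketch of that decision, assuming the top-level nodes have already been converted to VDOM dicts; ``wrap_fragments`` is a hypothetical helper, not part of idom:

```python
from typing import Any, Dict, List


def wrap_fragments(nodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return the single root, or a fragment wrapping all nodes."""
    if len(nodes) == 1:
        # A single top-level element is used directly as the root
        return nodes[0]
    # Several top-level elements: an empty tagName marks a React-style fragment
    return {"tagName": "", "children": list(nodes)}
```

For ``<p>Hello</p><div>World</div>`` this produces the same shape that ``test_html_to_vdom_with_no_parent_node`` expects.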


def _etree_to_vdom(node: etree._Element, transforms: Iterable[_ModelTransform]) -> Dict:
"""Recursively transform an lxml etree node into a DOM model

Parameters:
node:
The ``lxml.etree._Element`` node
transforms:
Functions of the form ``transform(old) -> new`` where ``old`` is a VDOM
dictionary which will be replaced by ``new``. For example, you could use a
transform function to add highlighting to a ``<code/>`` block.
"""
if not isinstance(node, etree._Element):
raise TypeError(f"Encountered unsupported type {type(node)} from {node}")

# This will recursively call _etree_to_vdom() on all children
children = _generate_vdom_children(node, transforms)

# Convert the lxml node to a VDOM dict
attributes = dict(node.items())
key = attributes.pop("key", None)
vdom = (
# If available, use a constructor from idom.html to create the VDOM dict
getattr(idom.html, node.tag)(attributes, *children, key=key)
if hasattr(idom.html, node.tag)
# Fall back to using a generic VDOM dict
else {
"tagName": node.tag,
"children": children,
"attributes": attributes,
"key": key,
}
)

# Perform any necessary mutations on the VDOM attributes to meet VDOM spec
_mutate_vdom(vdom)

# Apply any provided transforms.
for transform in transforms:
vdom = transform(vdom)

def model(self) -> Dict[str, Any]:
"""Get the current state of parsed VDOM model"""
return self._node_stack[0]

def feed(self, data: str) -> None:
"""Feed in HTML that will update the :meth:`HtmlParser.model`"""
self._node_stack.append(self._make_vdom("div", {}))
super().feed(data)

def reset(self) -> None:
"""Reset the state of the parser"""
self._node_stack: List[Dict[str, Any]] = []
super().reset()

def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
new = self._make_vdom(tag, dict(attrs))
current = self._node_stack[-1]
current["children"].append(new)
self._node_stack.append(new)

def handle_endtag(self, tag: str) -> None:
del self._node_stack[-1]

def handle_data(self, data: str) -> None:
self._node_stack[-1]["children"].append(data)

@staticmethod
def _make_vdom(tag: str, attrs: Dict[str, Any]) -> Dict[str, Any]:
if "style" in attrs:
style = attrs["style"]
if isinstance(style, str):
style_dict = {}
for k, v in (part.split(":", 1) for part in style.split(";") if part):
title_case_key = k.title().replace("-", "")
camel_case_key = title_case_key[:1].lower() + title_case_key[1:]
style_dict[camel_case_key] = v
attrs["style"] = style_dict
return {"tagName": tag, "attributes": attrs, "children": []}
# Get rid of empty VDOM fields
_prune_vdom_fields(vdom)

return vdom
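In ``_etree_to_vdom`` the children are converted, and therefore transformed, before the transforms run on the parent. Each transform thus sees a node whose subtree is already final (a post-order traversal), whereas the removed ``HtmlParser``-based loop transformed nodes top-down. A minimal sketch of that ordering on plain VDOM dicts, assuming children are either strings or dicts; ``apply_transforms`` is a hypothetical name, and the empty-field cleanup only imitates ``_prune_vdom_fields``:

```python
from typing import Any, Callable, Dict, Iterable, List

Transform = Callable[[Dict[str, Any]], Any]


def apply_transforms(vdom: Dict[str, Any], transforms: Iterable[Transform]) -> Any:
    """Hypothetical helper: transform all children first, then the node itself."""
    transform_list: List[Transform] = list(transforms)
    node = dict(vdom)  # shallow copy so the caller's top-level dict is not modified
    node["children"] = [
        apply_transforms(child, transform_list) if isinstance(child, dict) else child
        for child in node.get("children", [])
    ]
    for transform in transform_list:
        node = transform(node)
    # Drop empty fields afterwards, similar to _prune_vdom_fields
    if isinstance(node, dict):
        for field in ("children", "attributes", "key"):
            if field in node and not node[field]:
                del node[field]
    return node
```

A transform like ``make_links_blue`` from the tests therefore runs on each ``<a>`` before it runs on the enclosing ``<p>``.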


def _mutate_vdom(vdom: Dict):
"""Performs any necessary mutations on the VDOM attributes to meet VDOM spec.

Currently, this function only transforms the ``style`` attribute into a dictionary whose keys are
camelCase so as to be renderable by React.

This function may be extended in the future.
"""
# Determine if the style attribute needs to be converted to a dict
if (
"attributes" in vdom
and "style" in vdom["attributes"]
and isinstance(vdom["attributes"]["style"], str)
):
# Convert style attribute from str -> dict with camelCase keys
vdom["attributes"]["style"] = {
_hypen_to_camel_case(key.strip()): value.strip()
for key, value in (
part.split(":", 1)
for part in vdom["attributes"]["style"].split(";")
if ":" in part
)
}
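The style conversion in ``_mutate_vdom`` splits the inline ``style`` string on ``;``, splits each declaration on its first ``:``, and camelCases the property name. A standalone sketch of the same logic, using ``str.partition`` so that unhyphenated names such as ``color`` pass through unchanged; ``parse_style`` and ``hyphen_to_camel_case`` are illustrative names, not idom functions:

```python
from typing import Dict


def hyphen_to_camel_case(name: str) -> str:
    """Convert a hyphenated CSS property name (e.g. background-color) to camelCase."""
    first, _, remainder = name.partition("-")
    # title() capitalizes each hyphen-separated word; the hyphens are then removed
    return first.lower() + remainder.title().replace("-", "")


def parse_style(style: str) -> Dict[str, str]:
    """Turn an inline style string into a dict with camelCase keys."""
    return {
        hyphen_to_camel_case(key.strip()): value.strip()
        for key, value in (
            part.split(":", 1) for part in style.split(";") if ":" in part
        )
    }
```

Splitting on the first ``:`` only keeps values such as ``url(http://...)`` intact, and the ``if ":" in part`` filter skips the empty segment left by a trailing ``;``.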


def _prune_vdom_fields(vdom: Dict):
"""Remove unneeded fields from VDOM dict."""
if "children" in vdom and not len(vdom["children"]):
del vdom["children"]
if "attributes" in vdom and not len(vdom["attributes"]):
del vdom["attributes"]
if "key" in vdom and not vdom["key"]:
del vdom["key"]


def _generate_vdom_children(
node: etree._Element, transforms: Iterable[_ModelTransform]
) -> List[Union[Dict, str]]:
"""Generates a list of VDOM children from an lxml node.

Inserts inner text and/or tail text in between VDOM children, if necessary.
"""
return ( # Get the inner text of the current node
[node.text] if node.text else []
) + list(
chain(
*(
# Recursively convert each child node to VDOM
[_etree_to_vdom(child, transforms)]
# Insert the tail text between each child node
+ ([child.tail] if child.tail else [])
for child in node.iterchildren(None)
)
)
)
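lxml follows the ElementTree model: text before an element's first child is stored in the element's ``.text``, and text that follows a child (still inside the parent) is stored in that child's ``.tail``. The sketch below reproduces the interleaving done by ``_generate_vdom_children`` using the standard library's ``xml.etree.ElementTree`` instead of lxml. It therefore only accepts well-formed XML (``<br/>`` rather than ``<br>``). ``element_to_vdom`` is a hypothetical helper that skips idom's ``key`` handling, style mutation, and transforms:

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Union

VdomChild = Union[Dict[str, Any], str]


def element_to_vdom(node: ET.Element) -> Dict[str, Any]:
    """Recursively convert an ElementTree element into a VDOM-style dict."""
    # Text before the first child element
    children: List[VdomChild] = [node.text] if node.text else []
    for child in node:
        children.append(element_to_vdom(child))
        # Text between this child and the next sibling (or the parent's closing tag)
        if child.tail:
            children.append(child.tail)
    vdom: Dict[str, Any] = {"tagName": node.tag}
    if node.attrib:
        vdom["attributes"] = dict(node.attrib)
    if children:
        vdom["children"] = children
    return vdom
```

With ``<p>hello <a>world</a> and <a>universe</a>lmao</p>``, the strings ``" and "`` and ``"lmao"`` come from the tails of the two ``<a>`` elements, matching the order checked in ``test_html_to_vdom_transform``.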


def _hypen_to_camel_case(string: str) -> str:
"""Convert a hyphenated string to camelCase."""
first, _, remainder = string.partition("-")
return first.lower() + remainder.title().replace("-", "")
2 changes: 1 addition & 1 deletion src/idom/widgets.py
@@ -80,7 +80,7 @@ def use_linked_inputs(
value, set_value = idom.hooks.use_state(initial_value)

def sync_inputs(event: Dict[str, Any]) -> None:
new_value = event["value"]
new_value = event["target"]["value"]
set_value(new_value)
if not new_value and ignore_empty:
return None
54 changes: 45 additions & 9 deletions tests/test_utils.py
@@ -60,18 +60,15 @@ def test_ref_repr():
],
)
def test_html_to_vdom(case):
assert html_to_vdom(case["source"]) == {
"tagName": "div",
"children": [case["model"]],
}
assert html_to_vdom(case["source"]) == case["model"]


def test_html_to_vdom_transform():
source = "<p>hello <a>world</a> and <a>universe</a></p>"
source = "<p>hello <a>world</a> and <a>universe</a>lmao</p>"

def make_links_blue(node):
if node["tagName"] == "a":
node["attributes"]["style"] = {"color": "blue"}
node["attributes"] = {"style": {"color": "blue"}}
return node

expected = {
@@ -89,10 +86,49 @@ def make_links_blue(node):
"children": ["universe"],
"attributes": {"style": {"color": "blue"}},
},
"lmao",
],
}

assert html_to_vdom(source, make_links_blue) == expected


def test_html_to_vdom_with_null_tag():
source = "<p>hello<br>world</p>"

expected = {
"tagName": "p",
"children": [
"hello",
{"tagName": "br"},
"world",
],
}

assert html_to_vdom(source, make_links_blue) == {
"tagName": "div",
"children": [expected],
assert html_to_vdom(source) == expected


def test_html_to_vdom_with_style_attr():
source = '<p style="color: red; background-color : green; ">Hello World.</p>'

expected = {
"attributes": {"style": {"backgroundColor": "green", "color": "red"}},
"children": ["Hello World."],
"tagName": "p",
}

assert html_to_vdom(source) == expected


def test_html_to_vdom_with_no_parent_node():
source = "<p>Hello</p><div>World</div>"

expected = {
"tagName": "",
"children": [
{"tagName": "p", "children": ["Hello"]},
{"tagName": "div", "children": ["World"]},
],
}

assert html_to_vdom(source) == expected