BREAKING: Complete xpath module rewrite #24
* Also removes XPathResult and makes all expressions return XpathItemSet
TODO:
Migration Guide Draft

Item Type

The biggest change is the return type. Before, it was a list of items that could be either an HtmlTag or HtmlText. Now the items are a much more complicated type following the XPath specification. Below is an overview of the returned item type:

/// https://www.w3.org/TR/xpath-datamodel-31/#dt-item
#[derive(PartialEq, PartialOrd, Eq, Ord, Debug, Clone, Hash, EnumExtract)]
pub enum XpathItem<'tree> {
    /// A node in the [`XpathItemTree`](crate::xpath::XpathItemTree).
    ///
    /// https://www.w3.org/TR/xpath-datamodel-31/#dt-node
    Node(Node<'tree>),

    /// A function item.
    ///
    /// https://www.w3.org/TR/xpath-datamodel-31/#dt-function-item
    Function(Function),

    /// An atomic value.
    ///
    /// https://www.w3.org/TR/xpath-datamodel-31/#dt-atomic-value
    AnyAtomicType(AnyAtomicType),
}

/// A node in the [`XpathItemTree`](crate::xpath::XpathItemTree).
///
/// https://www.w3.org/TR/xpath-datamodel-31/#dt-node
#[derive(PartialEq, PartialOrd, Eq, Ord, Debug, Clone, Hash, EnumExtract)]
pub enum Node<'tree> {
    /// A node in the [`XpathItemTree`](crate::xpath::XpathItemTree).
    TreeNode(XpathItemTreeNode<'tree>),

    /// A node that is not part of an [`XpathItemTree`](crate::xpath::XpathItemTree).
    NonTreeNode(NonTreeXpathNode),
}

/// Nodes that are not part of the [`XpathItemTree`].
#[derive(PartialEq, PartialOrd, Eq, Ord, Debug, Clone, Hash, EnumExtract)]
pub enum NonTreeXpathNode {
    /// An attribute node.
    AttributeNode(AttributeNode),

    /// A namespace node.
    NamespaceNode(NamespaceNode),
}

/// A node in the [`XpathItemTree`].
#[derive(PartialEq, PartialOrd, Eq, Ord, Debug, Clone, Hash)]
pub struct XpathItemTreeNode<'a> {
    id: NodeId,

    /// The data associated with this node.
    pub data: &'a XpathItemTreeNodeData,
}

/// Nodes that are part of the [`XpathItemTree`].
#[derive(PartialEq, PartialOrd, Eq, Ord, Debug, Hash, EnumExtract)]
pub enum XpathItemTreeNodeData {
    /// The root node of the document.
    DocumentNode(XpathDocumentNode),

    /// An element node.
    ///
    /// HTML tags are represented as element nodes.
    ElementNode(ElementNode),

    /// A processing instruction node.
    PINode(PINode),

    /// A comment node.
    CommentNode(CommentNode),

    /// A text node.
    TextNode(TextNode),
}

Xpath Item Tree

To facilitate the new XpathItem type, xpath expressions must now be passed an XpathItemTree built from the HTML document, rather than the document itself:

let expr = xpath::parse("//td[@class='something']//span").unwrap();
- let results = expr.apply(&html_document)?;
+ let xpath_item_tree = XpathItemTree::from(&html_document);
+ let results = expr.apply(&xpath_item_tree)?;

Getting Text

Text nodes are a type of tree node (the TextNode variant of XpathItemTreeNodeData).

Other changes:

- let text = item.get_text(&html_document).unwrap();
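+ let text = item.as_node()?.as_tree_node()?.text(&page);

Getting Attributes

Attribute nodes are a type of non-tree node (the AttributeNode variant of NonTreeXpathNode). To read an attribute from an element item:

- let attribute = item.get_attributes().unwrap().get("href").unwrap();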
+ let element = item.as_node()?.as_tree_node()?.data.as_element_node()?;
+ let attribute = element.get_attribute("href").unwrap();

Or alternatively, use xpath to select the attribute node:

- let expr = xpath::parse("//td[@class='something']//span").unwrap();
- let items = expr.apply(&html_document)?;
- let attribute = items[0].get_attributes().unwrap().get("href").unwrap();
+ let expr = xpath::parse("//td[@class='something']//span/@href").unwrap();
+ let items = expr.apply(&xpath_item_tree)?;
+ let attribute = items[0].as_node()?.as_non_tree_node()?.as_attribute_node()?.value;
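
As a rough illustration of why the nested item type needs a chain of `as_…` calls, the sketch below mirrors the shape of the enums with simplified stand-ins and unwraps them with pattern matching. The types `Item`, `Node`, `TreeNodeData` and `NonTreeNode` and the helpers `attribute_value` and `text_of` are hypothetical and are not the crate's real API; the real types borrow from the tree and carry more variants.

```rust
// Simplified stand-ins for the enums in the migration guide. These are NOT
// the crate's real definitions: no lifetimes, no tree, String payloads only.
#[derive(Debug, Clone, PartialEq)]
enum TreeNodeData {
    Element { attributes: Vec<(String, String)> },
    Text(String),
}

#[derive(Debug, Clone, PartialEq)]
enum NonTreeNode {
    Attribute { name: String, value: String },
}

#[derive(Debug, Clone, PartialEq)]
enum Node {
    Tree(TreeNodeData),
    NonTree(NonTreeNode),
}

#[derive(Debug, Clone, PartialEq)]
enum Item {
    Node(Node),
    Atomic(String),
}

/// Returns the value of `attr` if the item is an element carrying that
/// attribute, or an attribute node with that name.
fn attribute_value<'a>(item: &'a Item, attr: &str) -> Option<&'a str> {
    match item {
        // Element selected by the expression: look the attribute up on it.
        Item::Node(Node::Tree(TreeNodeData::Element { attributes })) => attributes
            .iter()
            .find(|(name, _)| name == attr)
            .map(|(_, value)| value.as_str()),
        // Attribute node selected directly, e.g. by a trailing `/@href` step.
        Item::Node(Node::NonTree(NonTreeNode::Attribute { name, value })) if name == attr => {
            Some(value.as_str())
        }
        _ => None,
    }
}

/// Returns the text if the item is a text node.
fn text_of(item: &Item) -> Option<&str> {
    match item {
        Item::Node(Node::Tree(TreeNodeData::Text(text))) => Some(text.as_str()),
        _ => None,
    }
}

fn main() {
    let link = Item::Node(Node::Tree(TreeNodeData::Element {
        attributes: vec![("href".to_string(), "/home".to_string())],
    }));
    let href = Item::Node(Node::NonTree(NonTreeNode::Attribute {
        name: "href".to_string(),
        value: "/about".to_string(),
    }));
    let label = Item::Node(Node::Tree(TreeNodeData::Text("Home".to_string())));
    let number = Item::Atomic("42".to_string());

    for item in [&link, &href, &label, &number] {
        println!("{:?} / {:?}", attribute_value(item, "href"), text_of(item));
    }
}
```

Each level of the match corresponds to one step in a chain such as `as_node()?.as_non_tree_node()?.as_attribute_node()?` from the guide: the item may not be a node, the node may not be of the expected kind, so every step can fail.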
BREAKING: Complete xpath module rewrite
The goal of this rewrite is to bring the implementation of the xpath module in line with the official xpath specification as defined in https://www.w3.org/TR/2017/REC-xpath-31-20170321/.
The main advantage of doing this is that supporting more features becomes easier when you can follow the spec (obviously!).
One of the main limitations of the old xpath module was that it could only return "Text" or "Tag" nodes, which means there's no way to select other things that xpath supports like attributes. This rewrite makes that possible, at the cost of some added complexity on the return types.
Fixes #17
Fixes #15
It also fixes indexing, which was previously applied to the total set of items after every step rather than per parent node, as mentioned in #21.
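
To make the indexing difference concrete, here is a minimal sketch that is independent of the crate. It models each parent as a plain list of its children; the function names `index_over_total_set` and `index_per_parent` are made up for illustration and say nothing about how the module implements predicates internally.

```rust
/// Old behaviour as described above: the results of a step from all parents
/// are merged first, and the 1-based index is applied to that merged list.
fn index_over_total_set(parents: &[Vec<&'static str>], position: usize) -> Vec<&'static str> {
    let merged: Vec<&'static str> = parents.iter().flatten().copied().collect();
    position
        .checked_sub(1) // XPath positions start at 1, so 0 matches nothing
        .and_then(|i| merged.get(i).copied())
        .into_iter()
        .collect()
}

/// Spec behaviour: the index is evaluated separately among the children of
/// each parent, so every parent can contribute at most one item.
fn index_per_parent(parents: &[Vec<&'static str>], position: usize) -> Vec<&'static str> {
    parents
        .iter()
        .filter_map(|children| position.checked_sub(1).and_then(|i| children.get(i).copied()))
        .collect()
}

fn main() {
    // Two hypothetical <ul> elements, each with two <li> children.
    let parents = vec![vec!["a1", "a2"], vec!["b1", "b2"]];

    // Think of an expression like `//ul/li[2]`.
    println!("total set:  {:?}", index_over_total_set(&parents, 2));
    println!("per parent: {:?}", index_per_parent(&parents, 2));
}
```

With two lists of two children each, index 2 over the merged set yields only `a2`, while the per-parent evaluation yields `a2` and `b2`, which is what the XPath specification prescribes for a positional predicate on a child step.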