I’m the maintainer of cssselect, which does in Python pretty much the same as Nokogiri for CSS selectors: translate them to XPath. It looks like the scrapy/cssselect#12 bug also applies to Nokogiri. Namely, the XPath translation of :nth-child() and similar pseudo-classes is wrong when used after the + or ~ combinator. Here is a test case:
Expected output: child2 child2. Actual output: child2 child3.
The problem is in the XPath translation of the latter selector: //child1/following-sibling::*[position() = 2 and self::*] gives the element at position 2 when counting from child1, while we want the position among the parent’s children.
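To make the two counting schemes concrete, here is a rough sketch in plain Python. It does not run Nokogiri or cssselect. It only emulates the two interpretations on a made-up three-child document, since the original test case is not reproduced in this text. The document and the helper `following_siblings` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, not the test case from the issue.
root = ET.fromstring("<root><child1/><child2/><child3/></root>")
children = list(root)
child1 = root.find("child1")


def following_siblings(parent, node):
    """Elements after `node` under `parent`, like the following-sibling axis."""
    kids = list(parent)
    return kids[kids.index(node) + 1:]


siblings = following_siblings(root, child1)

# What the generated XPath computes: position() is counted within the
# following-sibling axis, so position 2 is the second element after child1.
xpath_result = siblings[2 - 1].tag

# What CSS :nth-child(2) means: the element is the second child of its
# parent, and (because of the ~ combinator) it also follows child1.
css_result = [el.tag for el in siblings if children.index(el) + 1 == 2]

print(xpath_result, css_result)
```

With this document the first scheme picks child3 and the second picks child2, which matches the actual and expected outputs reported above.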
I am not sure it is even possible to correctly translate this selector to XPath: the = XPath operator on node-sets compares the text content of elements, not their identity.
Now for a selector h2 ~ div:nth-of-type(2) the XPath expression could be //h2/following-sibling::div[count(preceding-sibling::div)=1].
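As a sanity check of the count(preceding-sibling::div)=1 idea, here is a small sketch that emulates it in plain Python rather than in a real XPath engine. The markup, the id attributes, and the helper `preceding_same_tag` are invented for illustration, chosen so that counting from the parent and counting from the h2 disagree.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one div before the h2, two after it.
root = ET.fromstring(
    "<body><div id='a'/><h2/><div id='b'/><p/><div id='c'/></body>"
)
kids = list(root)
h2_index = next(i for i, el in enumerate(kids) if el.tag == "h2")


def preceding_same_tag(index, tag):
    """Emulates count(preceding-sibling::<tag>) for the child at `index`."""
    return sum(1 for el in kids[:index] if el.tag == tag)


# h2 ~ div:nth-of-type(2): a div after the h2 that has exactly one div
# before it among all of the parent's children.
matches = [
    el.get("id")
    for i, el in enumerate(kids)
    if i > h2_index and el.tag == "div" and preceding_same_tag(i, "div") == 1
]

# Counting only from the h2 would instead pick the second div after the h2.
buggy = [el for el in kids[h2_index + 1:] if el.tag == "div"][1].get("id")

print(matches, buggy)
```

Here the count-based predicate selects the div with id b, while counting from the h2 selects c. This only works because the tag name div is known when the expression is written.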
But for the more general case, h2 ~ *:nth-of-type(2), the XPath would be //h2/following-sibling::*[count(preceding-sibling::*[name(.)=name(…)])=1]. Is there some expression we could put in place of … to refer to the outer scope? Or maybe a way to bind the outer scope to a variable?
The issue is similar for scrapy/cssselect#4 and Nokogiri’s #394.