CSS `foo~:nth-child(2)` gives incorrect XPath #707

SimonSapin · 2012-06-18T09:48:19Z

Hi,

I’m the maintainer of cssselect, which does in Python pretty much the same as Nokogiri for CSS selectors: translate them to XPath. It looks like the scrapy/cssselect#12 bug also applies to Nokogiri. Namely, the XPath translation of :nth-child() and similar pseudo-classes is wrong when used after the + or ~ combinator. Here is a test case:

require 'nokogiri'
doc = Nokogiri::XML('<root><child1/><child2/><child3/></root>')
puts doc.css(':nth-child(2)').map { |e| e.name }
puts doc.css('child1 ~ :nth-child(2)').map { |e| e.name }

Expected output: child2 child2. Actual output: child2 child3.

The problem is in the XPath translation of the latter selector: //child1/following-sibling::*[position() = 2 and self::*] gives the element at position 2 when counting from child1, while we want the position among the parent’s children.

I am not sure it is even possible to correctly translate this selector to XPath: the = XPath operator on node-sets compares the text content of elements, not their identity.

The issue is similar for scrapy/cssselect#4 and Nokogiri’s #394.


AurelPaulovic · 2012-12-07T18:37:44Z

You should not use position(), which depends on the context position.

Instead, try:

//child1/following-sibling::*[(count(preceding-sibling::*) +1)=2]

And similarly for :nth-child(Xn):

//child1/following-sibling::*[(count(preceding-sibling::*)+1) mod X = 0]

SimonSapin · 2012-12-07T19:00:28Z

Thank you @AurelPaulovic, I think that should work.

Now, for a selector such as h2 ~ div:nth-of-type(2), the XPath expression could be //h2/following-sibling::div[count(preceding-sibling::div)=1].
But for the more general case, h2 ~ *:nth-of-type(2), the XPath would be //h2/following-sibling::*[count(preceding-sibling::*[name(.)=name(…)])=1]. Is there some expression we could put in place of … to refer to the outer scope? Or maybe a way to bind the outer scope to a variable?

AurelPaulovic · 2012-12-07T23:49:33Z

Sadly, there is no way to do that in XPath 1.0. You can't assign any variables, and there is no way to get to the outer context.

redapple mentioned this issue Jul 12, 2016

Fix :nth-*(an+b) pseudo-classes selectors scrapy/cssselect#60

Merged

flavorjones added topic/css needs/fix-for-failing-test labels Jan 5, 2019

flavorjones added the help wanted label Dec 24, 2021

flavorjones mentioned this issue Jun 19, 2024

explore: alternative CSS selector parsers #2560

Open

flavorjones added this to the v2.0.0 milestone Jul 3, 2024
