Support for :has() selector #169

Closed
124C41p opened this issue Feb 8, 2024 · 8 comments · Fixed by #187

Comments

@124C41p

124C41p commented Feb 8, 2024

Hi, do you plan to support the :has() selector? To my understanding, this CSS pseudo-class is needed to select elements relative to the parent of another known element.

Consider the following example:

<div>
    <div id="foo">
        Hi There!
    </div>
</div>
<ul>
    <li>first</li>
    <li>second</li>
    <li>third</li>
</ul>

In order to select the second list item, I would like to use the following selector:

let selector = Selector::parse("div:has(div#foo) + ul > li:nth-child(2)").unwrap();

This line, however, panics as of scraper version 0.18.1.
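
To make the intended match concrete, here is a minimal sketch of what `div:has(div#foo) + ul > li:nth-child(2)` is expected to select. It uses a toy `Node` tree rather than scraper's real types; `Node`, `el`, `has`, and `second_item_after_has` are hypothetical names introduced only for this illustration.

```rust
/// Toy element type for illustration only; this is not scraper's API.
struct Node {
    tag: &'static str,
    id: Option<&'static str>,
    text: &'static str,
    children: Vec<Node>,
}

/// Small constructor to keep example trees readable.
fn el(
    tag: &'static str,
    id: Option<&'static str>,
    text: &'static str,
    children: Vec<Node>,
) -> Node {
    Node { tag, id, text, children }
}

impl Node {
    /// Models `:has(...)`: true if any descendant satisfies `pred`.
    fn has(&self, pred: fn(&Node) -> bool) -> bool {
        self.children.iter().any(|c| pred(c) || c.has(pred))
    }
}

/// Models `div:has(div#foo) + ul > li:nth-child(2)` over a list of siblings.
fn second_item_after_has(siblings: &[Node]) -> Option<&Node> {
    siblings.windows(2).find_map(|pair| match pair {
        // `+` requires the `ul` to come immediately after the matching `div`.
        [prev, next]
            if prev.tag == "div"
                && prev.has(|n: &Node| n.tag == "div" && n.id == Some("foo"))
                && next.tag == "ul" =>
        {
            // `> li:nth-child(2)`: the second child, provided it is an `li`.
            next.children.get(1).filter(|li| li.tag == "li")
        }
        _ => None,
    })
}
```

Built over the HTML above (an outer `div` containing `div#foo`, followed by the `ul`), `second_item_after_has` returns the node whose `text` is `second`; without the `#foo` descendant it returns `None`.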

@adamreichold
Member

adamreichold commented Feb 8, 2024

I think this is still missing support in our upstream selectors dependency, at least in the version published on crates.io.

@cyqsimon

+1. I'm trying to scrape Wikipedia, which has this sort of nesting. For example:

<h2>
  <span class="mw-headline" id="Registered_ports">Registered ports</span>
  <!-- ... -->
</h2>

The selector h2:has(#Registered_ports) ~ .wikitable.sortable would match the tables that follow this h2 as siblings, so taking the first match picks the first table after it. That is a good way to locate the content in lieu of a distinctive id/class on the table itself.
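
As a rough illustration of how `~` behaves here, the sketch below models `h2:has(#Registered_ports) ~ .wikitable.sortable` on a toy sibling list. `Elem`, `elem`, and `tables_after_heading` are hypothetical names for this example and are not part of scraper; the point is only that `~` matches every later sibling with both classes, and the first match is the table nearest the heading.

```rust
/// Toy element type for illustration only; this is not scraper's API.
struct Elem {
    tag: &'static str,
    id: Option<&'static str>,
    classes: Vec<&'static str>,
    children: Vec<Elem>,
}

/// Small constructor to keep example trees readable.
fn elem(
    tag: &'static str,
    id: Option<&'static str>,
    classes: &[&'static str],
    children: Vec<Elem>,
) -> Elem {
    Elem { tag, id, classes: classes.to_vec(), children }
}

impl Elem {
    /// Models `:has(#id)`: true if some descendant carries the given id.
    fn has_descendant_id(&self, id: &str) -> bool {
        self.children
            .iter()
            .any(|c| c.id == Some(id) || c.has_descendant_id(id))
    }

    /// True if the element carries every class in `wanted`.
    fn has_classes(&self, wanted: &[&str]) -> bool {
        wanted.iter().all(|w| self.classes.iter().any(|c| c == w))
    }
}

/// Indices of the siblings matched by `h2:has(#<id>) ~ .wikitable.sortable`.
fn tables_after_heading(siblings: &[Elem], id: &str) -> Vec<usize> {
    // Find the first `h2` that contains the id; later anchors add no new matches.
    let start = match siblings
        .iter()
        .position(|e| e.tag == "h2" && e.has_descendant_id(id))
    {
        Some(i) => i,
        None => return Vec::new(),
    };
    // `~` matches every later sibling that has both classes, not just the next one.
    siblings
        .iter()
        .enumerate()
        .skip(start + 1)
        .filter(|(_, e)| e.has_classes(&["wikitable", "sortable"]))
        .map(|(i, _)| i)
        .collect()
}
```

Taking `.first()` of the returned indices corresponds to picking the first table after the heading.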

@cfvescovo cfvescovo added the C-feature-request Category: feature request label Feb 22, 2024
@nicoburns

nicoburns commented Feb 29, 2024

From what I can see, selectors 0.25 (published to crates.io) does have :has support; see https://docs.rs/selectors/latest/selectors/parser/enum.Component.html#variant.Has. That said, there seem to be performance improvements in more recent unreleased commits.

@nathaniel-daniel

servo/servo#25133

@jameshurst

I looked into adding :is() support, and it seems that both :is() and :has() are already supported by selectors. The Parser impl needs to enable support by implementing parse_is_and_where and parse_has.

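// Added inside the existing impl of the selectors Parser trait.
// Both hooks default to `false`; returning `true` opts in.

// Enables parsing of :is() and :where().
fn parse_is_and_where(&self) -> bool {
    true
}

// Enables parsing of :has().
fn parse_has(&self) -> bool {
    true
}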

@causal-agent Should it be safe to enable support for these selectors? I can make a PR with these changes unless these selectors are not enabled for a reason.
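
The opt-in pattern described above can be sketched with a trait whose hooks default to `false`. This is a toy model, not the selectors crate's real code: `ParserHooks`, `DefaultParser`, `PermissiveParser`, and `check_selector` are hypothetical names, and the substring check stands in for actual selector parsing.

```rust
/// Toy stand-in for opt-in hooks on a selector parser trait.
/// The method names mirror the thread; everything else is hypothetical.
trait ParserHooks {
    /// Disabled unless an implementor opts in.
    fn parse_has(&self) -> bool {
        false
    }

    /// Disabled unless an implementor opts in.
    fn parse_is_and_where(&self) -> bool {
        false
    }
}

/// Keeps the defaults, so the gated pseudo-classes are rejected.
struct DefaultParser;
impl ParserHooks for DefaultParser {}

/// Overrides both hooks to enable the gated pseudo-classes.
struct PermissiveParser;
impl ParserHooks for PermissiveParser {
    fn parse_has(&self) -> bool {
        true
    }

    fn parse_is_and_where(&self) -> bool {
        true
    }
}

/// Accepts or rejects a selector based only on which gated
/// pseudo-classes it mentions (a crude substring check).
fn check_selector<P: ParserHooks>(parser: &P, selector: &str) -> Result<(), String> {
    if selector.contains(":has(") && !parser.parse_has() {
        return Err("unsupported pseudo-class :has()".to_string());
    }
    let uses_is_or_where = selector.contains(":is(") || selector.contains(":where(");
    if uses_is_or_where && !parser.parse_is_and_where() {
        return Err("unsupported pseudo-class :is()/:where()".to_string());
    }
    Ok(())
}
```

Under this model, the selector from the original report is rejected by `DefaultParser` and accepted by `PermissiveParser`, while a plain selector such as `ul > li` passes with either.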

@adamreichold
Member

> The Parser impl needs to enable support by implementing parse_is_and_where and parse_has.

Thank you for looking into this!

> Should it be safe to enable support for these selectors? I can make a PR with these changes unless these selectors are not enabled for a reason.

I think only tests will answer that. Please open a PR, ideally including a test case. I can try to then also give it a spin in a code base containing a pretty diverse set of scrapers and see if anything breaks that is not caught by the tests here.

@cfvescovo
Member

@jameshurst when your PR is ready, tag me. I will run some tests and review it ASAP.

@cfvescovo cfvescovo self-assigned this Jul 16, 2024
@cfvescovo
Member

I opened a PR addressing this; have a look.

7 participants