It is unable to scrape <li> #100

Open
amztc34283 opened this issue Oct 8, 2024 · 7 comments
Comments

@amztc34283

amztc34283 commented Oct 8, 2024

[Screenshot: 2024-10-07 at 8:40:01 PM]
wanted_list = ["Design, develop, test, refactor and scale backend implementations of new and existing consumer product features"]

scraper = AutoScraper()
result = scraper.build(url, wanted_list)

I am able to scrape the element in the wanted_list, but similar elements are not scraped. Are there any tips or tricks that could fix this?

@alirezamika
Owner

Please provide your full code, including the URL.

@amztc34283
Author

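from autoscraper import AutoScraper

url = "https://careers.chime.com/en/jobs/4225356002/backend-engineer/"
wanted_list = ["Design, develop, test, refactor and scale backend implementations of new and existing consumer product features"]
scraper = AutoScraper()
result = scraper.build(url, wanted_list)
print(result)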

Link: https://careers.chime.com/en/jobs/4225356002/backend-engineer/

@alirezamika
Owner

What is your expected output?

@amztc34283
Author

amztc34283 commented Oct 11, 2024

My expected output is the content of all the <li> elements under the same <ul>, which is:
Design, develop ...
Work with ...
Collaborate with ...
Proactively find ...

@alirezamika
Owner

You can try the contain_sibling_leaves parameter.

result = scraper.get_result_similar(url, contain_sibling_leaves=True)
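
For anyone wondering what "sibling leaves" means here, the idea can be sketched without AutoScraper at all. This is only a standard-library illustration of the concept, not the library's actual implementation: the helper names (Node, TreeBuilder, sibling_leaf_texts) and the sample HTML are made up for this example, and it assumes well-formed markup with no void tags such as <br>.

```python
from html.parser import HTMLParser


class Node:
    """Minimal element node: tag name, parent, children, and direct text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tiny element tree. Assumes every start tag has an end tag."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data


def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def sibling_leaf_texts(html, wanted):
    """Find the element whose text equals `wanted`, then return the text of
    every childless sibling with the same tag under the same parent."""
    builder = TreeBuilder()
    builder.feed(html)
    for node in walk(builder.root):
        if node.parent is not None and node.text.strip() == wanted:
            return [
                sib.text.strip()
                for sib in node.parent.children
                if sib.tag == node.tag and not sib.children
            ]
    return []


# Hypothetical sample markup, not taken from the page in this issue.
SAMPLE_HTML = """
<ul>
  <li>First responsibility</li>
  <li>Second responsibility</li>
  <li>Third responsibility</li>
</ul>
"""

siblings = sibling_leaf_texts(SAMPLE_HTML, "First responsibility")
print(siblings)
```

Giving one <li> as the wanted text returns all three <li> texts from the same <ul>, which is the behaviour asked for above.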

@amztc34283
Author

I will give it a try, thanks.

In addition, can you point me to the part of the code that decides which elements to scrape based on the wanted_list? Thank you!

@alirezamika
Owner

It's basically the whole code 😅
