Added prune xpath to spider #684
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #684   +/-   ##
=======================================
  Coverage   98.54%   98.55%
=======================================
  Files          21       21
  Lines        3517     3529   +12
=======================================
+ Hits         3466     3478   +12
  Misses         51       51
Hi @felipehertzer, nice to see that spider is adaptable and suits your needs. I would like to see That way you could just pass it along with the other parameters and make the following changes:
Does that work for you?
Hey @adbar, thanks for the feedback. I understand how the parameters are set up now and have made the requested changes. I noticed that the code below directly calls

htmlstring, homepage, new_base_url = probe_alternative_homepage(url)
if htmlstring and homepage and new_base_url:
    # register potentially new homepage
    URL_STORE.add_urls([homepage])
    # extract links on homepage
    process_links(htmlstring, params, url=url)
Hi @felipehertzer, the tests don't pass for all versions. Could you check if you can fix them? What happened to the line The code you mention is not an issue because
Hey @adbar, the tests are currently pending a solution for the
I'm not sure how to untangle that right now. If you have an idea, feel free to test it.
@felipehertzer I think the best way would be to isolate this code in a function and/or to call it in
Yes, you are right, it does make way more sense. I will change it.
Alright, I just updated it.
Hey @adbar,

This pull request adds the prune_xpath feature to the Spider function. As part of this update, I've removed the process_response to consolidate the pruning logic into a single function.

Please let me know if you have any feedback or suggestions!

Thanks.
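
For readers unfamiliar with the idea, here is a minimal sketch of what pruning by XPath before link extraction could look like. It is not the code from this pull request: it uses only the standard library's `xml.etree.ElementTree` (which supports a limited XPath subset and needs well-formed markup), whereas the spider itself may work differently. The names `prune_tree`, `extract_links` and `PAGE` are hypothetical and chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def prune_tree(root, prune_xpath):
    """Remove every element matched by one or more XPath expressions.

    Hypothetical helper: accepts a single expression or a list of them.
    """
    if isinstance(prune_xpath, str):
        prune_xpath = [prune_xpath]
    # ElementTree elements have no parent pointer, so map child -> parent first
    parents = {child: parent for parent in root.iter() for child in parent}
    for expression in prune_xpath:
        for element in root.findall(expression):
            parent = parents.get(element)
            if parent is not None:
                parent.remove(element)
    return root


def extract_links(root):
    """Collect the href values of all remaining <a> elements, in document order."""
    return [a.get("href") for a in root.iter("a") if a.get("href")]


# Illustrative page: navigation and footer links should not be followed
PAGE = """<html><body>
<nav><a href="/login">Login</a></nav>
<div class="content"><a href="/article-1">Article</a></div>
<footer><a href="/imprint">Imprint</a></footer>
</body></html>"""

tree = ET.fromstring(PAGE)
links = extract_links(prune_tree(tree, [".//nav", ".//footer"]))
print(links)  # ['/article-1']
```

Keeping the pruning in one function that runs before any link extraction means every code path that processes a page can share it, which is the kind of consolidation discussed above.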