
Filter HTML tags in MAC WebCrawler not working #95

Closed Answered by codedbyyogesh
ldostuni asked this question in Q&A

Hi @ldostuni, thanks for reaching out! 👍

The crawler's global element configuration option controls which HTML elements page content is fetched from. It does not alter the crawl behaviour (i.e. it does not filter which links are crawled). The crawler still crawls all pages to the specified depth and retrieves content (text) only from the elements named in the configuration.
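To make the distinction concrete, here is a minimal conceptual sketch in Python. It is not the connector's implementation; the class `TagTextExtractor` and the function `extract` are hypothetical names invented for illustration, built only on the standard library's `html.parser`. It shows a tag filter that restricts which text is kept while every link on the page is still discovered.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Hypothetical illustration: keep text only from selected tags,
    but record every link on the page regardless of the tag filter."""

    def __init__(self, wanted_tags):
        super().__init__()
        self.wanted_tags = set(wanted_tags)
        self.depth = 0    # greater than 0 while inside a wanted tag
        self.text = []    # text fragments found inside wanted tags
        self.links = []   # every href found, filtered or not

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        if tag in self.wanted_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.text.append(data.strip())


def extract(html, wanted_tags):
    """Return (text fragments from wanted tags, all links on the page)."""
    parser = TagTextExtractor(wanted_tags)
    parser.feed(html)
    parser.close()
    return parser.text, parser.links


if __name__ == "__main__":
    page = '<nav><a href="/a">A</a></nav><p>Hello <a href="/b">link</a></p>'
    text, links = extract(page, ["p"])
    print(text)   # text from <p> only
    print(links)  # links from the whole page
```

In this sketch the link inside `<nav>` is still collected even though only `<p>` text is kept, which mirrors the behaviour described above: the element filter narrows the content, not the set of pages to visit.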

If you want to create a custom search, have a look at the other operations provided by the crawler. For example, you could stitch together Generate Sitemap to get all page links to the desired depth, then iterate over these pages, passing each page link to Page Insights - this gives you a list of all URLs on a…
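The stitching idea can be sketched roughly as follows. This is a Python outline under stated assumptions, not the connector's API: `generate_sitemap` and `page_insights` are hypothetical callables standing in for the Generate Sitemap and Page Insights operations, and the `keep` predicate is an added assumption showing where your own link filter could go.

```python
def collect_insights(root_url, depth, generate_sitemap, page_insights,
                     keep=lambda url: True):
    """Hypothetical orchestration sketch.

    generate_sitemap(root_url, depth) -> iterable of page URLs (stand-in)
    page_insights(url)                -> insights for one page (stand-in)
    keep(url)                         -> True if the page should be processed
    """
    results = {}
    for url in generate_sitemap(root_url, depth):
        if keep(url):  # custom filtering happens here, outside the crawler
            results[url] = page_insights(url)
    return results


if __name__ == "__main__":
    # Fake stand-ins so the sketch runs without any crawler.
    def fake_sitemap(root, depth):
        return [root, root + "/docs", root + "/blog"]

    def fake_insights(url):
        return {"url": url}

    selected = collect_insights(
        "https://example.com", 1, fake_sitemap, fake_insights,
        keep=lambda u: "/blog" not in u,
    )
    print(sorted(selected))
```

Passing the operations in as callables keeps the filtering logic separate from whatever actually performs the crawl, so the same loop works with any pair of stand-ins.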

Answer selected by amirkhan-ak-sf
Category: Q&A
Labels: help wanted, question
3 participants