Add support for outputting to JSON all crawled URLs (including 200 ones) #214

Closed
cipriancraciun opened this issue Mar 9, 2022 · 0 comments · Fixed by #275
Labels
enhancement New feature or request

Comments

@cipriancraciun

In addition to #38, which asks for saving the failed URLs as JSON, it would be useful to also list all crawled resources (i.e. those that return 30x or 200).

For example, one could use muffet to crawl a site in order to extract a list of all dependent resources (CSS, JS, images, etc.) and other linked-to pages.

One could then use these URLs for other analytical purposes, or even to warm up a cache after a redeploy.

With the current format, the `links` JSON list could be expanded to include all encountered URLs, with `error` replaced by `status`, so that errors and successful crawls are easy to tell apart.
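To make the proposal concrete, here is a minimal sketch of how such output might be consumed, for instance to build a cache warm-up list. The JSON shape is an assumption based on the description above: a list of pages, each with a `url` and a `links` list whose entries carry a `url` and an integer `status`. The sample data and the `successful_urls` helper are hypothetical and are not part of muffet.

```python
import json

# Hypothetical report: the field names "url", "links", and "status" follow the
# proposal in this issue and are assumptions, not muffet's confirmed output.
SAMPLE = """
[
  {
    "url": "https://example.com/",
    "links": [
      {"url": "https://example.com/style.css", "status": 200},
      {"url": "https://example.com/old", "status": 301},
      {"url": "https://example.com/missing", "status": 404}
    ]
  }
]
"""


def successful_urls(report):
    """Return the sorted, de-duplicated URLs whose status is 2xx or 3xx."""
    urls = set()
    for page in report:
        for link in page.get("links", []):
            status = link.get("status")
            # Skip entries without a numeric status (e.g. network errors).
            if isinstance(status, int) and 200 <= status < 400:
                urls.add(link["url"])
    return sorted(urls)


urls = successful_urls(json.loads(SAMPLE))
print("\n".join(urls))
```

The resulting list could then be fed to whatever tool performs the warm-up requests. If the real output encoded the status differently (for example as a string), the filter would need adjusting.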

@raviqqe raviqqe added the enhancement New feature or request label Mar 16, 2022