A .NET application that crawls a website checking for HTTP 404s, and maybe more stuff later.
Intended usage:
```
websitevalidator -u https://www.yourdomain.whatever -c [--limit xxx] -o structure.json
```
Output:
A single large JSON file with a lot of information, part of it being the structure of the website, which is useful for further analysis. Since it is plain JSON, it is easy to read from PowerShell (for example): just load the file and do whatever you like with it.
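The exact schema is not fixed yet; as a rough idea (all field names below are illustrative, not the actual output format), the file could look something like this:

```json
{
  "baseUrl": "https://www.yourdomain.whatever",
  "crawledAt": "2024-01-01T12:00:00Z",
  "results": [
    {
      "url": "https://www.yourdomain.whatever/about",
      "statusCode": 200,
      "foundOn": ["https://www.yourdomain.whatever/"],
      "links": ["https://www.yourdomain.whatever/contact"],
      "isExternal": false,
      "messages": []
    },
    {
      "url": "https://www.yourdomain.whatever/missing",
      "statusCode": 404,
      "foundOn": ["https://www.yourdomain.whatever/about"],
      "links": [],
      "isExternal": false,
      "messages": ["404 Not Found"]
    }
  ]
}
```

With a shape like that, filtering to 404s or grouping by status code is a one-liner in PowerShell or LINQ.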
- convert relative URLs to absolute ones
- return the output either as human-readable text or JSON (is there a generic approach?); maybe add a --human switch for the more readable output and default to JSON
- return only distinct results
- enable some basic crawling activity (a rough crawl loop is sketched after this list)
- remember the result of each URL, so every URL is only crawled once
- check external URLs for reachability, but do not feed links from them back into the system (it is important that they are basically reachable, but we do not want to check their pages, too)
- also crawl resource files like linked images, CSS and JavaScript
- add an option for a final human-readable report?
- validations should be configurable without the need for recompilation (see the config sketch after this list)
- group results by HTTP status code, create error messages for 404s and other problems
- pages shall not contain "Error", "Warning", or anything else that looks like a PHP problem
- can I have an overview of which pages are mentioned in the sitemap and which are not? (see the sitemap/robots.txt sketch after this list)
- can I have an overview of pages which are possibly disallowed by robots.txt?
- we need a way to mute known validation messages that we want to ignore
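To make the crawling-related items above more concrete, here is a minimal sketch of what the crawl loop could look like. The `CrawlResult` record and the crude regex-based link extraction are assumptions, not the actual implementation (a real crawler would use a proper HTML parser and also collect resource URLs): it converts relative URLs to absolute ones with `Uri`, keeps a `HashSet` of visited URLs so each one is fetched only once, checks external URLs without following their links, and groups the results by status code at the end.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical result type; the real output model will probably look different.
record CrawlResult(Uri Url, int StatusCode, bool IsExternal);

class Crawler
{
    static readonly HttpClient Http = new();

    static async Task<List<CrawlResult>> CrawlAsync(Uri start, int limit = 100)
    {
        var visited = new HashSet<Uri>();                 // every URL is crawled only once
        var queue   = new Queue<Uri>(new[] { start });
        var results = new List<CrawlResult>();

        while (queue.Count > 0 && visited.Count < limit)
        {
            var url = queue.Dequeue();
            if (!visited.Add(url)) continue;

            var isExternal = url.Host != start.Host;
            int status;
            string? html = null;
            try
            {
                var response = await Http.GetAsync(url);
                status = (int)response.StatusCode;
                // External URLs are only checked for reachability; their content is not parsed.
                if (!isExternal && response.IsSuccessStatusCode)
                    html = await response.Content.ReadAsStringAsync();
            }
            catch (HttpRequestException)
            {
                status = 0; // unreachable: DNS failure, connection refused, ...
            }
            results.Add(new CrawlResult(url, status, isExternal));
            if (html is null) continue;

            // Crude link extraction for the sketch; a real crawler would use an HTML parser
            // and also collect resource URLs (img/src, link/href, script/src).
            foreach (Match m in Regex.Matches(html, "href=\"([^\"]+)\""))
            {
                // Convert relative URLs to absolute ones against the current page.
                if (Uri.TryCreate(url, m.Groups[1].Value, out var absolute)
                    && (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
                    queue.Enqueue(absolute);
            }
        }
        return results;
    }

    static async Task Main()
    {
        var results = await CrawlAsync(new Uri("https://www.yourdomain.whatever"));

        // Group results by HTTP status code for the report.
        foreach (var group in results.GroupBy(r => r.StatusCode).OrderBy(g => g.Key))
            Console.WriteLine($"{group.Key}: {group.Count()} URL(s)");
    }
}
```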
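For configurable validations and the mute list, one possible approach (just a sketch, nothing here is implemented) is a JSON config file that the tool reads at startup, so rules can change without recompilation. The file layout, rule IDs and field names below are made up:

```json
{
  "contentRules": [
    { "id": "no-php-errors", "pattern": "Warning:|Fatal error:|Parse error:", "severity": "error" },
    { "id": "no-error-text", "pattern": "\\bError\\b", "severity": "warning" }
  ],
  "mutedMessages": [
    { "rule": "no-error-text", "urlPattern": "https://www.yourdomain.whatever/known-noisy-page" }
  ]
}
```

The idea would be that content rules are just regex patterns applied to each page body, and anything matching a `mutedMessages` entry is dropped from the report.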
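For the sitemap and robots.txt overviews, a rough sketch of the idea (again, the names and the very naive robots.txt parsing are assumptions): download sitemap.xml and robots.txt once, then compare them against the set of crawled URLs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

class SitemapAndRobotsCheck
{
    static readonly HttpClient Http = new();

    static async Task Main()
    {
        var baseUri = new Uri("https://www.yourdomain.whatever");

        // URLs found by the crawler; in the real tool this comes from the crawl results.
        var crawled = new HashSet<string>
        {
            "https://www.yourdomain.whatever/",
            "https://www.yourdomain.whatever/about"
        };

        // Which crawled pages are (not) listed in sitemap.xml?
        var sitemapXml = await Http.GetStringAsync(new Uri(baseUri, "/sitemap.xml"));
        var ns = XNamespace.Get("http://www.sitemaps.org/schemas/sitemap/0.9");
        var inSitemap = XDocument.Parse(sitemapXml)
            .Descendants(ns + "loc")
            .Select(loc => loc.Value.Trim())
            .ToHashSet();

        foreach (var url in crawled.Except(inSitemap))
            Console.WriteLine($"crawled but not in sitemap: {url}");
        foreach (var url in inSitemap.Except(crawled))
            Console.WriteLine($"in sitemap but never reached: {url}");

        // Which crawled pages match a Disallow rule in robots.txt?
        // (Naive prefix matching; wildcards and user-agent sections are ignored in this sketch.)
        var robots = await Http.GetStringAsync(new Uri(baseUri, "/robots.txt"));
        var disallowed = robots.Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.StartsWith("Disallow:", StringComparison.OrdinalIgnoreCase))
            .Select(line => line.Substring("Disallow:".Length).Trim())
            .Where(path => path.Length > 0)
            .ToList();

        foreach (var url in crawled)
        {
            var path = new Uri(url).AbsolutePath;
            if (disallowed.Any(d => path.StartsWith(d)))
                Console.WriteLine($"possibly disallowed by robots.txt: {url}");
        }
    }
}
```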