site-validator-cli

A command line tool that takes in a URL or a file, then uses html-validator (a wrapper for https://validator.w3.org/nu/) to validate each page.

Installation

Get Node.js, then

$ npm i site-validator-cli -g

Usage

$ site-validator <url> [options]

This takes in the URL, crawls it to generate the entire sitemap, then validates each page in the sitemap.

$ site-validator <path-to-file> [options]

This takes in a file in one of the following formats: .json/.xml/.txt, then validates each page listed in the file. Supports both local and online files (see File Content Guidelines).

$ site-validator [options] --url <url>
                           --path <path-to-file>

If it's more convenient, you can also put the URL/path at the end, but you have to prefix it with --url or --path.

Options

--verbose
    Pretty-prints the errors/warnings. Without it, the tool only reports whether each page validated.

--quiet
    Ignores warnings and informational messages.

--local
    Expects the URL to be a localhost URL (e.g. http://localhost). If the site is not served on port 80, you have to specify the port number (e.g. http://localhost:3000). localhost sites served over HTTPS are not currently supported.

--cache <min>
    By default, the generated sitemap is cached for 60 minutes. Use this flag to change how long the sitemap is cached.

--clear-cache
    $ site-validator --clear-cache clears all cached sitemaps.
    To refetch and recache the sitemap for a URL:
    $ site-validator <url> --clear-cache

--output <filename>
    Outputs a JSON file in the current directory. The filename is optional and defaults to the current time in ISO format (see Output Schema).

--view <filename>
    Prints the report from an output JSON file (filename given without .json) to the console.
    $ site-validator <filename> --view
    $ site-validator --view <filename>
    Both work.

--page
    Validates the URL passed in without crawling.

--ff
    (Fail Fast) Stops the checking at the first error. (Note: does not work with --output.)

Other Commands

Help

$ site-validator -h, help, --help

Version

$ site-validator -v, version, --version

File Content Guidelines

File - json

$ site-validator <path-to-json-file>

Expects a JSON file with an array of URLs and validates each page found in the array

[
  "https://example.com/",
  "https://example.com/about",
  "https://example.com/projects"
]

File - txt

$ site-validator <path-to-txt-file>

Expects a txt file with one URL on each line and validates each page found in the file

https://example.com/
https://example.com/about
https://example.com/projects
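
If you already keep URLs in a plain text file like the one above, converting it to the equivalent JSON format is straightforward. A minimal sketch (the file names in the example comment are hypothetical):

```javascript
// Convert "one URL per line" text into the JSON array format above,
// dropping blank lines and surrounding whitespace.
function txtToJson (txt) {
  const urls = txt.split('\n').map(line => line.trim()).filter(Boolean)
  return JSON.stringify(urls, null, 2)
}

// Example usage (file names are hypothetical):
//   const fs = require('fs')
//   fs.writeFileSync('urls.json', txtToJson(fs.readFileSync('urls.txt', 'utf8')))
```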

File - xml

$ site-validator <path-to-xml-file>

Expects an XML file with the following format

<?xml version="1.0" encoding="UTF-8"?>
<urlset>
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
  <url>
    <loc>https://example.com/projects</loc>
  </url>
</urlset>
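
For reference, the <loc> URLs in such a file can be pulled out with a small script. A regex-based sketch, adequate only for the simple format shown above (not a general XML parser):

```javascript
// Extract URLs from a sitemap-style XML string (regex-based sketch,
// sufficient only for the simple <urlset>/<url>/<loc> shape above).
function locsFromXml (xml) {
  return [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map(match => match[1].trim())
}
```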

Output Schema

{
  url: "url-entered",
  pages: [
    "crawled-page-1",
    "crawled-page-2",
    "crawled-page-3",
    //...
  ],
  quiet: "boolean",
  singlePage: "boolean",
  passed: "boolean",
  results: {
    passed: [
      {
        url: "crawled-page-pass",
        status: "pass",
        errors: []
      },
      //...
    ],
    failed: [
      // may contain the following types
      {
        url: "crawled-page-fail",
        status: "fail",
        errors: [
          {
            type: "error-type",
            message: "error-message",
            location: "error-location"  
          },
          //...
        ]
      },
      {
        url: "crawled-page-not-found",
        status: "not found",
        errors: []
      },
      {
        url: "crawled-page-error",
        status: "error",
        errors: [
          "error message"
        ]
      },
      //...
    ]
  }
}
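
A report written with --output can be post-processed with a short script. A minimal sketch assuming the schema above (summarize is a hypothetical helper, not part of the CLI; report.json is an example filename):

```javascript
// Summarize the "failed" entries of a report produced with --output,
// one line per failed page: "<url>: <n> error(s) (<status>)".
function summarize (report) {
  return report.results.failed.map(page =>
    `${page.url}: ${page.errors.length} error(s) (${page.status})`
  )
}

// Example usage:
//   const fs = require('fs')
//   const report = JSON.parse(fs.readFileSync('report.json', 'utf8'))
//   summarize(report).forEach(line => console.log(line))
```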

Contributors

p1ho zrrrzzt

Acknowledgement

Inspired by w3c-validator-cli (outdated)
