MILV is a bot that parses, checks, and validates internal and external URL links in Markdown files. It can be used to verify pull requests and also as a standalone library.
$ go get -u -v github.com/magicmatatjahu/milv
For the above command to work, you must have Go installed.
After installation, run the program with milv from anywhere in the file system.
If you want to run the code without installing it, run the following commands to get the source code, resolve external dependencies, and build the project.
For these operations you must also have the Dep package manager installed.
git clone https://github.com/magicmatatjahu/milv.git
cd milv
dep ensure
go build
You can use the following parameters with the milv binary:
Name | Description | Default Value |
---|---|---|
-base-path | root directory of the repository | "" |
-config-file | configuration file for the bot (described below) | milv.config.yaml |
-white-list-ext | comma-separated external links which will not be checked | [] |
-white-list-int | comma-separated internal links which will not be checked | [] |
-black-list | comma-separated files which will not be checked | [] |
-allow-redirect | redirects will be allowed | false |
-request-repeats | number of request repeats | 1 |
-allow-code-blocks | checking links in code blocks | false |
-timeout | connection timeout (in seconds) | 30 |
-ignore-external | ignore external links | false |
-ignore-internal | ignore internal links | false |
-v | enable verbose logging | false |
-help or -h | show available parameters | n/a |
Files to be checked are given as free parameters.
- Check all links in .md files in the current directory and its subdirectories, skipping external links that match github.com and files whose path matches vendor:
milv -black-list="vendor" -white-list-ext="github.com"
- Check links only in the ./README.md and ./foo/bar.md files:
milv ./README.md ./foo/bar.md
If you do not want to install milv and its dependencies, you can simply use the Docker image:
docker run --rm -v $PWD:/milv:ro magicmatatjahu/milv:stability -base-path=/milv
The configuration file allows for quick parameterization of how milv works. The config file must be a .yaml file.
Parameterization is very similar to using parameters in the CLI. However, you can also configure files located in subdirectories relative to the configuration file separately, each with its own config.
If your project tree looks like this:
├── README.md
├── LICENSE
├── main.go
├── milv.config.yaml
└── src
    ├── file.go
    ├── file_test.go
    ├── foo.md
    └── some_dir
        └── bar.md
your config file can look like this:
white-list-external: ["localhost", "abc.com"]
white-list-internal: ["LICENSE"]
black-list: ["./README.md"]
files:
  - path: "./src/foo.md"
    config:
      white-list-external: ["github.com"]
      white-list-internal: ["#contributing"]
Before running validation, milv removes ./README.md from the list of files to check and merges the global white-list-external with the file-level white-list-external, so white-list-external for the ./src/foo.md file will look like this:
white-list-external: ["localhost", "abc.com", "github.com"]
The same applies to white-list-internal.
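The merge described above can be sketched in Go as follows. This is only an illustration, not milv's actual code: the helper name mergeWhiteList is hypothetical, and skipping duplicate entries is an assumption that the text above does not confirm.

```go
package main

import "fmt"

// mergeWhiteList is a hypothetical helper (not taken from milv's source).
// It appends file-level entries to the global ones and keeps their order.
// Dropping duplicates is an assumption made for this sketch.
func mergeWhiteList(global, file []string) []string {
	seen := make(map[string]bool, len(global)+len(file))
	merged := make([]string, 0, len(global)+len(file))
	for _, list := range [][]string{global, file} {
		for _, entry := range list {
			if !seen[entry] {
				seen[entry] = true
				merged = append(merged, entry)
			}
		}
	}
	return merged
}

func main() {
	global := []string{"localhost", "abc.com"}
	file := []string{"github.com"}
	// Prints: [localhost abc.com github.com]
	fmt.Println(mergeWhiteList(global, file))
}
```

The result matches the white-list-external shown above for ./src/foo.md: global entries first, followed by the file-level ones.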
If you have a config file and also pass parameters on the CLI, milv will automatically combine the parameters from the file and the console.
NOTE: For this example, the project tree is the same as above.
The config file can look like this:
white-list-external: ["localhost", "abc.com"]
white-list-internal: ["LICENSE"]
black-list: ["./README.md"]
request-repeats: 5
timeout: 45
allow-redirect: false
allow-code-blocks: true
files:
  - path: "./src/foo.md"
    config:
      white-list-external: ["google.com"]
      white-list-internal: ["#contributing"]
      request-repeats: 3
      timeout: 30
      allow-code-blocks: false
    links:
      - path: "https://github.com/magicmatatjahu/milv"
        config:
          timeout: 15
          allow-redirect: true
In this example, milv globally checks external links with a 45-second timeout, does not allow redirects, checks links in code blocks, and uses a default of 5 request repeats.
milv also allows files to be configured separately. In the ./src/foo.md file the timeout is set to 30 seconds, links are checked up to 3 times (if they return an error), and links in code blocks are not checked. However, the single link https://github.com/magicmatatjahu/milv is checked with a 15-second timeout and with redirects allowed.
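One way to picture this layering is as a chain of overrides, where each more specific level replaces only the values it sets. The Go sketch below is a rough illustration under assumptions: the Options type and the resolve function are hypothetical names, not milv's real API, and it assumes a link inherits any value it does not set from its file (the text above does not say so explicitly).

```go
package main

import "fmt"

// Options is a hypothetical type for this sketch. A nil pointer means
// "this level does not set the value".
type Options struct {
	Timeout        *int
	RequestRepeats *int
	AllowRedirect  *bool
}

// resolve applies the levels in order, so later (more specific) levels
// override earlier ones only for the fields they actually set.
func resolve(levels ...Options) Options {
	var out Options
	for _, l := range levels {
		if l.Timeout != nil {
			out.Timeout = l.Timeout
		}
		if l.RequestRepeats != nil {
			out.RequestRepeats = l.RequestRepeats
		}
		if l.AllowRedirect != nil {
			out.AllowRedirect = l.AllowRedirect
		}
	}
	return out
}

func intPtr(v int) *int    { return &v }
func boolPtr(v bool) *bool { return &v }

func main() {
	global := Options{Timeout: intPtr(45), RequestRepeats: intPtr(5), AllowRedirect: boolPtr(false)}
	file := Options{Timeout: intPtr(30), RequestRepeats: intPtr(3)}
	link := Options{Timeout: intPtr(15), AllowRedirect: boolPtr(true)}

	forFile := resolve(global, file)
	forLink := resolve(global, file, link)
	// Prints: 30 3 false
	fmt.Println(*forFile.Timeout, *forFile.RequestRepeats, *forFile.AllowRedirect)
	// Prints: 15 3 true (repeats inherited from the file level, by assumption)
	fmt.Println(*forLink.Timeout, *forLink.RequestRepeats, *forLink.AllowRedirect)
}
```

Pointer fields are used so that "not set" can be told apart from an explicit false or 0, which matters for options such as allow-redirect.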
The table below describes the types of errors that can occur while checking links, with examples of how to solve them:
Error | Solution example |
---|---|
404 Not Found | The page doesn't exist. Change the external link to the correct one |
Error with formatting link | Correct the link or, if the link contains variables or is an example, add it to white-list-external or white-list-internal |
The specified file doesn't exist | Change the relative path to the file to the correct one or use an absolute path (the second solution is not recommended) |
The specified header doesn't exist in file | Change the anchor link in the .md file to the correct one. Sometimes milv gives a hint (Did you mean about <similar header>?) naming an existing header in the file that is very similar to the given one |
The specified anchor doesn't exist... or The specified anchor doesn't exist in website... | Check which anchors exist on the external website and correct the specified anchor, or remove the redirection to the given anchor. Sometimes milv gives a hint (Did you mean about <similar anchor>?) naming an existing anchor on the website that is very similar to the given one |
Get <external link>: net/http: request canceled (Client.Timeout exceeded while awaiting headers) | Increase the network timeout for all files, a specific file, or a specific link, or increase the number of request repeats (see the configuration file examples above) |
Get <external link>: EOF | Same as above, or change the link to another one (the website probably doesn't exist) |
Other types of errors, and errors containing the words no such host or timeout | Most likely, the website doesn't exist or you do not have access to it. Possible solutions: change the link to a correct one, remove it, or add it to white-list-external or white-list-internal |
It is good practice to add local or internal (local-network) links, such as http://localhost, to the global white list of external or internal links.
milv can help you validate links in all .md files in the whole repository when a pull request is created (or a commit is pushed).
To use milv with Jenkins, connect your repo, create a Jenkinsfile, and add a stage:
stage("validate internal & external links") {
    workDir = pwd()
    sh "docker run --rm --dns=8.8.8.8 --dns=8.8.4.4 -v $workDir:/milv:ro magicmatatjahu/milv:0.0.6 -base-path=/milv"
}
The open-source community offers other link validation libraries written in JS, Ruby, and other languages. Here are a few of note:
- awesome_bot: a validator written in Ruby. It validates external and internal links in .md files.
- remark-validate-links: a validator written in JS. It validates internal links in .md files.
If you want to contribute to this project, first read the CONTRIBUTING.md file for details on submitting pull requests.
This project is available under the MIT license. See the LICENSE file for more info.
- error handling
- refactor (new architecture)
- documentation
- ability to validate remote repositories hosted on GitHub
- parse other types of files
- add more commands, such as a timeout for http.Get(), allowing redirects, or SSL
- landing page for the project