GitHub - twiny/spidy: Domain names collector - Crawl websites and collect domain names along with their availability status.

Spidy

A tool that crawls websites to find domain names and checks their availability.
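As a rough illustration of the crawl-and-extract step, the Go sketch below pulls absolute links out of a page, reduces each host to its last two labels, and keeps only domains whose extension appears in a list like the `tlds` setting in the configuration. This is not Spidy's implementation. `extractDomains` is a hypothetical helper, the regexp stands in for a proper HTML parser, the "last two labels" rule is a naive simplification (it mishandles suffixes such as `co.uk`), and the availability check itself is left out.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"sort"
	"strings"
)

// hrefPattern is a deliberately simple matcher for absolute http(s) links in
// href attributes; a real crawler would use an HTML parser instead.
var hrefPattern = regexp.MustCompile(`href=["'](https?://[^"']+)["']`)

// extractDomains (hypothetical helper) returns the unique domains formed by the
// last two labels of each link's host, keeping only those whose TLD is in tlds.
func extractDomains(html string, tlds []string) []string {
	allowed := make(map[string]bool, len(tlds))
	for _, t := range tlds {
		allowed[strings.ToLower(t)] = true
	}
	seen := make(map[string]bool)
	for _, m := range hrefPattern.FindAllStringSubmatch(html, -1) {
		u, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed URLs
		}
		labels := strings.Split(strings.ToLower(u.Hostname()), ".")
		if len(labels) < 2 {
			continue // bare hostnames have no TLD to check
		}
		tld := labels[len(labels)-1]
		if !allowed[tld] {
			continue
		}
		seen[labels[len(labels)-2]+"."+tld] = true
	}
	domains := make([]string, 0, len(seen))
	for d := range seen {
		domains = append(domains, d)
	}
	sort.Strings(domains) // deterministic order for output
	return domains
}

func main() {
	page := `<a href="https://blog.example.com/post">x</a>
<a href="http://example.org">y</a>
<a href="https://example.io/">z</a>
<a href="/relative">w</a>`
	fmt.Println(extractDomains(page, []string{"com", "net", "org"}))
	// prints [example.com example.org]
}
```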

Install

git clone https://github.com/twiny/spidy.git
cd ./spidy

# build
go build -o bin/spidy -v cmd/spidy/main.go

# run
./bin/spidy -c config/config.yaml -u https://github.com

Usage

NAME:
   Spidy - Domain name scraper

USAGE:
   spidy [global options] command [command options] [arguments...]

VERSION:
   2.0.0

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config path, -c path  path to config file
   --help, -h              show help (default: false)
   --urls urls, -u urls    urls of page to scrape  (accepts multiple inputs)
   --version, -v           print the version (default: false)
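The help text says `--urls` accepts multiple inputs, which presumably means the flag can be repeated, e.g. `-u https://github.com -u https://go.dev`. The sketch below shows that general pattern with Go's standard `flag` package; it is not Spidy's own argument parser, and `urlList` and `parseArgs` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// urlList implements flag.Value so that a flag can be given more than once;
// every occurrence is appended to the slice.
type urlList []string

func (u *urlList) String() string { return strings.Join(*u, ",") }

func (u *urlList) Set(v string) error {
	*u = append(*u, v)
	return nil
}

// parseArgs (hypothetical) parses a -c config path and any number of -u URLs.
func parseArgs(args []string) (string, []string, error) {
	fs := flag.NewFlagSet("spidy", flag.ContinueOnError)
	var config string
	var list urlList
	fs.StringVar(&config, "c", "", "path to config file")
	fs.Var(&list, "u", "urls of page to scrape (repeatable)")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	return config, []string(list), nil
}

func main() {
	cfg, urls, err := parseArgs([]string{
		"-c", "config/config.yaml",
		"-u", "https://github.com",
		"-u", "https://go.dev",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg, urls)
}
```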

Configuration

# main crawler config
crawler:
    max_depth: 10 # max depth of pages to visit per website.
    # filter: [] # regexp filter
    rate_limit: "1/5s" # 1 request per 5 sec
    max_body_size: "20MB" # max page body size
    user_agents: # array of user-agents
      - "Spidy/2.1; +https://github.com/twiny/spidy"
    # proxies: [] # array of proxies: HTTP(S) or SOCKS5
# Logs
log:
    rotate: 7 # log rotation
    path: "./log" # log directory
# Store
store:
    ttl: "24h" # keep cache for 24h 
    path: "./store" # store directory
# Results
result:
    path: ./result # result directory
parralle: 3 # number of concurrent workers 
timeout: "5m" # request timeout
tlds: ["biz", "cc", "com", "edu", "info", "net", "org", "tv"] # array of domain extensions (TLDs) to check

TODO

Add support for more writers.
Add terminal logging.
Add test cases.

Issues

NOTE: This package is provided "as is" with no guarantee. Use it at your own risk and always test it yourself before using it in a production environment. If you find any issues, please create a new issue.
