Ads.txt Crawler
This project provides a streamlined Python script for crawling and verifying ads.txt files from specified domains. It is designed for simplicity: it avoids the complexity and database dependencies of traditional crawlers, which makes it well suited to quick checks and domain accessibility tests. In particular, it helps confirm that DSPs (Demand-Side Platforms) and SSPs (Supply-Side Platforms) can crawl a site's ads.txt file without being blocked by firewalls that filter common data center IP ranges, such as those of AWS.
- Ease of Use: No complicated setup or database requirements.
- Firewall Check: Tests domain accessibility to ensure ads.txt files are crawlable from data center IPs.
- Simplified Output: Directly prints crawl results, facilitating quick reviews and checks.
- Configurable: Supports basic customizations including target domains and crawl concurrency.
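As a rough illustration of the features above, the sketch below fetches ads.txt for several domains in parallel and prints one result line per domain. It is not the project's actual code: the function names (`fetch_ads_txt`, `count_records`, `check_domain`, `crawl`) and the User-Agent string are hypothetical, the record check is only a loose approximation of the ads.txt line format, and the standard library's `urllib` stands in for Requests so the example has no dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def fetch_ads_txt(domain, timeout=10):
    """Fetch https://<domain>/ads.txt and return (status_code, body_text)."""
    # Hypothetical User-Agent; some firewalls treat unknown agents differently.
    request = Request(
        f"https://{domain}/ads.txt",
        headers={"User-Agent": "adstxt-check-sketch/0.1"},
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as exc:
        # 403, 404 and similar responses are results too, not crashes.
        return exc.code, ""


def count_records(body):
    """Count lines shaped like 'domain, account id, DIRECT|RESELLER[, cert id]'."""
    count = 0
    for line in body.splitlines():
        # Drop trailing comments, then split the record into its fields.
        fields = [field.strip() for field in line.split("#", 1)[0].split(",")]
        if (
            len(fields) in (3, 4)
            and fields[0]
            and fields[1]
            and fields[2].upper() in ("DIRECT", "RESELLER")
        ):
            count += 1
    return count


def check_domain(domain, fetch=fetch_ads_txt):
    """Return (domain, summary) describing whether ads.txt was reachable."""
    try:
        status, body = fetch(domain)
    except (URLError, OSError) as exc:
        # DNS failures, timeouts, and dropped connections land here.
        return domain, f"unreachable ({exc})"
    if status != 200:
        return domain, f"HTTP {status}"
    return domain, f"ok, {count_records(body)} records"


def crawl(domains, fetch=fetch_ads_txt, workers=4):
    """Check all domains with a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: check_domain(d, fetch), domains))


if __name__ == "__main__":
    for domain, summary in crawl(["example.com"]):
        print(f"{domain}: {summary}")
```

Passing the fetch function as a parameter keeps the crawl logic testable without network access. A result only describes reachability from the machine running the check, so a firewall test is meaningful when the script runs from the kind of network (for example, a cloud instance) that the crawlers of interest use.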
- Python 3.x
- Requests library
- Clone this repository or download the script directly.
- Install required Python packages:
pip install requests
To use the crawler, you need a list of domains you want to crawl in a CSV file. The script can be run with the following command:
python crawler_script.py -t path/to/your/target_domains.csv
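The exact layout of the targets file is not described here. The sketch below assumes the simplest case, one domain per row in the first column, and shows how such a file could be read. The helper names `read_domains` and `load_domains` are hypothetical, and the skipping of blank lines, `#` comments, and duplicates is an illustrative choice rather than documented behavior of the script.

```python
import csv


def read_domains(lines):
    """Return unique, lowercased domains taken from the first CSV column."""
    domains = []
    seen = set()
    for row in csv.reader(lines):
        if not row:
            continue  # blank line
        domain = row[0].strip().lower()
        # Assumption: lines starting with '#' are comments, not domains.
        if not domain or domain.startswith("#") or domain in seen:
            continue
        seen.add(domain)
        domains.append(domain)
    return domains


def load_domains(path):
    """Read a targets file from disk (assumed to be UTF-8 encoded)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_domains(handle)
```

If the real file carries a header row or extra columns, the first-column assumption would need adjusting.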
- -t, --targets: Specify the target domains file (CSV format).
- -v, --verbose: Increase verbosity for more detailed logging.
- -p, --thread_pool: Set the number of crawling threads to use (default is 4).
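For reference, options like these are commonly declared with `argparse`. The following is a minimal sketch of such a declaration, not the script's actual source: `build_parser` is a hypothetical name, and treating `-t` as required and `-v` as a repeatable counter are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented -t, -v, and -p options."""
    parser = argparse.ArgumentParser(
        description="Crawl ads.txt files for a list of domains."
    )
    # Assumption: the targets file is mandatory.
    parser.add_argument("-t", "--targets", required=True,
                        help="target domains file (CSV format)")
    # Assumption: repeating -v (for example -vv) raises verbosity further.
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase verbosity for more detailed logging")
    parser.add_argument("-p", "--thread_pool", type=int, default=4,
                        help="number of crawling threads (default: 4)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.targets, args.verbose, args.thread_pool)
```

With a declaration like this, `python crawler_script.py -t target_domains.csv -p 8 -v` would crawl with eight threads and more detailed logging.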
Maintainer: Julian Salinas, email: [email protected]
Your contributions are welcome! Please feel free to submit issues or pull requests with improvements or bug fixes.
This script is provided for testing and educational purposes. Users are responsible for ensuring that their use of the script complies with website terms of service, legal requirements, and ethical standards.
This project is released under the MIT License. See the LICENSE file for more details.