This Python script is a web scraper designed to retrieve patent statuses from register.epo.org.
- Python 3.7+
- Selenium WebDriver
- openpyxl
- Make sure Python 3.7 or later is installed. You can download it from python.org.
- Clone the repository:
git clone https://github.com/lancer1911/epo-patent-status-scraper.git
- Navigate to the cloned directory:
cd epo-patent-status-scraper
- Install the required Python libraries using pip:
pip install -r requirements.txt
- Download the ChromeDriver that matches your installed Chrome version and place it in the same directory as the script.
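
Before the first run, the installation steps above can be sanity-checked with a few lines of Python. The following is a minimal sketch and not part of the repository: the function names are hypothetical, and it assumes the two required libraries are importable as `selenium` and `openpyxl`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)
# Assumption: these are the import names of the libraries listed in the requirements.
REQUIRED_PACKAGES = ("selenium", "openpyxl")


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append("Python 3.7 or later is required")
    for name in missing_packages(REQUIRED_PACKAGES):
        problems.append(f"missing package: {name} (try: pip install -r requirements.txt)")
    return problems


if __name__ == "__main__":
    # Print one line per detected problem; print nothing if the setup looks fine.
    for problem in check_environment():
        print(problem)
```

This only confirms that the packages can be found. It does not check that the ChromeDriver version matches the installed Chrome version.
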
- Run the script:
python epo-status-scraper.py
- You will be prompted to select an Excel file that contains the EP patent numbers. Make sure the file and patent numbers are ready before running the script.
- You will be asked to specify the column that contains the patent numbers (e.g., 'C').
- You will be asked to specify the row number to start from (on first use, enter '2' so that the header row is not overwritten).
- The script will then scrape the status of each patent and write it into a new column in the same Excel file.
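
The flow described in the steps above (read patent numbers from one column, start at a given row, write each status into a new column) might look roughly like the sketch below. It is an illustration under stated assumptions, not the script's actual code: `fetch_status` is a hypothetical callable standing in for the Selenium lookup, all function names are made up, and the status column is assumed to be the first column after the used range.

```python
def column_index(letter):
    """Convert an Excel column letter such as 'C' or 'AA' to a 1-based index."""
    index = 0
    for char in letter.strip().upper():
        if not "A" <= char <= "Z":
            raise ValueError(f"invalid column letter: {letter!r}")
        index = index * 26 + (ord(char) - ord("A") + 1)
    if index == 0:
        raise ValueError("column letter must not be empty")
    return index


def fill_statuses(ws, column_letter, start_row, fetch_status):
    """Write a status next to every patent number found in the given column.

    `ws` is an openpyxl worksheet; `fetch_status` is a hypothetical callable
    that takes a patent number and returns its status as a string.
    Returns the number of rows that were filled.
    """
    source_col = column_index(column_letter)
    # Assumption: statuses go into the first column after the used range.
    status_col = ws.max_column + 1
    filled = 0
    for row in range(start_row, ws.max_row + 1):
        number = ws.cell(row=row, column=source_col).value
        if number is None or not str(number).strip():
            continue  # skip empty cells
        ws.cell(row=row, column=status_col).value = fetch_status(str(number).strip())
        filled += 1
    return filled


def update_workbook(path, column_letter, start_row, fetch_status):
    """Load the workbook, fill in statuses on the active sheet, and save in place."""
    import openpyxl  # imported lazily so the helpers above work without it

    wb = openpyxl.load_workbook(path)
    filled = fill_statuses(wb.active, column_letter, start_row, fetch_status)
    wb.save(path)
    return filled
```

Starting at row 2 leaves row 1 untouched, which is why the header survives. openpyxl also ships `openpyxl.utils.column_index_from_string`, which could replace the hand-written `column_index` helper.
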
- Be aware that sending too many requests in a short period may result in your IP address being blocked by the website.
- Always double-check the scraped data for any anomalies.
- This script is intended for educational purposes only. Please use responsibly and ensure all actions comply with the website's terms of service.
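
One common way to reduce the risk of being blocked is to pause between consecutive lookups. The sketch below is only an illustration: `fetch_status` is a hypothetical callable standing in for the actual lookup, and the default delay values are arbitrary assumptions, not limits published by the website.

```python
import random
import time


def scrape_with_pauses(numbers, fetch_status, sleep=time.sleep, base=5.0, jitter=2.0):
    """Call `fetch_status` for each number, pausing between consecutive requests.

    Each pause lasts between `base` and `base + jitter` seconds. The `sleep`
    parameter exists so the pause can be replaced in tests.
    """
    results = {}
    for i, number in enumerate(numbers):
        if i > 0:
            # Randomized delay so requests are not sent at a fixed rhythm.
            sleep(base + jitter * random.random())
        results[number] = fetch_status(number)
    return results
```

No pause is inserted before the first request, so a list of N numbers waits N - 1 times.
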
- If the script can't find the ChromeDriver, make sure the driver is in the same directory as the script and that the path is correct.
- If the script can't open the Excel file, make sure the file is not open in another program.
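
Both troubleshooting checks above can be run ahead of time. The following is a minimal sketch with hypothetical function names. It assumes the driver file is named `chromedriver` or `chromedriver.exe`, and the file-lock check is mainly meaningful on Windows, where Excel holds an exclusive lock on open workbooks.

```python
from pathlib import Path


def find_chromedriver(script_dir):
    """Return the path to ChromeDriver in `script_dir`, or raise FileNotFoundError."""
    # Assumption: the driver keeps its default file name.
    for name in ("chromedriver", "chromedriver.exe"):
        candidate = Path(script_dir) / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no ChromeDriver found in {script_dir}")


def can_open_for_writing(path):
    """Return True if the file can be opened for update, False if it is locked or missing."""
    try:
        with open(path, "r+b"):
            return True
    except OSError:
        # Covers PermissionError (e.g. file locked by Excel) and FileNotFoundError.
        return False
```

With Selenium 4, the path returned by `find_chromedriver` could be passed to `selenium.webdriver.chrome.service.Service`. Whether the script itself does this is not stated here.
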