The `py_scraping_template` repository contains a Python script that scrapes data from a target website and inserts the retrieved data into a Firestore database.
The following tools and packages are required to run the scripts in this repository:
- Docker
- Docker Compose
- Python 3.7 or higher
Clone this repository to your local machine.
Run the following command to create directories on the remote instance:
make remote-create-dir
This command creates the following directories:
/usr/local/hoge_board_scraping/logs
/usr/local/hoge_board_scraping/chrome
/usr/local/hoge_board_scraping/cred
/usr/local/hoge_board_scraping/script/images
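The `remote-create-dir` target presumably wraps a single `mkdir -p` in an `ssh` call to the remote instance. The sketch below is illustrative only (the `BASE` variable and its local fallback are assumptions, not the repository's actual Makefile):

```shell
# Sketch of what `make remote-create-dir` plausibly runs on the remote
# instance. On the real host BASE would be /usr/local/hoge_board_scraping;
# a local fallback is used here so the sketch can run anywhere.
BASE="${BASE:-./hoge_board_scraping}"
mkdir -p "$BASE/logs" "$BASE/chrome" "$BASE/cred" "$BASE/script/images"
echo "created directories under $BASE"
```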
Run the following commands to install Docker and Docker Compose on the remote instance:
make remote-install-docker
make remote-install-docker-compose
Set the login credentials used by the scraping script:
ID={Your ID}
PASSWORD={Your password}
To obtain the Firebase Admin SDK JSON file, follow these steps:
- Go to the Firebase console and select your project.
- Click on the gear icon at the top left corner and select "Project settings."
- Navigate to the "Service accounts" tab and click on "Generate new private key."
- A JSON file containing your private key will be downloaded to your computer.
- Keep the private key in a secure location and do not share it with anyone who should not have access to your Firebase project.
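Before deploying the key, it can be useful to sanity-check that the downloaded file really is a service-account key. The helper below is a sketch (not part of the repository) using only the standard library; the commented lines show the typical firebase-admin initialization that would follow:

```python
import json

# With the firebase-admin package installed, the key is used like this:
#   import firebase_admin
#   from firebase_admin import credentials
#   cred = credentials.Certificate("cred/your-key.json")
#   firebase_admin.initialize_app(cred)

# Fields the Admin SDK relies on in a service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def check_service_account(path: str) -> dict:
    """Load the key file and verify it has the fields the Admin SDK needs."""
    with open(path) as f:
        info = json.load(f)
    missing = REQUIRED_KEYS - info.keys()
    if missing:
        raise ValueError(f"service-account key is missing fields: {sorted(missing)}")
    if info["type"] != "service_account":
        raise ValueError("not a service-account key")
    return info
```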
Run the following command to copy the necessary files to the remote instance:
make scp-all
This command copies the following files:
docker-compose.yml
python-selenium/
script/
cred/
Run the following command to rebuild and restart the Python container:
make rebuild-restart-python
To start scraping, run the following command:
make ssh
This command logs you into the remote instance. Then, run the following commands to start scraping:
cd /usr/local/hoge_board_scraping/
docker-compose up -d
The `docker-compose.yml` file is used to start the Selenium Grid and Python containers.
This file specifies the following services:
- selenium-hub: Selenium Grid hub
- chrome: Selenium Chrome node
- python: Python container that runs the scraping script
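A minimal compose file for this three-service layout might look like the sketch below. This is illustrative only; the image tags, environment variables, and build settings in the repository's actual `docker-compose.yml` may differ:

```yaml
# Illustrative sketch, not the repository's actual file.
version: "3"
services:
  selenium-hub:
    image: selenium/hub
    ports:
      - "4444:4444"
  chrome:
    image: selenium/node-chrome
    depends_on:
      - selenium-hub
    environment:
      # Points the node at the hub; the exact variables depend on the
      # Selenium Grid version in use.
      - SE_EVENT_BUS_HOST=selenium-hub
  python:
    build: ./python-selenium
    depends_on:
      - chrome
```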
The `python` container is built using the `python-selenium` directory, which contains the necessary packages and dependencies to run the scraping script.
The `script` directory contains the Python script that scrapes the target website.
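In outline, such a script drives Chrome through the Selenium Grid hub and writes each record to Firestore. All names below are illustrative assumptions, not the repository's actual code; the selenium and firebase-admin calls are shown as comments because they need the running Grid and the credentials prepared in the steps above:

```python
# Hypothetical wiring of the scraping flow:
#
#   import firebase_admin
#   from firebase_admin import credentials, firestore
#   from selenium import webdriver
#
#   firebase_admin.initialize_app(credentials.Certificate("cred/your-key.json"))
#   db = firestore.client()
#   driver = webdriver.Remote(
#       command_executor="http://selenium-hub:4444/wd/hub",
#       options=webdriver.ChromeOptions(),
#   )
#   for title, url in scrape(driver):              # hypothetical helper
#       db.collection("boards").add(normalize_record(title, url))

def normalize_record(title: str, url: str) -> dict:
    """Shape one scraped row into the document stored in Firestore (hypothetical)."""
    return {"title": title.strip(), "url": url}
```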