This project refactors a small application that fetches a random Pokémon from the Pokémon API and stores it in a PostgreSQL database. The primary goal is to show how orchestration with Apache Airflow and Astro CLI makes such a pipeline simpler and easier to scale. With orchestration in place, fetching, transforming, and loading the data run as automated, scheduled tasks.
Project structure:
- dags: Contains the main DAG script (pokemon_dag.py) that orchestrates the ETL process.
- include/sql: Contains SQL scripts for creating the database table and for any other required database operations.
- plugins: Custom operators, hooks, or sensors that extend Airflow functionality.
- Dockerfile: Specifies the Docker environment setup, provided by Astro CLI.
- requirements.txt: Lists required Python packages for Airflow.
Technologies used:
- Apache Airflow: Workflow orchestration and scheduling.
- Astro CLI: Simplifies Airflow project setup and management.
- Pokémon API: Provides random Pokémon data.
- PostgreSQL: Database to store Pokémon data.
Prerequisites:
- Docker installed and running.
- Astro CLI installed.
- Access to a PostgreSQL instance.
Setup:
- Clone the repository, then change into the cloned project directory:
  mkdir airflow-astro-project
  cd airflow-astro-project
  git clone git@github.com:caio-moliveira/airflow-astro-project.git
  cd airflow-astro-project
- Configure the .env file in the project root, replacing the placeholder values with your own PostgreSQL settings:
  POSTGRES_USER=POSTGRES_USER
  POSTGRES_PASSWORD=POSTGRES_PASSWORD
  POSTGRES_HOST=POSTGRES_HOST
  POSTGRES_DB=POSTGRES_DB
  AIRFLOW__CORE__ALLOWED_DESERIALIZATION_CLASSES=include.schema.PokemonSchema
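The DAG needs these credentials to reach the database. A minimal sketch, assuming the four POSTGRES_* variables are visible to the Airflow containers (Astro CLI loads the project's .env file into them) and assuming the default PostgreSQL port 5432; `build_postgres_uri` is a hypothetical helper, not part of the project:

```python
import os
from urllib.parse import quote


def build_postgres_uri(env=None, port=5432):
    """Build a libpq-style connection URI from the POSTGRES_* variables.

    Hypothetical helper: reads the variables defined in .env from the
    process environment unless an explicit mapping is passed in.
    """
    env = os.environ if env is None else env
    # Percent-encode credentials so characters like '@' or ':' don't break the URI.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host = env["POSTGRES_HOST"]
    db = env["POSTGRES_DB"]
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


if __name__ == "__main__":
    example = {
        "POSTGRES_USER": "airflow",
        "POSTGRES_PASSWORD": "p@ss:word",
        "POSTGRES_HOST": "localhost",
        "POSTGRES_DB": "pokemon",
    }
    print(build_postgres_uri(example))
```

The resulting URI can be passed to a libpq-based client such as psycopg2's `connect()`. An Airflow Connection holding the same credentials is a common alternative.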
- Start the local Airflow environment with Astro CLI:
  astro dev start
The DAG (pokemon_dag.py) follows these steps:
- Fetch Pokémon:
  - Requests a random Pokémon from the Pokémon API.
- Transform data:
  - Parses and structures the data for storage.
- Load to PostgreSQL:
  - Inserts the structured data into the PostgreSQL database.
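The three steps can be sketched as plain Python functions, roughly what each Airflow task would wrap. This is a minimal sketch under several assumptions: the PokéAPI endpoint `https://pokeapi.co/api/v2/pokemon/{id}`, the ID range 1 to 151, the `PokemonSchema` fields, and the `pokemon` table with its columns are all hypothetical and may not match the project's actual DAG or SQL scripts.

```python
import json
import random
import urllib.request
from dataclasses import dataclass

# Assumed endpoint and ID range (first generation only); the project's values may differ.
API_URL = "https://pokeapi.co/api/v2/pokemon/{id}"
MAX_POKEMON_ID = 151


@dataclass
class PokemonSchema:
    """Hypothetical record shape; the fields are illustrative."""
    id: int
    name: str
    height: int
    weight: int
    types: str  # comma-separated type names in slot order, e.g. "grass,poison"


def fetch_pokemon(pokemon_id=None, timeout=10):
    """Fetch step: request one Pokémon (random if no ID is given) and return the raw JSON."""
    if pokemon_id is None:
        pokemon_id = random.randint(1, MAX_POKEMON_ID)
    with urllib.request.urlopen(API_URL.format(id=pokemon_id), timeout=timeout) as resp:
        return json.load(resp)


def transform_pokemon(raw):
    """Transform step: keep only the fields to store, flattening the type list."""
    ordered = sorted(raw["types"], key=lambda t: t["slot"])
    return PokemonSchema(
        id=raw["id"],
        name=raw["name"],
        height=raw["height"],
        weight=raw["weight"],
        types=",".join(t["type"]["name"] for t in ordered),
    )


def build_insert(pokemon):
    """Load step (query only): a parameterized INSERT for a hypothetical `pokemon` table.

    ON CONFLICT assumes `id` is the table's primary key, so re-runs skip duplicates.
    """
    sql = (
        "INSERT INTO pokemon (id, name, height, weight, types) "
        "VALUES (%s, %s, %s, %s, %s) ON CONFLICT (id) DO NOTHING"
    )
    params = (pokemon.id, pokemon.name, pokemon.height, pokemon.weight, pokemon.types)
    return sql, params


if __name__ == "__main__":
    # Offline example with a trimmed PokéAPI-style payload (no network call).
    sample = {
        "id": 1, "name": "bulbasaur", "height": 7, "weight": 69,
        "types": [
            {"slot": 2, "type": {"name": "poison"}},
            {"slot": 1, "type": {"name": "grass"}},
        ],
    }
    print(build_insert(transform_pokemon(sample)))
```

In a DAG, each function would typically become a task (for example with the TaskFlow `@task` decorator) chained fetch, then transform, then load, with the load task running the statement through a cursor such as psycopg2's `cursor.execute(sql, params)`. Passing a custom class like `PokemonSchema` between tasks through XCom is presumably why the .env allow-lists it in `AIRFLOW__CORE__ALLOWED_DESERIALIZATION_CLASSES`, since Airflow only deserializes allow-listed custom classes.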
This project successfully demonstrates the use of Astro CLI for managing and deploying an Airflow DAG that interacts with an external API and stores data in PostgreSQL. This setup exemplifies how orchestration can be added to an existing project with minimal changes, using Astro CLI as a tool for rapid Airflow development.