This repository has been archived by the owner on Nov 6, 2023. It is now read-only.

opendatadiscovery/odd-collector-aws



odd-collector-aws

ODD Collector is a lightweight service which gathers metadata from all your data sources.

To learn more about collector types and ODD Platform's architecture, read the documentation.

Preview

Implemented adapters

Service Config example
Athena config
DynamoDB config
Glue config
Kinesis config
Quicksight config
S3 config
S3_Delta config
Sagemaker config
SQS config
SagemakerFeaturestore config

Building

docker build .

Docker compose example

Because Plugin inherits from pydantic.BaseSettings, any field missing from collector-config.yaml can be read from environment variables instead.
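The fallback described above can be illustrated with a small standard-library sketch. It is not the collector's or pydantic's actual code: `resolve_field` is a hypothetical helper, and the sketch assumes that a value set in the YAML file takes precedence and that environment variable names are matched case-insensitively.

```python
import os
from typing import Any, Mapping, Optional


def resolve_field(
    name: str,
    yaml_config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[Any]:
    """Return the YAML value if set, otherwise the matching environment variable.

    Hypothetical helper that only mimics the lookup order; it is not part of
    odd-collector-aws or pydantic.
    """
    env = os.environ if env is None else env

    # A value given explicitly in the YAML file wins (assumption stated above).
    if yaml_config.get(name) is not None:
        return yaml_config[name]

    # Otherwise look for an environment variable with the same name,
    # ignoring case (aws_region matches AWS_REGION).
    wanted = name.lower()
    for key, value in env.items():
        if key.lower() == wanted:
            return value

    return None


# Illustrative values only: the plugin entry omits aws_region,
# so the value is taken from the (fake) environment mapping.
plugin_yaml = {"type": "s3", "name": "s3_adapter"}
fake_env = {"AWS_REGION": "eu-central-1"}
print(resolve_field("aws_region", plugin_yaml, fake_env))
```

Passing the environment as a plain mapping keeps the sketch easy to try without touching real credentials.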

Custom .env file for docker-compose.yaml

AWS_REGION=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
PLATFORM_HOST_URL=http://odd-platform:8080

Custom collector-config.yaml

platform_host_url: http://localhost:8080
default_pulling_interval: 10 # Can be omitted to run collector once
token: "" # Token that must be retrieved from the platform
plugins:
  - type: s3
    name: s3_adapter
    aws_secret_access_key: <aws_secret_access_key> # Optional.
    aws_access_key_id: <aws_access_key_id> # Optional.
    aws_session_token: <aws_session_token> # Optional.
    aws_region: <aws_region> # Optional.
    datasets:
      # Recursive fetch for all objects in the bucket.
      - bucket: my_bucket
      # Fetch only the objects under an explicit prefix (here, a single file).
      - bucket: my_other_bucket
        prefix: folder/subfolder/file.csv
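To make the two dataset entries above concrete, here is a minimal sketch of how a bucket-only entry differs from one with a prefix. It assumes the adapter applies ordinary S3 key-prefix semantics (a key is selected when it starts with the prefix); `select_keys` and the sample keys are hypothetical and are not taken from the adapter's code.

```python
from typing import Iterable, List, Optional


def select_keys(keys: Iterable[str], prefix: Optional[str] = None) -> List[str]:
    """Return the object keys a dataset entry would cover.

    Hypothetical helper: with no prefix every key in the bucket is kept,
    otherwise only keys that start with the prefix.
    """
    if not prefix:
        return list(keys)
    return [key for key in keys if key.startswith(prefix)]


# Made-up object keys for illustration.
bucket_keys = [
    "folder/subfolder/file.csv",
    "folder/subfolder/other.csv",
    "folder/readme.txt",
]

# Entry with only `bucket`: all objects are considered.
print(select_keys(bucket_keys))

# Entry with `prefix: folder/subfolder/file.csv`: a single object matches.
print(select_keys(bucket_keys, "folder/subfolder/file.csv"))
```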

docker-compose.yaml

version: "3.8"
services:
  # --- ODD Platform ---
  database:
    ...

  odd-platform:
    ...
  
  odd-collector-aws:
    image: 'ghcr.io/opendatadiscovery/odd-collector-aws:latest'
    restart: always
    volumes:
      # The leading ./ makes this a bind mount of the local file, not a named volume.
      - ./collector_config.yaml:/app/collector_config.yaml
    environment:
      - AWS_REGION=${AWS_REGION}
      - AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
      - AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
      - PLATFORM_HOST_URL=${PLATFORM_HOST_URL}
      - LOGLEVEL=DEBUG
    depends_on:
      - odd-platform