Move integration test into examples (#756)
- moved integration test into `examples` folder
- adjusted the sample implementation to the new interface and to how a user would write it
- added a script to execute the pipeline using the Fondant CLI
- added execution to the CI/CD pipeline

For now I've used a bash script for the execution. If we want to add
more integration tests, we should think about a different approach.

Fixes #727
mrchtr authored Jan 11, 2024
1 parent 16888c8 commit f961b3d
Showing 16 changed files with 151 additions and 114 deletions.
20 changes: 20 additions & 0 deletions .github/workflows/pipeline.yaml
@@ -32,6 +32,26 @@ jobs:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
COVERALLS_FLAG_NAME: test-${{ matrix.python-version }}

integration-test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [ '3.8', '3.9', '3.10' ]
steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v1
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
pip install --upgrade pip
pip install poetry==1.4.0
poetry install --all-extras --with test
- name: Execute sample pipeline
run: ./scripts/run_integration_tests.sh $GITHUB_SHA

finish-coveralls:
needs: test
runs-on: ubuntu-latest
2 changes: 1 addition & 1 deletion docs/runners/local.md
@@ -46,7 +46,7 @@ about this in the [installation](../guides/installation.md) guide.
fondant run local <pipeline_ref> --auth-azure
```

You can also use the `--extra_volumes` argument to mount extra credentials or additional files.
You can also use the `--extra-volumes` argument to mount extra credentials or additional files.
These volumes will be mounted to every component/service of the docker-compose spec.


13 changes: 13 additions & 0 deletions examples/sample_pipeline/README.md
@@ -0,0 +1,13 @@
# Sample pipeline

This example is a simple sample pipeline that uses two reusable components
(load_from_parquet, chunk_text) and a custom dummy component. The custom dummy component
simply returns the received dataframe.

The pipeline can be executed with the Fondant CLI:

```bash
fondant run local pipeline.py
```

The automated integration test will use the `run.sh` script.
@@ -1,24 +1,22 @@
FROM --platform=linux/amd64 python:3.8-slim as base
FROM --platform=linux/amd64 python:3.8-slim

# System dependencies
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install git -y

# Install requirements
COPY requirements.txt /
COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
# Install fondant
ARG FONDANT_VERSION=main
RUN pip3 install fondant[component,aws,azure,gcp]@git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component
COPY src/ src/
ENV PYTHONPATH "${PYTHONPATH}:./src"

FROM base
WORKDIR /component/src

# Copy over src-files and spec of the component
COPY src/ .

ENTRYPOINT ["fondant", "execute", "main"]
@@ -1,7 +1,6 @@
name: Dummy component
description: Dummy component for testing custom components

image: fndnt/dummy_component:dev
image: dummy_component

consumes:
text_data:
Empty file.
@@ -16,9 +16,10 @@
class DummyComponent(PandasTransformComponent):
"""Dummy component that returns the dataframe as it is."""

def __init__(self, *_):
def __init__(self, *_, **kwargs):
pass

def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
"""Dummy component that returns the dataframe as it is."""
# raise RuntimeError
return dataframe
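The widened constructor signature (`*_, **kwargs`) lets the runner pass arbitrary arguments without breaking the component. A minimal standalone sketch of the same pass-through behaviour, without the Fondant `PandasTransformComponent` base class (the constructor arguments shown are illustrative):

```python
import pandas as pd


class DummyComponent:
    """Stand-in for the Fondant component: returns the dataframe as it is."""

    def __init__(self, *_, **kwargs):
        # Swallow any positional or keyword arguments the runner passes,
        # so unexpected arguments do not raise a TypeError.
        pass

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        return dataframe


component = DummyComponent("ignored", produces={"text_data": "string"})
df = pd.DataFrame({"text_data": ["hello", "world"]})
result = component.transform(df)
print(result.equals(df))  # True
```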
37 changes: 37 additions & 0 deletions examples/sample_pipeline/pipeline.py
@@ -0,0 +1,37 @@
# This file contains a sample pipeline: it loads data from a parquet file
# using the load_from_parquet component, chains a custom dummy component, and
# uses the reusable chunking component.
import pyarrow as pa
from pathlib import Path
from fondant.pipeline import Pipeline

BASE_PATH = Path("./.artifacts").resolve()
BASE_PATH.mkdir(parents=True, exist_ok=True)

# Define pipeline
pipeline = Pipeline(name="dummy-pipeline", base_path=str(BASE_PATH))

# Load from hub component
load_component_column_mapping = {
"text": "text_data",
}

dataset = pipeline.read(
name_or_path="load_from_parquet",
arguments={
"dataset_uri": "/data/sample.parquet",
"column_name_mapping": load_component_column_mapping,
"n_rows_to_load": 5,
},
produces={"text_data": pa.string()},
)

dataset = dataset.apply(
name_or_path="./components/dummy_component",
)

dataset.apply(
name_or_path="chunk_text",
arguments={"chunk_size": 10, "chunk_overlap": 2},
consumes={"text": "text_data"},
)
32 changes: 32 additions & 0 deletions examples/sample_pipeline/run.sh
@@ -0,0 +1,32 @@
#!/bin/bash
# This script executes the sample pipeline in the examples folder, checks that it
# executed correctly, and cleans up the directory afterwards
set -e
GIT_HASH=$1


# Setup teardown
cleanup() {
rv=$?

# Try to remove .artifact folder
artifact_directory="./.artifacts"

if [ -d "$artifact_directory" ]; then
# Directory exists, remove it
# Files can't be deleted in the CI/CD pipeline due to missing permissions. Cleanup is not
# strictly needed there, but is useful if you are executing the script locally.
rm -rf "$artifact_directory" 2>/dev/null || true
fi

exit $rv
}

trap cleanup EXIT

# Bind local data directory to pipeline
data_dir=$(readlink -f "data")

# Run pipeline
poetry run fondant run local pipeline.py \
--extra-volumes $data_dir:/data --build-arg FONDANT_VERSION=$GIT_HASH
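The teardown above relies on a bash EXIT trap, so the artifacts folder is removed even when the pipeline invocation fails. A minimal standalone sketch of the same pattern, with a temporary directory and a placeholder in place of the real pipeline run:

```shell
# Sketch of the trap-based teardown used in run.sh; the pipeline
# invocation is replaced by a placeholder, and paths are temporary.
workdir=$(mktemp -d)

(
    artifact_directory="$workdir/.artifacts"

    cleanup() {
        rv=$?
        # Remove artifacts even if the pipeline failed; ignore permission errors.
        rm -rf "$artifact_directory" 2>/dev/null || true
        exit $rv
    }
    trap cleanup EXIT

    mkdir -p "$artifact_directory"
    true  # placeholder for: poetry run fondant run local pipeline.py ...
)

# The EXIT trap fires when the subshell exits, removing the directory.
[ -d "$workdir/.artifacts" ] && echo "artifacts remain" || echo "artifacts cleaned up"
```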
36 changes: 36 additions & 0 deletions scripts/run_integration_tests.sh
@@ -0,0 +1,36 @@
#!/bin/bash
# This script executes every run.sh script in the examples folder, checks that each
# executed correctly, and reports any failures
GIT_HASH=$1

echo "Start integration tests execution ..."

failed_tests=()

# Find all run.sh scripts and execute them
for test_script in ./examples/*/run.sh; do
test_name=$(basename "$(dirname "$test_script")")

echo "Running test: $test_name"

# Set working dir to the current integration test
cd $(dirname "$test_script")

# Execute the run.sh script
bash ./run.sh $GIT_HASH

# Check the exit status
if [ $? -ne 0 ]; then
echo "Test $test_name failed!"
failed_tests+=("$test_name")
fi
done

echo "Tests completed"

if [ ${#failed_tests[@]} -eq 0 ]; then
echo "All tests passed!"
else
echo "Failed tests: ${failed_tests[@]}"
exit 1 # Indicate failure to CI/CD
fi
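The loop above keeps running after a failing test and reports all failures at the end. The aggregation pattern can be sketched standalone, with stand-in exit codes in place of the per-example `run.sh` scripts:

```shell
# Sketch of the failed-test aggregation pattern; "alpha" succeeds and
# "beta" fails (stand-ins for the per-example run.sh scripts).
failed_tests=()

for test_case in "alpha:0" "beta:1"; do
    test_name=${test_case%%:*}
    exit_code=${test_case##*:}

    # Stand-in for: bash ./run.sh $GIT_HASH
    ( exit "$exit_code" )

    if [ $? -ne 0 ]; then
        failed_tests+=("$test_name")
    fi
done

if [ ${#failed_tests[@]} -eq 0 ]; then
    echo "All tests passed!"
else
    echo "Failed tests: ${failed_tests[*]}"
fi
```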
3 changes: 2 additions & 1 deletion src/fondant/pipeline/pipeline.py
@@ -472,7 +472,8 @@ def _validate_pipeline_definition(self, run_id: str):
msg = (
f"Component '{component_op.name}' is trying to invoke the field "
f"'{component_field_name}', which has not been defined or created "
f"in the previous components."
f"in the previous components. \n"
f"Available field names: {list(manifest.fields.keys())}"
)
raise InvalidPipelineDefinition(
msg,
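The change appends the manifest's available field names to the validation error, so users can see what they may invoke. The shape of the resulting message can be sketched with hypothetical values (`component_name`, `component_field_name`, and `fields` below are illustrative, not taken from the codebase):

```python
# Hypothetical inputs, to show the shape of the improved error message.
component_name = "chunk_text"
component_field_name = "images_data"
fields = {"text_data": "string", "chunks_data": "string"}

msg = (
    f"Component '{component_name}' is trying to invoke the field "
    f"'{component_field_name}', which has not been defined or created "
    f"in the previous components. \n"
    f"Available field names: {list(fields.keys())}"
)
print(msg)
```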

This file was deleted.

This file was deleted.

76 changes: 0 additions & 76 deletions tests/integration_tests/test_sample_pipeline.py

This file was deleted.
