Update evaluate ragas component as reusable component
RobbeSneyders committed Dec 11, 2023
1 parent 08fd581 commit c7a9e2e
Showing 45 changed files with 525 additions and 23 deletions.
Binary file added .coverage.robbe-XPS-13-9305.45948.640622
Binary file not shown.
3 changes: 3 additions & 0 deletions .fondant/compose.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
name: testpipeline
services: {}
version: '3.8'
1 change: 1 addition & 0 deletions .fondant/sagemaker-pipeline.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
foo: bar
Original file line number Diff line number Diff line change
@@ -9,15 +9,20 @@ RUN apt-get update && \
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Set the working directory to the component folder
WORKDIR /component/src
# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install fondant[component,aws,azure,gcp]@git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Copy over src-files
COPY src/ .
# Set the working directory to the component folder
WORKDIR /component
COPY src/ src/

FROM base as test
COPY tests/ tests/
RUN pip3 install --no-cache-dir -r tests/requirements.txt
ARG OPENAI_KEY
ENV OPENAI_KEY=${OPENAI_KEY}
RUN python -m pytest tests

FROM base
55 changes: 55 additions & 0 deletions components/evaluate_ragas/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# retriever_eval_ragas

### Description
Component that evaluates the retriever using RAGAS

### Inputs / outputs

**This component consumes:**

- text: string
- retrieved_chunks: list<item: string>

**This component produces no data.**

### Arguments

The component takes the following arguments to alter its behavior:

| argument | type | description | default |
| -------- | ---- | ----------- | ------- |
| module | str | Module from which the LLM is imported. Defaults to langchain.llms | langchain.llms |
| llm_name | str | Name of the selected llm | / |
| llm_kwargs | dict | Arguments of the selected llm | / |
| metrics | list | RAGAS metrics to compute | / |
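
As a rough illustration of how the `module`, `llm_name`, and `llm_kwargs` arguments might fit together, the sketch below resolves a class by name from a module and instantiates it with keyword arguments. The `load_class_instance` helper is hypothetical and not part of the component; it uses `collections.Counter` as a stand-in so that it runs without LangChain installed.

```python
import importlib
import typing as t


def load_class_instance(module: str, name: str, kwargs: t.Dict[str, t.Any]) -> t.Any:
    """Import `module`, look up `name` in it, and call it with `kwargs`.

    Hypothetical helper for illustration only; the component's own
    `extract_llm` method may work differently.
    """
    imported_module = importlib.import_module(module)
    try:
        cls = getattr(imported_module, name)
    except AttributeError as exc:
        msg = f"{name!r} not found in module {module!r}"
        raise ValueError(msg) from exc
    return cls(**kwargs)


# Stand-in for something like module="langchain.llms", llm_name="OpenAI".
counter = load_class_instance("collections", "Counter", {"apples": 2})
```

Resolving the class by name keeps the component independent of any single LLM provider, at the cost of deferring typos in `llm_name` to run time, which is why the sketch converts the lookup failure into an explicit error.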

### Usage

You can add this component to your pipeline using the following code:

```python
from fondant.pipeline import Pipeline


pipeline = Pipeline(...)

dataset = pipeline.read(...)

dataset = dataset.apply(
    "evaluate_ragas",
    arguments={
        # Add arguments
        # "module": "langchain.llms",
        # "llm_name": ,
        # "llm_kwargs": {},
        # "metrics": [],
    }
)
```

### Testing

You can run the tests using docker with BuildKit. From this directory, run:
```
docker build . --target test
```
Original file line number Diff line number Diff line change
@@ -1,24 +1,22 @@
#metadata: to be matched w/ docker image
name: retriever_eval_ragas
description: Component that evaluates the retriever using RAGAS
image: ghcr.io/ml6team/retriever_eval:dev
image: fndnt/retriever_eval:dev
tags:
- Data writing
- Text processing

consumes:
text: #TODO: same as previous component produces
text:
type: string
retrieved_chunks:
type: array
items:
type: string

produces:
#TODO: add/retrieve chosen metrics to compute
context_precision:
type: float32
context_relevancy:
type: float32
additionalProperties: true
# Overwrite with metrics to be computed by ragas
# (https://docs.ragas.io/en/latest/concepts/metrics/index.html)


args:
module:
1 change: 1 addition & 0 deletions components/evaluate_ragas/requirements.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
ragas==0.0.21
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
import typing as t

import pandas as pd
from datasets import Dataset
from fondant.component import PandasTransformComponent
@@ -12,15 +14,15 @@ def __init__(
module: str,
llm_name: str,
llm_kwargs: dict,
metrics: list,
produces: t.Dict[str, t.Any],
**kwargs,
) -> None:
"""
Args:
module: Module from which the LLM is imported. Defaults to langchain.llms
llm_name: Name of the selected llm
llm_kwargs: Arguments of the selected llm
metrics: RAGAS metrics to compute.
produces: RAGAS metrics to compute.
kwargs: Unhandled keyword arguments passed in by Fondant.
"""
self.llm = self.extract_llm(
@@ -29,7 +31,9 @@ def __init__(
model_kwargs=llm_kwargs,
)
self.gpt_wrapper = LangchainLLM(llm=self.llm)
self.metric_functions = self.extract_metric_functions(metrics=metrics)
self.metric_functions = self.extract_metric_functions(
metrics=list(produces.keys()),
)
self.set_llm(self.metric_functions)

# import the metric functions selected
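
The hunk above derives the metric names from the keys of `produces` rather than from a separate `metrics` argument. A minimal sketch of that idea follows. `resolve_metrics` and the `fake_metrics` namespace are hypothetical stand-ins (the real metric objects would come from RAGAS), used here so that the example runs without RAGAS or an API key.

```python
import types
import typing as t

# Hypothetical stand-in for a module that exposes metric objects by name.
fake_metrics = types.SimpleNamespace(
    context_precision="<context_precision metric>",
    context_relevancy="<context_relevancy metric>",
)


def resolve_metrics(produces: t.Dict[str, t.Any], source: t.Any) -> t.List[t.Any]:
    """Look up one metric on `source` for each key of `produces`."""
    metric_names = list(produces.keys())
    missing = [name for name in metric_names if not hasattr(source, name)]
    if missing:
        msg = f"Unknown metrics requested: {missing}"
        raise ValueError(msg)
    return [getattr(source, name) for name in metric_names]


# Only the keys are used here; the values (field types) are ignored.
produces = {"context_precision": "float32", "context_relevancy": "float32"}
metrics = resolve_metrics(produces, fake_metrics)
```

Under this approach the output schema and the set of computed metrics cannot drift apart, since both come from the same mapping.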
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
import os

import pandas as pd
import pyarrow as pa
from main import RetrieverEval


@@ -49,8 +52,11 @@ def test_transform():
component = RetrieverEval(
module="langchain.llms",
llm_name="OpenAI",
llm_kwargs={"openai_api_key": ""},
metrics=["context_precision", "context_relevancy"],
llm_kwargs={"openai_api_key": os.environ["OPENAI_KEY"]},
produces={
"context_precision": pa.float32(),
"context_relevancy": pa.float32(),
},
)

output_dataframe = component.transform(input_dataframe)
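
Because the test now reads the key with `os.environ["OPENAI_KEY"]`, it raises a bare `KeyError` when the variable is unset, and in the Docker `test` stage an omitted build argument would most likely leave the variable as an empty string. One possible way to surface a clearer message is sketched below. `read_openai_key` is a hypothetical helper, not something this commit adds.

```python
import os
import typing as t


def read_openai_key(env: t.Optional[t.Mapping[str, str]] = None) -> str:
    """Return the OPENAI_KEY value from `env`, or fail with a clear message."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_KEY", "")
    if not key:
        msg = "OPENAI_KEY is not set or is empty"
        raise RuntimeError(msg)
    return key


# Explicit mapping so the sketch runs without a real key in the environment.
key = read_openai_key({"OPENAI_KEY": "dummy-key"})
```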
File renamed without changes.
5 changes: 0 additions & 5 deletions components/retriever_eval_ragas/requirements.txt

This file was deleted.

Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/data/creative_commons_pipline/creative_commons_pipline-20231206224045/filter_languages/manifest.json
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/data/creative_commons_pipline/creative_commons_pipline-20231206224045/download_images/manifest.json
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
/data/creative_commons_pipline/creative_commons_pipline-20231206224045/load_from_hub/manifest.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
{
"metadata": {
"base_path": "/data",
"pipeline_name": "creative_commons_pipline",
"run_id": "creative_commons_pipline-20231206223031",
"component_id": "download_images",
"cache_key": "f566da7d30e5bc64cec6144cc2a57c27"
},
"index": {
"location": "/creative_commons_pipline-20231206223031/download_images"
},
"fields": {
"alt_text": {
"location": "/creative_commons_pipline-20231206223031/load_from_hub",
"type": "string"
},
"image_url": {
"location": "/creative_commons_pipline-20231206223031/load_from_hub",
"type": "string"
},
"license_location": {
"location": "/creative_commons_pipline-20231206223031/load_from_hub",
"type": "string"
},
"license_type": {
"location": "/creative_commons_pipline-20231206223031/load_from_hub",
"type": "string"
},
"webpage_url": {
"location": "/creative_commons_pipline-20231206223031/load_from_hub",
"type": "string"
},
"image": {
"location": "/creative_commons_pipline-20231206223031/download_images",
"type": "binary"
},
"image_width": {
"location": "/creative_commons_pipline-20231206223031/download_images",
"type": "int32"
},
"image_height": {
"location": "/creative_commons_pipline-20231206223031/download_images",
"type": "int32"
}
}
}
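
To make the manifest layout above easier to read, the following sketch groups field names by the `location` they are stored under, which shows which component's output each field is read from. The `fields_by_location` helper is hypothetical, and the inline manifest is a trimmed-down example with shortened placeholder paths, not a copy of the committed file.

```python
import json
import typing as t
from collections import defaultdict


def fields_by_location(manifest: t.Dict[str, t.Any]) -> t.Dict[str, t.List[str]]:
    """Group field names by the `location` they are stored under."""
    grouped: t.Dict[str, t.List[str]] = defaultdict(list)
    for field_name, field in manifest["fields"].items():
        grouped[field["location"]].append(field_name)
    return dict(grouped)


# Trimmed-down manifest with the same shape as the one above.
manifest = json.loads(
    """
    {
      "metadata": {"base_path": "/data", "component_id": "download_images"},
      "index": {"location": "/run/download_images"},
      "fields": {
        "image_url": {"location": "/run/load_from_hub", "type": "string"},
        "image": {"location": "/run/download_images", "type": "binary"},
        "image_width": {"location": "/run/download_images", "type": "int32"}
      }
    }
    """
)

grouped = fields_by_location(manifest)
```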
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"metadata": {"base_path": "/data", "pipeline_name": "creative_commons_pipline", "run_id": "creative_commons_pipline-20231206223031", "component_id": "filter_languages", "cache_key": "f566da7d30e5bc64cec6144cc2a57c27"}, "index": {"location": "/creative_commons_pipline-20231206223031/filter_languages"}, "fields": {"alt_text": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_location": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_type": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "webpage_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "binary"}, "image_width": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "int32"}, "image_height": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "int32"}}}
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"metadata": {"base_path": "/data", "pipeline_name": "creative_commons_pipline", "run_id": "creative_commons_pipline-20231206223031", "component_id": "load_from_hub", "cache_key": "f566da7d30e5bc64cec6144cc2a57c27"}, "index": {"location": "/creative_commons_pipline-20231206223031/load_from_hub"}, "fields": {"alt_text": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_location": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_type": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "webpage_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}}}
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"metadata": {"base_path": "/data", "pipeline_name": "creative_commons_pipline", "run_id": "creative_commons_pipline-20231206223031", "component_id": "download_images", "cache_key": "f566da7d30e5bc64cec6144cc2a57c27"}, "index": {"location": "/creative_commons_pipline-20231206223031/download_images"}, "fields": {"alt_text": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_location": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_type": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "webpage_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "binary"}, "image_width": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "int32"}, "image_height": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "int32"}}}
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"metadata": {"base_path": "/data", "pipeline_name": "creative_commons_pipline", "run_id": "creative_commons_pipline-20231206224045", "component_id": "filter_languages", "cache_key": "f566da7d30e5bc64cec6144cc2a57c27"}, "index": {"location": "/creative_commons_pipline-20231206224045/filter_languages"}, "fields": {"alt_text": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_location": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_type": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "webpage_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "binary"}, "image_width": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "int32"}, "image_height": {"location": "/creative_commons_pipline-20231206223031/download_images", "type": "int32"}}}
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{"metadata": {"base_path": "/data", "pipeline_name": "creative_commons_pipline", "run_id": "creative_commons_pipline-20231206223031", "component_id": "load_from_hub", "cache_key": "f566da7d30e5bc64cec6144cc2a57c27"}, "index": {"location": "/creative_commons_pipline-20231206223031/load_from_hub"}, "fields": {"alt_text": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "image_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_location": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "license_type": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}, "webpage_url": {"location": "/creative_commons_pipline-20231206223031/load_from_hub", "type": "string"}}}
Binary file added docs/art/design/pipeline_component.png
Binary file added docs/art/design/pipeline_main.png
Binary file added docs/art/design/pipeline_manifest.png
