This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Putting it as a draft because I will probably change the base Dockerfile to one that already includes the torch dependencies, instead of installing them through requirements.txt.

---------

Co-authored-by: Robbe Sneyders <[email protected]>
commit 589e327 (1 parent: 38163c9)

Showing 8 changed files with 316 additions and 0 deletions.
@@ -0,0 +1,30 @@
FROM --platform=linux/amd64 python:3.8-slim as base

# System dependencies
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install git -y

# Install requirements
COPY requirements.txt /
RUN pip3 install --no-cache-dir -r requirements.txt

# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=main
RUN pip3 install fondant[component,aws,azure,gcp]@git+https://github.com/ml6team/fondant@${FONDANT_VERSION}

# Set the working directory to the component folder
WORKDIR /component
COPY src/ src/

FROM base as test
COPY tests/ tests/
RUN pip3 install --no-cache-dir -r tests/requirements.txt
ARG OPENAI_KEY
ENV OPENAI_KEY=${OPENAI_KEY}
RUN python -m pytest tests

FROM base
WORKDIR /component/src
ENTRYPOINT ["fondant", "execute", "main"]
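The `test` stage receives `OPENAI_KEY` as a build argument and re-exports it with `ENV`, which is how the test can later read `os.environ["OPENAI_KEY"]`. The sketch below drives that stage from Python. It assumes the Docker CLI is on `PATH` and that the key is already exported in the caller's shell. `build_test_command` and `run_tests` are hypothetical helper names, not part of the component. Passing `--build-arg OPENAI_KEY` without a value tells Docker to copy the value from the caller's environment, so the key never appears in the argv.

```python
import os
import subprocess
from typing import List


def build_test_command(context: str = ".", key_var: str = "OPENAI_KEY") -> List[str]:
    """Return the `docker build` argv that runs the Dockerfile's `test` stage.

    Only the variable *name* is passed to --build-arg, so Docker takes the
    value from the calling process's environment.
    """
    return ["docker", "build", context, "--target", "test", "--build-arg", key_var]


def run_tests(context: str = ".") -> int:
    """Build the test stage and return Docker's exit code (hypothetical helper)."""
    # Fail fast: without the key, the ENV in the test stage would be empty
    # and the test would only fail later, when it calls the LLM.
    if not os.environ.get("OPENAI_KEY"):
        raise RuntimeError("Export OPENAI_KEY before building the test stage")
    # subprocess.run inherits os.environ, so Docker can see OPENAI_KEY.
    completed = subprocess.run(build_test_command(context), check=False)
    return completed.returncode
```

The final stage starts again `FROM base`, so the `ENV OPENAI_KEY` set in the `test` stage is not carried into the runtime image. The test image itself does keep the value in its configuration.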
@@ -0,0 +1,53 @@
# retriever_eval_ragas

### Description
Component that evaluates the retriever using RAGAS

### Inputs / outputs

**This component consumes:**

- text: string
- retrieved_chunks: list<item: string>

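**This component produces additional fields:** one per RAGAS metric to compute. Declare them by overwriting `produces` when the component is applied (e.g. `context_precision`, `context_relevancy`). See https://docs.ragas.io/en/latest/concepts/metrics/index.html for the available metrics.
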
### Arguments

The component takes the following arguments to alter its behavior:

| argument | type | description | default |
| -------- | ---- | ----------- | ------- |
| module | str | Module from which the LLM is imported. Defaults to langchain.llms | langchain.llms |
| llm_name | str | Name of the selected llm | / |
| llm_kwargs | dict | Arguments of the selected llm | / |

### Usage

You can add this component to your pipeline using the following code:

```python
from fondant.pipeline import Pipeline


pipeline = Pipeline(...)

dataset = pipeline.read(...)

dataset = dataset.apply(
    "evaluate_ragas",
    arguments={
        # Add arguments
        # "module": "langchain.llms",
        # "llm_name": ,
        # "llm_kwargs": {},
    }
)
```

### Testing

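You can run the tests using Docker with BuildKit. The test stage calls the OpenAI API, so first export your key as `OPENAI_KEY`. Passing `--build-arg OPENAI_KEY` without a value forwards it from your shell. From this directory, run:
```
docker build . --target test --build-arg OPENAI_KEY
```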
@@ -0,0 +1,31 @@
name: retriever_eval_ragas
description: Component that evaluates the retriever using RAGAS
image: fndnt/retriever_eval:dev
tags:
  - Text processing

consumes:
  text:
    type: string
  retrieved_chunks:
    type: array
    items:
      type: string

produces:
  additionalProperties: true
  # Overwrite with metrics to be computed by ragas
  # (https://docs.ragas.io/en/latest/concepts/metrics/index.html)


args:
  module:
    description: Module from which the LLM is imported. Defaults to langchain.llms
    type: str
    default: "langchain.llms"
  llm_name:
    description: Name of the selected llm
    type: str
  llm_kwargs:
    description: Arguments of the selected llm
    type: dict
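The `module`, `llm_name` and `llm_kwargs` arguments are resolved at runtime. The component imports `llm_name` from `module` and instantiates it with `llm_kwargs`. The sketch below shows that resolution with `importlib.import_module`, which for importing a class is equivalent to the component's `__import__(module, fromlist=[name])` call. It uses a standard-library class in place of a LangChain LLM so it runs without an API key. `resolve_llm` is a hypothetical name.

```python
import importlib
from typing import Any, Dict


def import_from(module: str, name: str) -> Any:
    """Import attribute `name` from the dotted module path `module`."""
    return getattr(importlib.import_module(module), name)


def resolve_llm(module: str, llm_name: str, llm_kwargs: Dict[str, Any]) -> Any:
    """Instantiate `module.llm_name(**llm_kwargs)`, as the component does for its LLM."""
    llm_class = import_from(module, llm_name)
    return llm_class(**llm_kwargs)


# Stand-in for e.g. module="langchain.llms", llm_name="OpenAI",
# llm_kwargs={"openai_api_key": ...}: any importable callable resolves the same way.
counter = resolve_llm("collections", "Counter", {"a": 2, "b": 1})
print(counter.most_common(1))  # [('a', 2)]
```

The same `import_from` helper is what turns each key of `produces` into a metric object from `ragas.metrics`. A typo in a metric name therefore surfaces as an `AttributeError` when the component is constructed.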
@@ -0,0 +1 @@
ragas==0.0.21
@@ -0,0 +1,81 @@
import typing as t

import pandas as pd
from datasets import Dataset
from fondant.component import PandasTransformComponent
from ragas import evaluate
from ragas.llms import LangchainLLM


class RetrieverEval(PandasTransformComponent):
    def __init__(
        self,
        *,
        module: str,
        llm_name: str,
        llm_kwargs: dict,
        produces: t.Dict[str, t.Any],
        **kwargs,
    ) -> None:
        """
        Args:
            module: Module from which the LLM is imported. Defaults to langchain.llms
            llm_name: Name of the selected llm
            llm_kwargs: Arguments of the selected llm
            produces: RAGAS metrics to compute.
            kwargs: Unhandled keyword arguments passed in by Fondant.
        """
        self.llm = self.extract_llm(
            module=module,
            model_name=llm_name,
            model_kwargs=llm_kwargs,
        )
        self.gpt_wrapper = LangchainLLM(llm=self.llm)
        self.metric_functions = self.extract_metric_functions(
            metrics=list(produces.keys()),
        )
        self.set_llm(self.metric_functions)

    # import the metric functions selected
    @staticmethod
    def import_from(module, name):
        module = __import__(module, fromlist=[name])
        return getattr(module, name)

    def extract_llm(self, module, model_name, model_kwargs):
        module = self.import_from(module, model_name)
        return module(**model_kwargs)

    def extract_metric_functions(self, metrics: list):
        functions = []
        for metric in metrics:
            functions.append(self.import_from("ragas.metrics", metric))
        return functions

    def set_llm(self, metric_functions: list):
        for metric_function in metric_functions:
            metric_function.llm = self.gpt_wrapper

    # evaluate the retriever
    @staticmethod
    def create_hf_ds(dataframe: pd.DataFrame):
        dataframe = dataframe.rename(
            columns={"text": "question", "retrieved_chunks": "contexts"},
        )
        return Dataset.from_pandas(dataframe)

    def ragas_eval(self, dataset):
        return evaluate(dataset=dataset, metrics=self.metric_functions)

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        hf_dataset = self.create_hf_ds(
            dataframe=dataframe[["text", "retrieved_chunks"]],
        )
        if "id" in hf_dataset.column_names:
            hf_dataset = hf_dataset.remove_columns("id")

        result = self.ragas_eval(dataset=hf_dataset)
        results_df = result.to_pandas()
        results_df = results_df.set_index(dataframe.index)

        return results_df
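`transform` does three things:

- It renames the Fondant columns to the names RAGAS expects (`text` → `question`, `retrieved_chunks` → `contexts`).
- It runs the evaluation.
- It re-attaches the incoming index so each metric row lines up with its input row.

The sketch below covers that bookkeeping using pandas only. `score_rows` is a hypothetical stand-in for `ragas.evaluate`. It computes a toy word-overlap score, not a RAGAS metric, so the data flow can run without an LLM.

```python
from typing import List

import pandas as pd


def score_rows(questions: List[str], contexts: List[List[str]]) -> List[float]:
    """Hypothetical stand-in for a RAGAS metric: share of chunks that
    contain the question's first word (case-insensitive)."""
    scores = []
    for question, chunks in zip(questions, contexts):
        words = question.split()
        first_word = words[0].lower() if words else ""
        hits = sum(first_word in chunk.lower() for chunk in chunks)
        scores.append(hits / len(chunks) if chunks else 0.0)
    return scores


def transform(dataframe: pd.DataFrame) -> pd.DataFrame:
    # Rename to the column names RAGAS expects and drop the Fondant index,
    # so no "id" column ends up in the evaluation input.
    eval_df = (
        dataframe[["text", "retrieved_chunks"]]
        .rename(columns={"text": "question", "retrieved_chunks": "contexts"})
        .reset_index(drop=True)
    )
    eval_df["toy_overlap"] = score_rows(
        eval_df["question"].tolist(), eval_df["contexts"].tolist()
    )
    # Re-attach the original index so results align with the input rows.
    eval_df.index = dataframe.index
    return eval_df
```

`reset_index(drop=True)` plays the role of the component's `remove_columns("id")`. Per the `datasets` documentation, `Dataset.from_pandas` stores any non-`RangeIndex` index (such as Fondant's `id`) as an extra column. Assigning `eval_df.index = dataframe.index` has the same effect as `results_df.set_index(dataframe.index)`.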
@@ -0,0 +1,117 @@
import os

import pandas as pd
import pyarrow as pa
from main import RetrieverEval


def test_transform():
    input_dataframe = pd.DataFrame(
        {
            "text": [
                "Lorem ipsum dolor sit amet, consectetur adipiscing elit?",
                "Sed massa massa, interdum a porttitor sit amet, semper eget nunc?",
            ],
            "retrieved_chunks": [
                [
                    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. \
                    Quisque ut efficitur neque. Aenean mollis eleifend est, \
                    eu laoreet magna egestas quis. Cras id sagittis erat. \
                    Aliquam vel blandit arcu. Morbi ac nulla ullamcorper, \
                    rutrum neque nec, pellentesque diam. Nulla nec tempor \
                    enim. Suspendisse a volutpat leo, quis varius dolor.",
                    "Curabitur placerat ultrices mauris et lobortis. Maecenas \
                    laoreet tristique sagittis. Integer facilisis eleifend \
                    dolor, quis fringilla orci eleifend ac. Vestibulum nunc \
                    odio, tincidunt ut augue et, ornare vehicula sapien. Orci \
                    varius natoque penatibus et magnis dis parturient montes, \
                    nascetur ridiculus mus. Sed auctor felis lacus, rutrum \
                    tempus ligula viverra ac. Curabitur pharetra mauris et \
                    ornare pulvinar. Suspendisse a ultricies nisl. Mauris \
                    sit amet odio condimentum, venenatis orci vitae, \
                    tincidunt purus. Ut ullamcorper convallis ligula ac \
                    posuere. In efficitur enim ac lacus dignissim congue. \
                    Nam turpis augue, aliquam et velit sit amet, varius \
                    euismod ante. Duis volutpat nisl sit amet auctor tempus.\
                    Vivamus in eros ex.",
                ],
                [
                    "am leo massa, ultricies eu viverra ac, commodo non sapien. \
                    Mauris et mauris sollicitudin, ultricies ex ac, luctus \
                    nulla.",
                    "Cras tincidunt facilisis mi, ac eleifend justo lobortis ut. \
                    In lobortis cursus ante et faucibus. Vestibulum auctor \
                    felis at odio varius, ac vulputate leo dictum. \
                    Phasellus in augue ante. Aliquam aliquam mauris \
                    sed tellus egestas fermentum.",
                ],
            ],
        },
    )

    component = RetrieverEval(
        module="langchain.llms",
        llm_name="OpenAI",
        llm_kwargs={"openai_api_key": os.environ["OPENAI_KEY"]},
        produces={
            "context_precision": pa.float32(),
            "context_relevancy": pa.float32(),
        },
    )

    output_dataframe = component.transform(input_dataframe)

    expected_output_dataframe = pd.DataFrame(
        {
            "question": [
                "Lorem ipsum dolor sit amet, consectetur adipiscing elit?",
                "Sed massa massa, interdum a porttitor sit amet, semper eget nunc?",
            ],
            "contexts": [
                [
                    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. \
                    Quisque ut efficitur neque. Aenean mollis eleifend est, \
                    eu laoreet magna egestas quis. Cras id sagittis erat. \
                    Aliquam vel blandit arcu. Morbi ac nulla ullamcorper, \
                    rutrum neque nec, pellentesque diam. Nulla nec tempor \
                    enim. Suspendisse a volutpat leo, quis varius dolor.",
                    "Curabitur placerat ultrices mauris et lobortis. Maecenas \
                    laoreet tristique sagittis. Integer facilisis eleifend \
                    dolor, quis fringilla orci eleifend ac. Vestibulum nunc \
                    odio, tincidunt ut augue et, ornare vehicula sapien. Orci \
                    varius natoque penatibus et magnis dis parturient montes, \
                    nascetur ridiculus mus. Sed auctor felis lacus, rutrum \
                    tempus ligula viverra ac. Curabitur pharetra mauris et \
                    ornare pulvinar. Suspendisse a ultricies nisl. Mauris \
                    sit amet odio condimentum, venenatis orci vitae, \
                    tincidunt purus. Ut ullamcorper convallis ligula ac \
                    posuere. In efficitur enim ac lacus dignissim congue. \
                    Nam turpis augue, aliquam et velit sit amet, varius \
                    euismod ante. Duis volutpat nisl sit amet auctor tempus.\
                    Vivamus in eros ex.",
                ],
                [
                    "am leo massa, ultricies eu viverra ac, commodo non sapien. \
                    Mauris et mauris sollicitudin, ultricies ex ac, luctus \
                    nulla.",
                    "Cras tincidunt facilisis mi, ac eleifend justo lobortis ut. \
                    In lobortis cursus ante et faucibus. Vestibulum auctor \
                    felis at odio varius, ac vulputate leo dictum. \
                    Phasellus in augue ante. Aliquam aliquam mauris \
                    sed tellus egestas fermentum.",
                ],
            ],
            "context_precision": 0.15,
            "context_relevancy": 0.35,
        },
    )

    # Check if columns are the same
    columns_equal = expected_output_dataframe.columns.equals(output_dataframe.columns)

    # Check if data types within each column match
    dtypes_match = expected_output_dataframe.dtypes.equals(output_dataframe.dtypes)

    # Check if both conditions are met
    assert columns_equal
    assert dtypes_match
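The test compares only column names and dtypes, not the metric values. Those come from an LLM call and may differ between runs. The expected `0.15` and `0.35` effectively just pin those columns to `float64`. The two checks could be folded into a reusable helper that reports what differs. `assert_same_schema` is a hypothetical name, and the sketch relies only on `Index.equals` and `Series.equals`, the same pandas calls the test uses.

```python
import pandas as pd


def assert_same_schema(expected: pd.DataFrame, actual: pd.DataFrame) -> None:
    """Assert both frames have identical column names (in order) and dtypes,
    ignoring the values themselves."""
    assert expected.columns.equals(actual.columns), (
        f"columns differ: {list(expected.columns)} != {list(actual.columns)}"
    )
    assert expected.dtypes.equals(actual.dtypes), (
        f"dtypes differ:\n{expected.dtypes}\n!=\n{actual.dtypes}"
    )


expected = pd.DataFrame({"question": ["q"], "contexts": [["c"]], "context_precision": 0.15})
actual = pd.DataFrame({"question": ["q"], "contexts": [["c"]], "context_precision": [0.9]})
assert_same_schema(expected, actual)  # passes: same columns and dtypes, values ignored
```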
@@ -0,0 +1,2 @@
[pytest]
pythonpath = ../src
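`pythonpath = ../src` (an ini option available since pytest 7.0) prepends the component's `src/` directory to `sys.path`, relative to the rootdir (here the directory holding `pytest.ini`). That is what makes `from main import RetrieverEval` importable from the test. For pytest versions without that option, a roughly equivalent `tests/conftest.py` could look like the sketch below. It assumes `tests/` and `src/` are sibling directories, as the Dockerfile's `COPY src/ src/` and `COPY tests/ tests/` suggest. `add_src_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_src_to_path(tests_dir: Path) -> Path:
    """Prepend `<tests_dir>/../src` to sys.path, mirroring `pythonpath = ../src`."""
    src_dir = (tests_dir / ".." / "src").resolve()
    # Avoid duplicate entries if conftest.py is imported more than once.
    if str(src_dir) not in sys.path:
        sys.path.insert(0, str(src_dir))
    return src_dir


# In tests/conftest.py this would be invoked as:
# add_src_to_path(Path(__file__).resolve().parent)
```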
@@ -0,0 +1 @@
pytest==7.4.2