refactor eval code, create eval cli #55

Merged 5 commits from feat/eval-cli into main on Sep 24, 2023
Conversation

Ben-Epstein (Contributor):
There was massive overlap between the eval RAG and eval retriever scripts.

I broke the shared logic out into functions in eval/utils.py so both scripts can reuse it.

Cleaned up the functions a bit, and then built the eval CLI.
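For reference, the shared helper ends up with roughly this shape (a minimal sketch: only `get_passage_embeddings` is a real name from the diff below; its actual signature and internals in eval/utils.py may differ, e.g. the real callable likely consumes tokenized inputs):

```python
from typing import Callable, List, Tuple

import numpy as np
import torch
from datasets import Dataset


def get_passage_embeddings(
    dataset: Dataset,
    passage_column_name: str,
    embed_fn: Callable[[List[str]], torch.Tensor],
    batch_size: int = 32,
) -> Tuple[Dataset, np.ndarray]:
    """Deduplicate passages, then embed them with the caller-supplied forward pass.

    Sketch only: the repo's version may tokenize inside embed_fn and return
    different types.
    """
    unique_passages = list(dict.fromkeys(dataset[passage_column_name]))
    unique_dataset = Dataset.from_dict({passage_column_name: unique_passages})
    chunks = []
    with torch.no_grad():
        for start in range(0, len(unique_passages), batch_size):
            batch = unique_passages[start : start + batch_size]
            chunks.append(embed_fn(batch).cpu().numpy())
    return unique_dataset, np.concatenate(chunks)
```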

Tested with the following (note: I had to comment out this line to get it working with gpt2):

```bash
# install the dalm repo
pip install -e .

# train rag e2e
dalm train-rag-e2e \
  "./dalm/datasets/toy_data_train.csv" \
  "BAAI/bge-small-en" \
  "gpt2" \
  --output-dir "rag_e2e_checkpoints" \
  --per-device-train-batch-size 32
```

```bash
# eval retriever
dalm eval-retriever "./dalm/datasets/toy_data_train.csv" \
  --retriever-name-or-path "BAAI/bge-small-en" \
  --retriever-peft-model-path "rag_e2e_checkpoints/retriever" \
  --embed-dim 384
```

```text
Construct passage index
Evaluation start
Retriever results:
Recall: 0.10000000000000003
Precision: 1.0
Hit Rate: 1.0
*************
```
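For context, top-k retrieval metrics like the ones above can be computed along these lines (an illustrative sketch, not the repo's exact implementation; for example, one hit in the top-1 against ten gold passages per query gives recall 0.1 and precision 1.0):

```python
from typing import Dict, List, Set


def retrieval_metrics(
    retrieved: List[List[str]], relevant: List[Set[str]], k: int
) -> Dict[str, float]:
    """Average recall, precision, and hit rate of top-k results per query."""
    recalls, precisions, hits = [], [], []
    for ret, rel in zip(retrieved, relevant):
        found = sum(1 for passage in ret[:k] if passage in rel)
        recalls.append(found / len(rel) if rel else 0.0)
        precisions.append(found / k)
        hits.append(1.0 if found > 0 else 0.0)
    n = len(retrieved)
    return {
        "recall": sum(recalls) / n,
        "precision": sum(precisions) / n,
        "hit_rate": sum(hits) / n,
    }
```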

```bash
# eval rag
dalm eval-rag "./dalm/datasets/toy_data_train.csv" \
  --retriever-name-or-path "BAAI/bge-small-en" \
  --generator-name-or-path "gpt2" \
  --retriever-peft-model-path rag_e2e_checkpoints/retriever \
  --generator-peft-model-path rag_e2e_checkpoints/generator \
  --query-batch-size 5 \
  --embed-dim 384
```

```text
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.
Retriever results:
Recall: 0.10000000000000003
Precision: 1.0
Hit Rate: 1.0
*************
Generator evaluation:
Exact match: 0.0
```
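On the right-padding warning above: decoder-only models like gpt2 need left padding at generation time. The standard transformers fix is a one-liner at tokenizer init (a sketch of the setup, not necessarily how this repo wires it in):

```python
from transformers import AutoTokenizer

# Decoder-only generation expects left padding.
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left")
# gpt2 ships without a pad token; reuse EOS so batched generation works.
tokenizer.pad_token = tokenizer.eos_token
```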

Review thread on this snippet from the diff (context truncated as shown):

```python
unique_passage_dataset, passage_embeddings_array = get_passage_embeddings(
    processed_datasets,
    passage_column_name,
    rag_model.retrieval_forward,
```
Member:
Is it okay to pass a function like this? I'm fine with it, but is there a better way?

Ben-Epstein (Author):

It's pretty normal to pass callables (typing.Callable) around :)
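For example, the parameter can be annotated so type checkers verify what gets passed in (hypothetical helper name; the actual signature in eval/utils.py may differ):

```python
from typing import Callable, List

import torch


def embed_batch(
    passages: List[str],
    retrieval_forward: Callable[[List[str]], torch.Tensor],
) -> torch.Tensor:
    # The forward pass is just another argument; mypy checks the signature.
    return retrieval_forward(passages)
```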

@shamanez (Member):

These are major structural changes. I'm OK with the logic, but please run the eval with the given dataset and confirm we get the same results.

@Ben-Epstein merged commit 2245860 into main on Sep 24, 2023. 1 check passed.
@Ben-Epstein deleted the feat/eval-cli branch on September 24, 2023 at 12:42.