The question, then, is how to obtain the vectors (of course, the assumption is that similar texts will be mapped to similar vectors).

Here are a few options:

1. **Combine static word vectors** (each text can be represented as an average or a weighted sum over the individual word vectors). You can find pre-trained Hebrew word vectors [here](https://drive.google.com/drive/folders/1qBgdcXtGjse9Kq7k1wwMzD84HH_Z8aJt).

2. **Combine contextual word vectors**. Again, each text is represented as an average or a weighted sum of the contextualized vectors of its individual tokens. For Hebrew, you can use existing pre-trained BERT-like models available in the Hugging Face `Transformers` library; specifically, you can use [alephbert-base](https://huggingface.co/onlplab/alephbert-base) or [dictabert](https://huggingface.co/dicta-il/dictabert).[^2] (A minimal pooling sketch appears after this list.)

3. **Use a pre-trained text embedder** that is trained specifically to encode texts as single vectors. While no such model currently exists for Hebrew, there are some "multilingual" ones that are supposed to work with many languages, including Hebrew. In particular, the [sentence-transformers](https://sbert.net/) package has some [pre-trained multilingual models](https://sbert.net/docs/sentence_transformer/pretrained_models.html#multilingual-models), some of which support Hebrew (`he`). The LLM APIs of providers like [OpenAI](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings?lang=node) and [Anthropic](https://docs.anthropic.com/en/docs/build-with-claude/embeddings) also have `embedding` endpoints that will produce results for Hebrew texts.
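
For option 2, one common recipe is to mean-pool the model's last hidden states over non-padding tokens. Below is a minimal sketch using the `Transformers` library with `alephbert-base`; the pooling choice, the `max_length`, and running everything in a single batch are assumptions you may want to revisit. Option 1 is analogous, just averaging static word vectors instead of contextual ones.

```python
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "onlplab/alephbert-base"  # or "dicta-il/dictabert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed(texts):
    """Mean-pool the last hidden states over non-padding tokens (one vector per text)."""
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    hidden = out.last_hidden_state               # (batch, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)   # (batch, seq_len, 1): 1 for real tokens, 0 for padding
    summed = (hidden * mask).sum(dim=1)          # sum only over real tokens
    counts = mask.sum(dim=1).clamp(min=1)        # number of real tokens per text
    return (summed / counts).numpy()
```

For option 3, the `sentence-transformers` package exposes a similar one-call interface (`SentenceTransformer(model_name).encode(texts)`), which already returns one vector per text.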

Feel free to experiment! In any case, you need to produce vectors for each unit you want to index, store them in a numpy array, and save the array for future use. At retrieval time, load the array, encode the query using the same method that produced the vectors in the array, and look for the most similar rows.
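
A sketch of that index/retrieve loop, assuming an `embed(texts)` function like the one above and hypothetical `units` / `query` variables: normalizing the rows lets a plain dot product act as cosine similarity.

```python
import numpy as np

# Index time: embed every unit you want to index and save the matrix.
vectors = embed(units)                                       # shape: (num_units, dim)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)    # unit-length rows
np.save("index.npy", vectors)

# Retrieval time: load, embed the query with the same method, rank by cosine similarity.
index = np.load("index.npy")
q = embed([query])[0]
q /= np.linalg.norm(q)
scores = index @ q                     # cosine similarity, since all rows are unit-length
top_k = np.argsort(-scores)[:20]       # indices of the 20 most similar units
```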

We do not ask you to submit code we can run, so do whatever is convenient to you.
### Evaluation

We ask you to perform two kinds of evaluation:

1. **End-to-end evaluation**, in which you go from query to final answer. This will have to be done manually. Choose 10-20 queries and manually evaluate their answers. Beyond accuracy, see if you can find common trends, or cases where the retrieval found the correct document but the overall system produced a wrong answer, or the other way around: cases where the retrieval failed but the system overall produced an adequate answer.

2. **Retrieval evaluation**, in which we evaluate the system on its ability to find the correct page (ignoring the RAG part). Use two metrics. The first is _recall@k_, which measures how often the correct page was within the top-k retrieved documents (after reranking, if you use it); use k=5 and k=20. The second is [MRR](https://en.wikipedia.org/wiki/Mean_reciprocal_rank), which is based on the position at which you ranked the correct document for each query. (A sketch of both metrics appears after this list.)
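
Both metrics are easy to compute once you have, for each query, the ranked list of retrieved page ids and the id of the correct page. A minimal sketch, where `queries` is a hypothetical list of `(ranked_ids, gold_id)` pairs:

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1 if the correct page appears among the top-k retrieved documents, else 0."""
    return int(gold_id in ranked_ids[:k])

def reciprocal_rank(ranked_ids, gold_id):
    """1 / (rank of the correct page), or 0 if it was not retrieved at all."""
    for pos, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == gold_id:
            return 1.0 / pos
    return 0.0

# queries: list of (ranked_ids, gold_id) pairs, one per evaluation query (hypothetical).
recall_5  = sum(recall_at_k(r, g, 5)  for r, g in queries) / len(queries)
recall_20 = sum(recall_at_k(r, g, 20) for r, g in queries) / len(queries)
mrr       = sum(reciprocal_rank(r, g) for r, g in queries) / len(queries)
```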

Your report should include both the end-to-end evaluation and the retrieval evaluation. For the retrieval evaluation, you should report numbers for two sets of queries:
- Your own dev-set you created in part 1.