CrossEncoder .rank condition error in CrossEncoder.py #3124

saeeddhqan · 2024-12-09T12:40:32Z

I get the following error when I use .rank method:

File /usr/local/lib/python3.12/dist-packages/sentence_transformers/cross_encoder/CrossEncoder.py:551, in CrossEncoder.rank(self, query, documents, top_k, return_documents, batch_size, show_progress_bar, num_workers, activation_fct, apply_softmax, convert_to_numpy, convert_to_tensor)
    548     if return_documents:
    549         results[-1].update({"text": documents[i]})
--> 551 results = sorted(results, key=lambda x: x["score"], reverse=True)
    552 return results[:top_k]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I use sentence_transformers v3.3.0.

A snippet:

import sentence_transformers

cross_encoder = sentence_transformers.CrossEncoder("amberoad/bert-multilingual-passage-reranking-msmarco", device='cpu', max_length=256)
cross_encoder.rank(query, docs)

JINO-ROHIT · 2024-12-09T15:45:40Z

@saeeddhqan can you also share the structure of your query and docs?

saeeddhqan · 2024-12-10T06:08:22Z

cross_encoder.rank('docx', ['doc1', 'doc2', 'doc3'])

JINO-ROHIT · 2024-12-10T08:21:58Z

@saeeddhqan it works for me

from sentence_transformers.cross_encoder import CrossEncoder
cross_encoder = CrossEncoder("cross-encoder/stsb-distilroberta-base")

cross_encoder.rank('docx', ['doc1', 'doc2', 'doc3'])

Response

[{'corpus_id': 0, 'score': np.float32(0.5175216)},
 {'corpus_id': 2, 'score': np.float32(0.4488596)},
 {'corpus_id': 1, 'score': np.float32(0.43759635)}]

tomaarsen · 2024-12-10T08:51:22Z

@JINO-ROHIT The issue seems to be model specific.

@saeeddhqan thanks for opening! The CrossEncoder class wraps around the AutoModelForSequenceClassification class from transformers, and those models can predict logits for $n$ classes per sequence (query-document pairs in this case). The CrossEncoder.predict method will call this underlying model and return all predictions. For amberoad/bert-multilingual-passage-reranking-msmarco, that's 2:

from sentence_transformers.cross_encoder import CrossEncoder

cross_encoder = CrossEncoder("amberoad/bert-multilingual-passage-reranking-msmarco", device='cpu', max_length=256)
print(cross_encoder.predict([('docx', 'doc1')]))
# [[-1.2904704  1.1504961]]
print(cross_encoder.config.num_labels)
# 2

whereas for a lot of CrossEncoder models (e.g. cross-encoder/stsb-distilroberta-base) it's just 1:

from sentence_transformers.cross_encoder import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/stsb-distilroberta-base", device='cpu', max_length=256)
print(cross_encoder.predict([('docx', 'doc1')]))
# [0.51752156]
print(cross_encoder.config.num_labels)
# 1

Beyond that, the CrossEncoder.rank method internally calls CrossEncoder.predict and then expects each query-document pair to produce exactly 1 value (i.e. that the model has only 1 label). With 2 labels, each "score" is an array, so the sorted call in the traceback ends up comparing arrays and raises the "truth value of an array" ValueError. What's missing is a raise ValueError in CrossEncoder.rank if self.config.num_labels != 1, because if there are multiple values per prediction, it's unclear which one denotes the similarity. In short: CrossEncoder models with more than 1 label can't be used with CrossEncoder.rank at the moment, only with CrossEncoder.predict, and then you can do the ranking yourself if you know which value corresponds to similarity.
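As a rough illustration of "doing the ranking yourself", here is a minimal sketch. The helper name rank_by_label and the example logits are made up for illustration and are not part of sentence_transformers. It assumes you already know which label column denotes relevance; that is model-specific and should be checked against the model card.

```python
from typing import Any, Dict, List, Optional, Sequence


def rank_by_label(
    scores: Sequence[Sequence[float]],
    documents: Sequence[str],
    label_index: int,
    top_k: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Rank documents by one column of multi-label predict output.

    `scores` holds one row per (query, document) pair and one column per label.
    `label_index` is the column assumed to denote relevance (hypothetical choice).
    """
    results = [
        {"corpus_id": i, "score": float(row[label_index]), "text": documents[i]}
        for i, row in enumerate(scores)
    ]
    # Each score is now a plain float, so sorting is unambiguous.
    results.sort(key=lambda r: r["score"], reverse=True)
    # top_k=None keeps every result.
    return results[:top_k]


# Made-up logits standing in for the output of
# cross_encoder.predict([(query, doc) for doc in docs]) on a 2-label model.
example_scores = [[-1.29, 1.15], [0.80, -0.75], [-0.10, 0.20]]
example_docs = ["doc1", "doc2", "doc3"]

ranked = rank_by_label(example_scores, example_docs, label_index=1)
print([r["corpus_id"] for r in ranked])
```

With a real 2-label model, the scores argument would be the array returned by CrossEncoder.predict for the list of (query, document) pairs; the function only indexes rows and columns, so it accepts a NumPy array as well as nested lists.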

Tom Aarsen

JINO-ROHIT · 2024-12-10T09:02:13Z

Ahh okay, makes sense. I can help with a PR for this if you're not working on it 😊

tomaarsen · 2024-12-10T09:39:45Z

That would be much appreciated!

Tom Aarsen

tomaarsen added the enhancement (New feature or request) and good first issue (Good for newcomers) labels Dec 10, 2024

tomaarsen assigned JINO-ROHIT Dec 10, 2024

JINO-ROHIT mentioned this issue Dec 10, 2024

raises ValueError when num_label !=1 when using Crossencoder.rank() #3126

Merged

tomaarsen closed this as completed in #3126 Dec 12, 2024
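
For readers landing here, the kind of guard discussed in this thread could look roughly like the sketch below. This is not the code merged in #3126; ensure_single_label is a hypothetical standalone helper and its error message is invented for illustration. It only shows the shape of a num_labels != 1 check.

```python
def ensure_single_label(num_labels: int) -> None:
    """Raise if a model has anything other than exactly one output label.

    Hypothetical helper: it mirrors the check proposed in the thread,
    not the actual sentence_transformers implementation.
    """
    if num_labels != 1:
        raise ValueError(
            f"Ranking expects a model with exactly 1 label, but got {num_labels}. "
            "Call predict instead and rank on the label that denotes similarity."
        )


# A single-label model passes silently.
ensure_single_label(1)

# A 2-label model, as in this issue, is rejected with a clear message.
try:
    ensure_single_label(2)
except ValueError as err:
    print(f"Rejected: {err}")
```

In practice the value passed in would be cross_encoder.config.num_labels, as printed in the examples earlier in the thread.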
