CrossEncoder .rank condition error in CrossEncoder.py #3124

Closed
saeeddhqan opened this issue Dec 9, 2024 · 6 comments · Fixed by #3126
Labels
enhancement New feature or request good first issue Good for newcomers

@saeeddhqan

saeeddhqan commented Dec 9, 2024

I get the following error when I use .rank method:

File /usr/local/lib/python3.12/dist-packages/sentence_transformers/cross_encoder/CrossEncoder.py:551, in CrossEncoder.rank(self, query, documents, top_k, return_documents, batch_size, show_progress_bar, num_workers, activation_fct, apply_softmax, convert_to_numpy, convert_to_tensor)
    548     if return_documents:
    549         results[-1].update({"text": documents[i]})
--> 551 results = sorted(results, key=lambda x: x["score"], reverse=True)
    552 return results[:top_k]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I use sentence_transformers v3.3.0.

A snippet:

import sentence_transformers

cross_encoder = sentence_transformers.CrossEncoder("amberoad/bert-multilingual-passage-reranking-msmarco", device='cpu', max_length=256)
cross_encoder.rank(query, docs)

@JINO-ROHIT
Contributor

@saeeddhqan can you also share the structure of your query and docs?

@saeeddhqan
Author

cross_encoder.rank('docx', ['doc1', 'doc2', 'doc3'])

@JINO-ROHIT
Contributor

@saeeddhqan it works for me

from sentence_transformers.cross_encoder import CrossEncoder
cross_encoder = CrossEncoder("cross-encoder/stsb-distilroberta-base")

cross_encoder.rank('docx', ['doc1', 'doc2', 'doc3'])

Response

[{'corpus_id': 0, 'score': np.float32(0.5175216)},
 {'corpus_id': 2, 'score': np.float32(0.4488596)},
 {'corpus_id': 1, 'score': np.float32(0.43759635)}]

tomaarsen added the "enhancement" (New feature or request) and "good first issue" (Good for newcomers) labels on Dec 10, 2024
@tomaarsen
Collaborator

tomaarsen commented Dec 10, 2024

@JINO-ROHIT The issue seems to be model specific.

@saeeddhqan thanks for opening! The CrossEncoder class wraps around the AutoModelForSequenceClassification class from transformers, and those models can predict logits for $n$ classes per sequence (query-document pairs in this case). The CrossEncoder.predict method will call this underlying model and return all predictions. For amberoad/bert-multilingual-passage-reranking-msmarco, that's 2:

from sentence_transformers.cross_encoder import CrossEncoder

cross_encoder = CrossEncoder("amberoad/bert-multilingual-passage-reranking-msmarco", device='cpu', max_length=256)
print(cross_encoder.predict([('docx', 'doc1')]))
# [[-1.2904704  1.1504961]]
print(cross_encoder.config.num_labels)
# 2

whereas for a lot of CrossEncoder models (e.g. cross-encoder/stsb-distilroberta-base) it's just 1:

from sentence_transformers.cross_encoder import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/stsb-distilroberta-base", device='cpu', max_length=256)
print(cross_encoder.predict([('docx', 'doc1')]))
# [0.51752156]
print(cross_encoder.config.num_labels)
# 1

Beyond that, the CrossEncoder.rank method internally calls CrossEncoder.predict and then expects each query-document pair to yield exactly 1 value (i.e. that the model has only 1 label). With 2 labels, every "score" is an array of 2 values, and sorting by those arrays is what triggers the reported "truth value of an array" ValueError. What's missing is a raise ValueError in CrossEncoder.rank if self.config.num_labels != 1, because if there are multiple values per prediction, it's unclear which one denotes the similarity. In short: CrossEncoder models with more than 1 label can't be used with CrossEncoder.rank at the moment, only with CrossEncoder.predict, and then you can do the ranking yourself if you know which value corresponds to similarity.

  • Tom Aarsen
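
Editor's sketch of the "do the ranking yourself" workaround described above. It is not part of sentence_transformers: rank_with_label is a hypothetical helper, and the choice of label_index is an assumption that has to be verified against the model card, since the thread does not say which of the two values denotes relevance. The helper only needs one row of scores per document, so it accepts the array returned by CrossEncoder.predict as well as plain lists.

```python
from typing import Any, Dict, List, Optional, Sequence


def rank_with_label(
    scores: Sequence[Sequence[float]],
    documents: Sequence[str],
    label_index: int,
    top_k: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Rank documents by one chosen column of multi-label scores.

    `scores` holds one row per (query, document) pair and one column per
    label, e.g. the output of CrossEncoder.predict when num_labels > 1.
    `label_index` is the column ASSUMED to denote relevance.
    """
    if len(scores) != len(documents):
        raise ValueError("Expected exactly one row of scores per document.")
    results = [
        # float() turns a NumPy scalar (or a plain number) into a sortable float.
        {"corpus_id": i, "score": float(row[label_index]), "text": documents[i]}
        for i, row in enumerate(scores)
    ]
    results.sort(key=lambda item: item["score"], reverse=True)
    return results[:top_k]  # top_k=None keeps every result


# Made-up logits for three documents; only the first row is taken from the thread.
example_scores = [[-1.2904704, 1.1504961], [0.8, -0.9], [-2.0, 2.3]]
ranked = rank_with_label(example_scores, ["doc1", "doc2", "doc3"], label_index=1)
print([item["corpus_id"] for item in ranked])
# [2, 0, 1]
```

With a real model, the scores would come from something like cross_encoder.predict([(query, doc) for doc in docs]), passed in place of example_scores.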

@JINO-ROHIT
Contributor

Ahh okay, makes sense. I can help with a PR for this if you're not already working on it 😊

@tomaarsen
Collaborator

That would be much appreciated!

  • Tom Aarsen
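
Editor's sketch of the kind of guard discussed in this thread, for illustration only. It is not the code from the linked fix: ensure_single_label is a hypothetical standalone function, and the error message wording is invented. The only detail taken from the thread is the condition itself, num_labels != 1.

```python
def ensure_single_label(num_labels: int) -> None:
    """Raise early when a model predicts more than one value per pair.

    Hypothetical stand-in for a check at the top of a ranking method,
    where `num_labels` would come from the model config.
    """
    if num_labels != 1:
        raise ValueError(
            "Ranking needs exactly one score per query-document pair, "
            f"but this model predicts {num_labels} labels. "
            "Call predict() and rank on the relevant label yourself."
        )


ensure_single_label(1)  # passes silently for single-label models

try:
    ensure_single_label(2)  # a 2-label model, as in the report above
except ValueError as err:
    print(f"ValueError: {err}")
```

Failing at the start of ranking gives a message that names the actual cause, instead of the NumPy truth-value error that surfaces later during sorting.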

3 participants