
ValueError: text input must of type str (single example), List[str] (batch or single pretokenized example) or List[List[str]] (batch of pretokenized examples). #21366

Closed
alhuri opened this issue Jan 30, 2023 · 4 comments


alhuri commented Jan 30, 2023

I am trying to run the zero-shot learning evaluation of MCLIP found in this Colab notebook.

The model is loaded using the code below:

if MODEL_TYPE == 'mClip':
    from sentence_transformers import SentenceTransformer
    # Here we load the multilingual CLIP model. Note, this model can only encode text.
    # If you need embeddings for images, you must load the 'clip-ViT-B-32' model
    se_language_model = SentenceTransformer('clip-ViT-B-32-multilingual-v1')
    se_image_model = SentenceTransformer('clip-ViT-B-32')
    language_model = lambda queries: se_language_model.encode(queries, convert_to_tensor=True, show_progress_bar=False).cpu().detach().numpy()
    image_model = lambda images: se_image_model.encode(images, batch_size=1024, convert_to_tensor=True, show_progress_bar=False).cpu().detach().numpy()
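
Both wrappers return numpy arrays. A minimal sanity check (this snippet is only a sketch, assuming a local example.jpg and that the image model receives PIL images) would be:

    from PIL import Image

    # Text goes through the multilingual model, images through clip-ViT-B-32.
    text_emb = language_model(["a photo of a cat"])        # numpy array, one row per query
    image_emb = image_model([Image.open("example.jpg")])   # numpy array, one row per image
    print(text_emb.shape, image_emb.shape)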

When running the prediction cell below,

top_ns = [1, 5, 10, 100]
acc_counters = [0. for _ in top_ns]
n = 0.

for i, (images, target) in enumerate(tqdm(loader)):
    images = images
    target = target.numpy()
    # predict
    image_features = image_model(images)
    image_features = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    logits = 100. * image_features @ zeroshot_weights

    # measure accuracy
    accs = accuracy(logits, target, topk=top_ns)
    for j in range(len(top_ns)):
        acc_counters[j] += accs[j]
    n += images.shape[0]

tops = {f'top{top_ns[i]}': acc_counters[i] / n * 100 for i in range(len(top_ns))}

print(tops)
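
(accuracy and zeroshot_weights are defined earlier in the notebook. For context, a sketch of the kind of top-k accuracy helper such zero-shot evaluations typically use, not necessarily the notebook's exact code, is:)

    import numpy as np

    def accuracy(logits, target, topk=(1,)):
        # logits: (batch, n_classes) array, target: (batch,) ground-truth class indices.
        # Counts how many samples have the true class among the top-k predictions.
        pred = np.argsort(-logits, axis=-1)
        return [float(np.sum(np.any(pred[:, :k] == target[:, None], axis=1))) for k in topk]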

I get the error below:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-3500c9b4df73> in <module>
     11     target = target.numpy()
     12     # predict
---> 13     image_features = image_model(images)
     14     image_features = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
     15     logits = 100. * image_features @ zeroshot_weights

6 frames
<ipython-input-39-f2cc72683291> in <lambda>(images)
      6     se_image_model = SentenceTransformer('clip-ViT-B-32')
      7     language_model = lambda queries: se_language_model.encode(queries, convert_to_tensor=True, show_progress_bar=False).cpu().detach().numpy()
----> 8     image_model = lambda images: se_image_model.encode(images, batch_size=64, convert_to_tensor=False, show_progress_bar=False).cpu().detach().numpy()
      9 elif MODEL_TYPE == 'bothclip':
     10     import jax

/usr/local/lib/python3.8/dist-packages/sentence_transformers/SentenceTransformer.py in encode(self, sentences, batch_size, show_progress_bar, output_value, convert_to_numpy, convert_to_tensor, device, normalize_embeddings)
    159         for start_index in trange(0, len(sentences), batch_size, desc="Batches", disable=not show_progress_bar):
    160             sentences_batch = sentences_sorted[start_index:start_index+batch_size]
--> 161             print("sentences_batch")
    162             print(sentences_batch)
    163             features = self.tokenize(sentences_batch)

/usr/local/lib/python3.8/dist-packages/sentence_transformers/SentenceTransformer.py in tokenize(self, texts)
    317     def tokenize(self, texts: Union[List[str], List[Dict], List[Tuple[str, str]]]):
    318         """
--> 319         Tokenizes the texts
    320         """
    321         return self._first_module().tokenize(texts)

/usr/local/lib/python3.8/dist-packages/sentence_transformers/models/CLIPModel.py in tokenize(self, texts)
     69             images = None
     70 
---> 71         inputs = self.processor(text=texts_values, images=images, return_tensors="pt", padding=True)
     72         inputs['image_text_info'] = image_text_info
     73         return inputs

/usr/local/lib/python3.8/dist-packages/transformers/models/clip/processing_clip.py in __call__(self, text, images, return_tensors, **kwargs)
     97 
     98         if text is not None:
---> 99             encoding = self.tokenizer(text, return_tensors=return_tensors, **kwargs)
    100 
    101         if images is not None:

/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py in __call__(self, text, text_pair, text_target, text_pair_target, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2525             if not self._in_target_context_manager:
   2526                 self._switch_to_input_mode()
-> 2527             encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
   2528         if text_target is not None:
   2529             self._switch_to_target_mode()

/usr/local/lib/python3.8/dist-packages/transformers/tokenization_utils_base.py in _call_one(self, text, text_pair, add_special_tokens, padding, truncation, max_length, stride, is_split_into_words, pad_to_multiple_of, return_tensors, return_token_type_ids, return_attention_mask, return_overflowing_tokens, return_special_tokens_mask, return_offsets_mapping, return_length, verbose, **kwargs)
   2583 
   2584         if not _is_valid_text_input(text):
-> 2585             raise ValueError(
   2586                 "text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) "
   2587                 "or `List[List[str]]` (batch of pretokenized examples)."

ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

How can this be fixed?
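
My guess from the traceback is that the batches coming out of loader are tensors, and the sentence-transformers CLIP wrapper only treats PIL.Image.Image inputs as images, so anything else falls through to the text tokenizer. Would converting the batch back to PIL images before encoding, roughly as in the sketch below, be the right direction? (This assumes loader yields float image tensors in [0, 1] of shape (batch, 3, H, W).)

    from torchvision.transforms.functional import to_pil_image

    def encode_images(images):
        # Convert each tensor in the batch to a PIL image so the CLIP
        # wrapper routes it to the image processor instead of the tokenizer.
        pil_images = [to_pil_image(img) for img in images]
        return se_image_model.encode(pil_images, batch_size=1024,
                                     convert_to_tensor=True,
                                     show_progress_bar=False).cpu().detach().numpy()

    image_model = encode_images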

Collaborator

sgugger commented Jan 30, 2023

Looks like an issue with the sentence-transformers library, not Transformers. Cc-ing @ArthurZucker, who may have other ideas.

@ArthurZucker
Collaborator

I don't really, but I'll try to have a look through the notebook.

@ArthurZucker
Collaborator

@alhuri could you provide a functioning notebook with the reproduction script? This one does not work for me (missing packages, etc.) with the config you are using. Thanks!


github-actions bot commented Mar 2, 2023

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
