CLIP can only handle sequences up to 128 tokens #28

Open

loretoparisi opened this issue Nov 25, 2024 · 1 comment

When using detailed prompts (as suggested by the prompt engineering guidelines), the prompt may get truncated:

WARNING:ltx_video.pipelines.pipeline_ltx_video:The following part of your input was truncated because CLIP can only handle sequences up to 128 tokens:
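
For reference, one way to check whether a prompt will be truncated is to tokenize it with the same T5 tokenizer the pipeline uses. A minimal sketch, assuming the text encoder tokenizer is google/t5-v1_1-xxl (the PixArt-alpha default discussed below):

from transformers import AutoTokenizer

# Assumption: the pipeline's text encoder uses the T5 v1.1 XXL tokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")

prompt = "A detailed, multi-sentence prompt written per the guidelines..."
token_count = len(tokenizer(prompt).input_ids)
if token_count > 128:
    print(f"Prompt is {token_count} tokens; everything past token 128 is dropped.")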

loretoparisi commented Nov 26, 2024

As far as I can see here, the T5Embedder provided in PixArt-alpha has this default model_max_length:

def __init__(self, device, dir_or_name='t5-v1_1-xxl', *, local_cache=False, cache_dir=None, hf_token=None, use_text_preprocessing=True,
                 t5_model_kwargs=None, torch_dtype=None, use_offload_folder=None, model_max_length=120):

while in the pipeline here I see this comment:

# See Section 3.1. of the paper.
# FIXME: to be configured in config not hardecoded. Fix in separate PR with rest of config
max_length = 128  # TPU supports only lengths multiple of 128
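
Since the FIXME already points at making this configurable, a hypothetical fix could read the limit from the pipeline config and round it up to keep the TPU constraint. Sketch only; text_max_length is an invented key, not an existing LTX-Video config option:

# Hypothetical: read the limit from config instead of hardcoding it.
# "text_max_length" is an invented key for illustration.
requested = getattr(config, "text_max_length", 128)
max_length = ((requested + 127) // 128) * 128  # keep a multiple of 128 for TPU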

The main problem here is that, assuming I have a previous prompt and I want to "expand" or rewrite it following the suggested style/prompt engineering guidelines, this will break the 128-token limit most of the time.
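
Until the limit is configurable, one workaround after rewriting a prompt is to let the tokenizer truncate it explicitly and decode the result back, so the cut point is at least visible before generation. A sketch using the same tokenizer as above:

# Explicitly truncate the rewritten prompt to the 128-token limit and
# decode it back, so you see exactly what the pipeline will receive.
ids = tokenizer(rewritten_prompt, truncation=True, max_length=128).input_ids
fitted_prompt = tokenizer.decode(ids, skip_special_tokens=True)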
