CLIP can only handle sequences up to 128 tokens #28

Open

loretoparisi opened this issue Nov 25, 2024 · 1 comment

When using detailed prompts (as suggested by the prompt engineering guidelines), the prompt may get truncated:

WARNING:ltx_video.pipelines.pipeline_ltx_video:The following part of your input was truncated because CLIP can only handle sequences up to 128 tokens:
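
For reference, one way to check whether a prompt will be truncated is to tokenize it with the same T5 tokenizer the pipeline uses. A minimal sketch, assuming the text encoder tokenizer is google/t5-v1_1-xxl (the PixArt-alpha default discussed below):

from transformers import AutoTokenizer

# Assumption: the pipeline's text encoder uses the T5 v1.1 XXL tokenizer.
tokenizer = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")

prompt = "A detailed, multi-sentence prompt written per the guidelines..."
token_count = len(tokenizer(prompt).input_ids)
if token_count > 128:
    print(f"Prompt is {token_count} tokens; everything past token 128 is dropped.")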

loretoparisi commented Nov 26, 2024

As far as I can see here, the T5Embedder provided in PixArt-alpha has this default model_max_length:

def __init__(self, device, dir_or_name='t5-v1_1-xxl', *, local_cache=False, cache_dir=None, hf_token=None, use_text_preprocessing=True,
                 t5_model_kwargs=None, torch_dtype=None, use_offload_folder=None, model_max_length=120):

while in the pipeline here I see this comment:

# See Section 3.1. of the paper.
# FIXME: to be configured in config not hardecoded. Fix in separate PR with rest of config
max_length = 128  # TPU supports only lengths multiple of 128
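
Since the FIXME already points at making this configurable, a hypothetical fix could read the limit from the pipeline config and round it up to keep the TPU constraint. Sketch only; text_max_length is an invented key, not an existing LTX-Video config option:

# Hypothetical: read the limit from config instead of hardcoding it.
# "text_max_length" is an invented key for illustration.
requested = getattr(config, "text_max_length", 128)
max_length = ((requested + 127) // 128) * 128  # keep a multiple of 128 for TPU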

The main problem here is that, assuming I have a previous prompt and I want to "expand" or rewrite it following the suggested style/prompt engineering guidelines, this will break the 128-token limit most of the time.
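
Until the limit is configurable, one workaround after rewriting a prompt is to let the tokenizer truncate it explicitly and decode the result back, so the cut point is at least visible before generation. A sketch using the same tokenizer as above:

# Explicitly truncate the rewritten prompt to the 128-token limit and
# decode it back, so you see exactly what the pipeline will receive.
ids = tokenizer(rewritten_prompt, truncation=True, max_length=128).input_ids
fitted_prompt = tokenizer.decode(ids, skip_special_tokens=True)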
