Training and inference context lengths #19

Open
jxiong21029 opened this issue Nov 3, 2024 · 1 comment
@jxiong21029

Hello,

In order to use the open-source oasis model effectively, some necessary information is missing:

  1. What is the maximum sequence length that the oasis500m model was trained on?
  2. Was the model trained with masking strategies like sliding window attention and/or transformer-XL style recurrence?
  3. How do you handle context length at inference time? Do you discard the oldest tokens in the KV cache after a maximum time horizon, or do you stop generating once the KV cache is full?
  4. In the provided generate.py script, the noise schedule seems to apply a uniform noise level to all context tokens, approximately min(current step noise level, 300), rather than e.g. the pyramid scheduler described in the original diffusion forcing paper. Was this noise schedule selected heuristically or was it tuned specifically for this project, and is it the same schedule used for the live demo?

Thanks!

@julian-q
Collaborator

julian-q commented Nov 8, 2024

Thanks for your detailed questions, @jxiong21029!

  1. 32 frames.
  2. Just trained on 32-frame sequences.
  3. Yep, we discard the tokens past the latest 32 frames!
  4. We experimented with different noise schedules and chose the one that worked best. Using a constant noise level for the context seems to work well when you want to fully denoise each new frame one at a time (as opposed to some use cases of Diffusion Forcing, where you progressively denoise future frames).
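
For readers following along, here is a minimal sketch of how answers 3 and 4 could fit together. It is not the project's actual code. The names `MAX_CONTEXT_FRAMES`, `CONTEXT_NOISE_CAP`, `make_context`, and `noise_levels` are hypothetical, the cap of 300 is taken from the question's reading of generate.py, and the thread does not say whether the 32-frame window counts the frame currently being generated. This sketch assumes the buffer simply holds the latest 32 frames.

```python
from collections import deque

MAX_CONTEXT_FRAMES = 32   # window size stated in the reply above
CONTEXT_NOISE_CAP = 300   # assumed: the cap the question reads off generate.py


def make_context():
    """Rolling buffer that drops the oldest frame once 32 frames are held."""
    return deque(maxlen=MAX_CONTEXT_FRAMES)


def noise_levels(num_context_frames, current_step_noise):
    """Per-frame noise levels for one denoising step of the newest frame.

    All context frames share one constant level, min(current, cap);
    only the newest frame sits at the current step's noise level.
    """
    context_level = min(current_step_noise, CONTEXT_NOISE_CAP)
    return [context_level] * num_context_frames + [current_step_noise]


if __name__ == "__main__":
    context = make_context()
    for frame_idx in range(40):
        # frame_idx is a stand-in for a fully denoised latent frame
        context.append(frame_idx)
    # Frames 0..7 have been discarded; frames 8..39 remain.
    print(len(context), context[0], context[-1])
    print(noise_levels(3, 1000))
```

A pyramid-style schedule, as in the Diffusion Forcing paper, would instead give each frame a different level; the constant context level here matches the "fully denoise each new frame one at a time" use case described in the reply.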
