add possibility to interpolate in latent space instead of embedding space #31

Open
ljleb opened this issue Jan 28, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@ljleb
Owner

ljleb commented Jan 28, 2023

Prompt interpolation currently uses embeddings to calculate values in-between control points. I'm not sure whether this is the right way to interpolate prompts, or if we should instead work in latent space.

One way to include this would be to add a variation to the current prompt interpolation curves:

  • linear-latent
  • catmull-latent
  • bezier-latent

I will consider removing embedding interpolation altogether if latent interpolation turns out to generate strictly better results in terms of quality and proximity to the prompt, but I'm reluctant to do this. I'm not sure we can change the implementation of the interpolation curves much, because doing so would break old prompts.

@ljleb ljleb added the enhancement New feature or request label Jan 28, 2023
@gsgoldma

gsgoldma commented Jan 30, 2023

great idea, latent interpolation is really interesting

maybe there could be a setting to toggle between them, if you don't want to break old prompts?

@ljleb
Owner Author

ljleb commented Apr 10, 2023

> maybe there could be a setting to toggle between them, if you don't want to break old prompts?

If there is a way to implement this, then I think it is a good idea. Check the more useful option by default and allow users to change the defaults.

Now that I know a little more about the internals of SD, however, I am not sure this feature makes much sense. I don't know what interpolating in latent space instead of embedding space would mean for this extension, since this extension is about textual interpolation. IIUC, latent interpolation is more about interpolating between different completed images than between different steps of image creation.

I'll close this issue for now, but if this is something you think is possible in another way I did not consider, please open another issue! (or let me know in a further comment)

@ljleb ljleb closed this as completed Apr 10, 2023
@ljleb
Owner Author

ljleb commented May 21, 2023

So it seems this would in fact be possible by blending the epsilons generated by the model at varying rates. To gather the needed epsilons, we can use the webui's implementation of composable diffusion via the AND keyword, and monkey patch the cfg denoising code to interpolate the noise maps together.
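To make "blending epsilons at varying rates" concrete, here is a minimal sketch that assumes a plain linear blend between two noise predictions, with the blend weight following sampling progress. It does not use any webui API: `lerp` and `blend_epsilons` are hypothetical names, and the epsilons are flat lists of floats standing in for the real tensors.

```python
def lerp(a, b, t):
    """Element-wise linear interpolation between two equally sized lists."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def blend_epsilons(eps_a, eps_b, step, total_steps):
    """Blend two noise predictions; the weight goes from 0 to 1 over sampling.

    Assumption: the blend rate is simply the fraction of steps completed.
    An actual implementation could use any curve here.
    """
    t = step / max(total_steps - 1, 1)
    return lerp(eps_a, eps_b, t)


# Stand-in epsilons for the two prompts of one interpolation.
eps_ship = [0.0, 2.0]
eps_castle = [1.0, 4.0]
for step in range(5):
    print(step, blend_epsilons(eps_ship, eps_castle, step, 5))
```

At step 0 the result equals the first prediction, and at the last step it equals the second.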

@ljleb ljleb reopened this May 21, 2023
@ljleb
Owner Author

ljleb commented May 21, 2023

Currently, we would need an exponential number of model generations for the extension to do its job. For example:

a [ magnificent ship : majestic castle : , : linear-latent], beautiful landscape, [ sun lens flare : golden hour : , : linear-latent]

means generating 4 epsilons at each iteration, one for each of the same 4 conds:

  • a magnificent ship, beautiful landscape, sun lens flare
  • a majestic castle, beautiful landscape, sun lens flare
  • a magnificent ship, beautiful landscape, golden hour
  • a majestic castle, beautiful landscape, golden hour

and then interpolating these in the combine_denoised method of the webui's cfg denoiser, using them as control points for the chosen interpolation method.
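The expansion and blend described above could look roughly like the sketch below. It assumes linear interpolation with exactly two control points per interpolation, which reduces the blend to multilinear weights over the 2^n expanded prompts. `expand_prompts`, `corner_weights` and `blend` are hypothetical helpers, not functions of the extension or the webui, and the prompts come out in `itertools.product` order rather than the order listed above.

```python
from itertools import product


def expand_prompts(template, slots):
    """Expand a template into every combination of control points."""
    return [template.format(*choice) for choice in product(*slots)]


def corner_weights(ts):
    """Multilinear weights, one per combination, in the same product order.

    ts[i] is the progress (0..1) of interpolation i. Assumes two control
    points per interpolation.
    """
    weights = []
    for corner in product((0, 1), repeat=len(ts)):
        w = 1.0
        for bit, t in zip(corner, ts):
            w *= t if bit else (1.0 - t)
        weights.append(w)
    return weights


def blend(epsilons, weights):
    """Weighted sum of equally sized epsilon lists."""
    size = len(epsilons[0])
    return [sum(w * e[i] for w, e in zip(weights, epsilons)) for i in range(size)]


prompts = expand_prompts(
    "a {0}, beautiful landscape, {1}",
    [("magnificent ship", "majestic castle"), ("sun lens flare", "golden hour")],
)
for prompt, weight in zip(prompts, corner_weights([0.5, 0.5])):
    print(f"{weight:.2f}  {prompt}")
```

With both interpolations halfway through, each of the 4 epsilons contributes a weight of 0.25. Catmull or Bezier curves would need a different weighting scheme.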

So instead of having an exponential initial startup time, this will take an exponential time at each generation step. As long as the number of interpolations is kept to a low number, for example 5, it should be relatively fine on moderately capable machines, but otherwise this is not really practical. Maybe we'll add a console log and some information in the readme to make sure people are informed of the side effects of using too many interpolations.
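As a rough illustration of that cost, the number of conds per step is the product of the control point counts of all interpolations in the prompt, so 5 two-point interpolations already need 2^5 = 32 conds per step. The sketch below is hypothetical: `conds_per_step` and `cost_warning` are made-up names, and the default limit of 32 is an arbitrary assumption, not a value taken from the extension.

```python
from math import prod


def conds_per_step(control_points):
    """Number of conds (and epsilons) needed at each sampling step.

    control_points holds the number of control points of each interpolation.
    """
    return prod(control_points)


def cost_warning(control_points, limit=32):
    """Return a console message when the cost exceeds the limit, else None.

    Assumption: the limit of 32 is arbitrary and only for illustration.
    """
    n = conds_per_step(control_points)
    if n > limit:
        return f"latent interpolation needs {n} conds per step (limit: {limit})"
    return None


print(conds_per_step([2, 2]))   # the example prompt above
print(conds_per_step([2] * 5))  # 5 two-point interpolations
print(cost_warning([2] * 6))
```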
