Add possibility to interpolate in latent space instead of embedding space #31
Comments
Great idea, latent interpolation is really interesting. Maybe there could be a setting to toggle between them, if you don't want to break old prompts?
If there is a way to implement this, then I think it is a good idea. Enable the more useful option by default and allow users to change the defaults. Now that I know a little more about the internals of SD, however, I am not sure this feature makes a lot of sense. I don't know what interpolating in latent space instead of embedding space would mean for this extension, since this extension is about textual interpolation. IIUC, latent interpolation is more about interpolating between different completed images than between different steps of image creation. I'll close this issue for now, but if you think this is possible in a way I did not consider, please open another issue (or let me know in a further comment)!
So it seems this would in fact be possible by blending the model's generated epsilons at varying rates. To gather the needed epsilons, we can use the implementation of composable diffusion in the webui via the
Currently, we would need an exponential number of model generations for the extension to do its job. For example:
Means generating 4 epsilons using the same 4 conds at each iteration:
and then interpolating these in the combine_denoise method of the webui's cfg denoiser, using them as control points for the chosen interpolation method. So instead of an exponential initial startup time, this will take exponential time at each generation step. As long as the number of interpolations is kept low, for example 5, it should be relatively fine on moderately capable machines, but otherwise this is not really practical. Maybe we'll add a console log and some information in the readme to make sure people are informed of the side effects of using too many interpolations.
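To make the idea above more concrete, here is a minimal sketch of treating per-step epsilons as control points and blending them. It is not the webui's code: `blend_epsilons`, `lerp`, and the flat `Vector` type are hypothetical stand-ins (real epsilons are tensors), it only covers piecewise-linear interpolation, and mapping the sampling step to a position `t` in [0, 1] is an assumption about how "varying rates" could be defined.

```python
from typing import List, Sequence

Vector = List[float]  # hypothetical stand-in for a flattened epsilon tensor


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Element-wise linear interpolation between two equally sized vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def blend_epsilons(epsilons: Sequence[Sequence[float]], t: float) -> Vector:
    """Piecewise-linear blend of control-point epsilons at position t in [0, 1].

    Each entry of `epsilons` is assumed to be the model output for one
    control-point prompt at the current sampling step.
    """
    if not epsilons:
        raise ValueError("at least one epsilon is required")
    if len(epsilons) == 1:
        return list(epsilons[0])
    t = min(max(t, 0.0), 1.0)                 # clamp to the valid range
    scaled = t * (len(epsilons) - 1)          # position along the segments
    i = min(int(scaled), len(epsilons) - 2)   # index of the active segment
    return lerp(epsilons[i], epsilons[i + 1], scaled - i)


def position_for_step(step: int, total_steps: int) -> float:
    """Assumed schedule: move linearly from the first to the last control point."""
    return step / max(total_steps - 1, 1)
```

Every control point costs one model evaluation per sampling step in this scheme, which is where the per-step slowdown mentioned above comes from.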
Prompt interpolation currently uses embeddings to calculate values in-between control points. I'm not sure whether this is the right way to interpolate prompts, or if we should instead work in latent space.
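A toy sketch of why the two options are not interchangeable: blending conditionings before the model call and blending model outputs after it only agree when the model is linear. Everything here is hypothetical; `toy_denoiser` is a made-up nonlinear function standing in for the real model, and the vectors are plain lists rather than tensors.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Sequence[float]], Vector]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Element-wise linear interpolation between two equally sized vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def toy_denoiser(cond: Sequence[float]) -> Vector:
    """Hypothetical nonlinear stand-in for the model; NOT Stable Diffusion."""
    return [c * c for c in cond]


def interpolate_embeddings(model: Model, cond_a: Sequence[float],
                           cond_b: Sequence[float], t: float) -> Vector:
    """Blend the conditionings first, then make a single model call."""
    return model(lerp(cond_a, cond_b, t))


def interpolate_outputs(model: Model, cond_a: Sequence[float],
                        cond_b: Sequence[float], t: float) -> Vector:
    """Make one model call per control point, then blend the outputs."""
    return lerp(model(cond_a), model(cond_b), t)


print(interpolate_embeddings(toy_denoiser, [0.0], [2.0], 0.5))  # [1.0]
print(interpolate_outputs(toy_denoiser, [0.0], [2.0], 0.5))     # [2.0]
```

The second variant needs one model evaluation per control point, while the first needs only one in total.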
One way to include this would be to add a variation to the current prompt interpolation curves:
linear-latent
catmull-latent
bezier-latent
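One possible way to handle such names without breaking old prompts is to treat the `-latent` suffix as an opt-in flag and fall back to embedding interpolation otherwise. This is only a sketch: `parse_curve_name` is a hypothetical helper, not part of the extension, and it assumes the existing curves are named `linear`, `catmull`, and `bezier`.

```python
from typing import Tuple

LATENT_SUFFIX = "-latent"  # suffix taken from the proposed curve names


def parse_curve_name(name: str) -> Tuple[str, str]:
    """Split a curve name into (curve, space).

    Names without the suffix keep the assumed current behavior ("embed"),
    so prompts written before the change would resolve exactly as before.
    """
    name = name.strip().lower()
    if name.endswith(LATENT_SUFFIX):
        return name[: -len(LATENT_SUFFIX)], "latent"
    return name, "embed"


print(parse_curve_name("catmull-latent"))  # ('catmull', 'latent')
print(parse_curve_name("linear"))          # ('linear', 'embed')
```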
I will consider removing embedding interpolation altogether if latent interpolation turns out to generate strictly better results in terms of quality and proximity to the prompt, but I'm reluctant to do this. I'm not sure we can change the implementation of the interpolation curves too much, because it will break old prompts.