It's not very clear in the paper how the time embedding affects the decoder layers. Do I understand correctly that every DDIM step involves calling all 9 decoder layers?
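To make sure we're talking about the same thing, here is the pattern I have in mind, where every sampling step re-runs the whole decoder stack. This is only a sketch; the layer count, dimensions, and plain `nn.TransformerDecoderLayer` are my assumptions, not the actual DFormer code:

```python
import torch
import torch.nn as nn

# Illustrative only: layer count, dims, and the use of plain
# nn.TransformerDecoderLayer are assumptions, not the DFormer code.
decoder_layers = nn.ModuleList(
    nn.TransformerDecoderLayer(d_model=256, nhead=8, batch_first=True)
    for _ in range(9)
)

def run_decoder(queries: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    # All 9 layers execute once per call.
    for layer in decoder_layers:
        queries = layer(queries, memory)
    return queries

queries = torch.randn(2, 100, 256)  # (batch, num_queries, dim)
memory = torch.randn(2, 1024, 256)  # flattened image features
for step in range(5):               # each DDIM step re-runs the full stack
    queries = run_decoder(queries, memory)
```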
Am I right that the time embedding only scales and shifts the transformer embeddings? Is that its only use? Are there any ablations on its influence? See https://github.com/cp3wan/DFormer/blob/main/dformer/modeling/transformer_decoder/dformer_transformer_decoder.py#L438-L442
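For concreteness, this is the kind of FiLM-style scale-and-shift I mean; the class and parameter names below are illustrative, not the code at the linked lines:

```python
import torch
import torch.nn as nn

class TimeModulation(nn.Module):
    """FiLM-style scale-and-shift of transformer embeddings by a time
    embedding. Names and shapes are illustrative, not the linked code."""

    def __init__(self, embed_dim: int, time_dim: int):
        super().__init__()
        # Project the time embedding to one scale and one shift per channel.
        self.proj = nn.Linear(time_dim, 2 * embed_dim)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, embed_dim); t_emb: (batch, time_dim)
        scale, shift = self.proj(t_emb).chunk(2, dim=-1)
        # Broadcast over the token dimension: x * (1 + scale) + shift
        return x * (1.0 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```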
Given that multiple diffusion steps at inference do not improve the result, do you actually use "diffusion" at inference time? If so, which timestep value are you using for the 1-step process?
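To make the 1-step question concrete, here is a sketch of a single eta=0 DDIM update jumping from a fixed t straight to t=0, assuming an epsilon-prediction model; `eps_model`, `alphas_cumprod`, and the `t_value=999` default are my guesses, not the repo's values:

```python
import torch

@torch.no_grad()
def ddim_single_step(eps_model, x_t, alphas_cumprod, t_value: int = 999):
    """One eta=0 DDIM step from t=t_value straight to t=0.
    eps_model, t_value, and the 1-D schedule tensor are assumptions."""
    t = torch.full((x_t.shape[0],), t_value, device=x_t.device, dtype=torch.long)
    a_bar = alphas_cumprod[t_value]          # scalar tensor from the schedule
    eps = eps_model(x_t, t)                  # predicted noise at timestep t
    # Clean-sample estimate; with a single step this is the final output.
    return (x_t - (1.0 - a_bar).sqrt() * eps) / a_bar.sqrt()
```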
I found these two lines:
- DFormer/dformer/config.py, line 47 (commit d3eef80)
- DFormer/dformer/DFormer_model.py, line 98 (commit d3eef80)
Is the single step used only at inference, while training uses a max timestep of 1000?
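In other words, I'm picturing the usual asymmetry sketched below, with `T_MAX = 1000` matching the config value; the helper names are mine, not the repo's:

```python
import torch

T_MAX = 1000  # assumed training horizon, matching the config value above

def training_timesteps(batch_size: int, device: torch.device) -> torch.Tensor:
    # Training: one random timestep per sample, drawn from [0, T_MAX).
    return torch.randint(0, T_MAX, (batch_size,), device=device)

def inference_timesteps(num_steps: int = 1) -> torch.Tensor:
    # Inference: an evenly spaced DDIM schedule; num_steps=1 gives just [999].
    return torch.linspace(T_MAX - 1, 0, num_steps).long()
```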
Would the benefit of multi-step diffusion at inference be larger if fewer decoder layers were used?
Thank you!