I am wondering whether there is a bug in sample.py in the code that smooths the transitions, here:
As you have commented yourself, the size of the variable last_poses is (1, model.njoints, 1, args.n_seed), so len(last_poses) is always 1. I think len(last_poses) should be replaced with np.size(last_poses, axis=-1), which equals args.n_seed (30 frames by default). This way it blends the first frames of the new prediction with the last frames of the previous prediction, something like this:
n = np.size(last_poses, axis=-1)  # args.n_seed, 30 frames by default
for j in range(n):
    prev = last_poses[..., j]
    nxt = sample[..., j]  # renamed from `next` to avoid shadowing the builtin
    sample[..., j] = prev * (n - j) / (n + 1) + nxt * (j + 1) / (n + 1)
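For concreteness, here is a self-contained version of that blend on dummy arrays (the sizes are made up for illustration; only the (1, njoints, 1, n_frames) layout matters). The two weights sum to 1 at every frame, so this is a plain linear cross-fade:

import numpy as np

# illustrative sizes; in the repo they would come from model.njoints and args.n_seed
njoints, n_seed, seg_len = 75, 30, 120

last_poses = np.random.randn(1, njoints, 1, n_seed)  # tail of the previous prediction
sample = np.random.randn(1, njoints, 1, seg_len)     # fresh prediction

n = np.size(last_poses, axis=-1)  # == n_seed
for j in range(n):
    # the weights (n - j)/(n + 1) and (j + 1)/(n + 1) sum to 1 and slide
    # from favouring the previous clip to favouring the new one
    sample[..., j] = (last_poses[..., j] * (n - j) / (n + 1)
                      + sample[..., j] * (j + 1) / (n + 1))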
Am I right? I would appreciate your feedback.
Thanks a lot
Yes, when I reproduced it later I realized there was a minor problem in this part of the code, but it did not seem to have much effect on the results. Also:
The effective length of last_poses is not 1 but n_seed: in the shape (1, model.njoints, 1, args.n_seed), the first 1 is the batch size and the second 1 is an expanded singleton dimension with no real meaning (a short check of this follows below).
The follow-up DiffuseStyleGesture+ fixed this; see: here.
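To illustrate the first point (a minimal check, with njoints and n_seed filled in as example values): len() only reports the leading batch axis, while np.size(..., axis=-1) gives the seed length:

import numpy as np

last_poses = np.zeros((1, 75, 1, 30))  # (batch, njoints, expanded dim, n_seed)
print(len(last_poses))                 # 1  -> size of the leading batch axis only
print(np.size(last_poses, axis=-1))    # 30 -> n_seed, the intended blend length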
The segments we trained on are all 4 s long, and it is difficult to generalize to arbitrary-length gestures by positional encoding alone.
MDM-based models that need to be time-aware (i.e., support arbitrarily long inference) require smooth transitions between consecutively generated sequences. The following practices can serve as a reference:
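One such practice is seed-and-blend stitching: generate in fixed-length windows, seed each window with the tail of the previous one, and cross-fade the overlap. Below is a hypothetical sketch of that pattern (stitch_segments, generate_segment, and all sizes are assumptions for illustration, not the repository's actual inference code):

import numpy as np

def stitch_segments(generate_segment, n_windows, n_seed=30):
    """Assemble an arbitrarily long sequence from fixed-length windows,
    cross-fading the n_seed-frame overlap between consecutive windows.

    generate_segment(seed) is assumed to return (1, njoints, 1, seg_len);
    seed is the previous window's last n_seed frames (None for the first).
    """
    out = generate_segment(None)
    for _ in range(n_windows - 1):
        last_poses = out[..., -n_seed:]      # seed for the next window
        sample = generate_segment(last_poses)
        n = n_seed
        for j in range(n):
            # the same linear cross-fade as above, over the overlap frames
            sample[..., j] = (last_poses[..., j] * (n - j) / (n + 1)
                              + sample[..., j] * (j + 1) / (n + 1))
        # drop the stale overlap from `out` and append the blended window
        out = np.concatenate([out[..., :-n_seed], sample], axis=-1)
    return out

# toy usage with a random stand-in for the model
gen = lambda seed: np.random.randn(1, 75, 1, 120)
long_seq = stitch_segments(gen, n_windows=5)  # shape (1, 75, 1, 120 + 4 * 90)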