@joonson I have a question about the code in SyncNetInstance.py.
In the function calc_pdist, the window is there to account for the possible offset, right?
As I understand it, torch.stack(dists,1) returns a tensor of shape (lastframe, window_size). You then compute mdist = torch.mean(torch.stack(dists,1),1), i.e., you take the average across the columns, which gives an mdist of shape (1,31), simply a list of 31 values. I am unable to understand the logic behind this computation. Could you explain why the mean is taken across columns? From my understanding, the mean should be taken across rows, which would give a shape of (lastframe, 1), i.e., one mean per frame over its window.
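The shapes involved can be checked with a small sketch. It assumes that calc_pdist returns a Python list with one entry per video frame, each entry holding one distance per candidate shift in the window; that is a reading of the function, not something confirmed in this thread. The features are random stand-ins, and calc_pdist_sketch and all variable names are hypothetical.

```python
import torch
import torch.nn.functional as F


def calc_pdist_sketch(feat1, feat2, vshift=15):
    """Hypothetical stand-in: distances from each frame of feat1 to feat2 at shifts -vshift..+vshift."""
    win_size = 2 * vshift + 1
    # Zero-pad feat2 along the time axis so every frame has a full window.
    feat2p = F.pad(feat2, (0, 0, vshift, vshift))
    dists = []
    for i in range(len(feat1)):
        # One distance per candidate shift for frame i: shape (win_size,)
        frame = feat1[[i], :].repeat(win_size, 1)
        dists.append(F.pairwise_distance(frame, feat2p[i:i + win_size, :]))
    return dists


torch.manual_seed(0)
num_frames, feat_dim, vshift = 50, 8, 15
video_feat = torch.randn(num_frames, feat_dim)  # random stand-in features
audio_feat = torch.randn(num_frames, feat_dim)  # random stand-in features

dists = calc_pdist_sketch(video_feat, audio_feat, vshift)
stacked = torch.stack(dists, 1)     # list entries become columns
mdist = torch.mean(stacked, 1)      # average over dim 1
per_frame = torch.mean(stacked, 0)  # average over dim 0, for comparison

print(tuple(stacked.shape), tuple(mdist.shape), tuple(per_frame.shape))
```

Under these assumptions, stacking along dimension 1 puts the frames in the columns, so the stacked tensor has shape (window_size, lastframe). Averaging over dimension 1 then averages over frames and leaves one value per candidate offset, which would explain the 31 values; averaging over dimension 0 gives one value per frame instead.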
I also ran an experiment. For an original file that was not dubbed, the distance I get is quite high and the confidence is very low, although I expected the distance to be low and the confidence to be high. I then used the Wav2Lip model to create a dubbed video of a speaker saying the same statement as in the original file, and computed the distance and confidence for it. The distance for the dubbed video is comparatively lower than the distance computed for the original video.
What would be the reason for this?
Please also share your views on why the mean is taken across columns rather than across rows.
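To make the link between the averaged distances and the reported numbers concrete, here is a minimal sketch. It assumes the offset is taken from the position of the smallest mean distance and the confidence is the median of the mean distances minus that minimum, which is how I understand the evaluation step; treat that as an assumption. The function name and the two example curves are hypothetical and not taken from any real video.

```python
from statistics import median


def offset_and_confidence(mdist, vshift):
    """Hypothetical helper: derive offset, minimum distance and confidence from mean distances."""
    min_val = min(mdist)
    min_idx = mdist.index(min_val)
    offset = vshift - min_idx          # index vshift corresponds to zero shift
    conf = median(mdist) - min_val     # how far the best shift stands out
    return offset, min_val, conf


vshift = 15

# Made-up curve with no preferred shift: every offset is equally distant.
flat = [10.0] * (2 * vshift + 1)

# Made-up curve with a clear dip at zero shift.
peaked = [10.0] * (2 * vshift + 1)
peaked[vshift] = 6.0

print(offset_and_confidence(flat, vshift))
print(offset_and_confidence(peaked, vshift))
```

In this sketch a low confidence means that no single shift stands out from the others, while a high confidence means one shift is clearly closer than the typical one. It says nothing on its own about why an original recording would score worse than a dubbed one.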
Himanshu21135 changed the title from "Why confidence and the distance for an original video is coming high?" to "Why confidence and the distance for an original video is coming Low and High respectively?" on Apr 8, 2024