Some question about model trained on 768 size TED dataset #54

Open
Zenobia7 opened this issue Aug 1, 2022 · 5 comments

Zenobia7 commented Aug 1, 2022

First of all, thank you very much for providing the code. I have run into a few small problems while retraining and would like to ask how to deal with them. My questions are as follows:
1. Why do reconstruction mode and train mode give almost the same L1 loss value?
2. With the 768 size TED dataset, is it normal that parts with finer detail, such as hands and faces, are not recovered well? If so, can you suggest some solutions?
3. When the motion is pronounced, the optical flow map is not very accurate.
4. Are there any precautions to take when preparing a new dataset?
Those are all my questions for now. I look forward to your reply.

@AliaksandrSiarohin
Collaborator

Hi, sorry, but your questions are really confusing:

  1. I don't get the question. There is no L1 in train mode.
  2. What is 768?
  3. Could you provide an example?
  4. It depends on what objects will be in the new dataset.


Zenobia7 commented Aug 4, 2022

  1. I computed the L1 loss on the reconstruction results of train mode, and the value for the reconstruction results of avd mode is almost the same, so I think avd mode is not effective.
  2. I cropped the TED dataset to 768*768.
  3. The new dataset is based on half-speaker video objects. Some videos from the new dataset are below. The new dataset is highly heterogeneous and diverse.
    https://user-images.githubusercontent.com/28126038/182800076-b9e4dea5-d927-41cd-ab7d-038e2cfccbf3.mp4
    https://user-images.githubusercontent.com/28126038/182800140-632904d1-27e7-4a4a-9ec2-142fc59e01b5.mp4
    https://user-images.githubusercontent.com/28126038/182800340-c7f54217-72a0-4a01-99d4-6cd7c4ec64e9.mp4
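
For reference, the comparison described in item 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's evaluation code: `mean_l1` is a hypothetical helper, frames are assumed to be grayscale with values in [0, 1], and the toy values are made up for illustration. It also shows that two different reconstructions can produce the same mean L1, so a similar L1 value does not by itself say that the two modes behave identically.

```python
from typing import Sequence

# One grayscale frame as rows of pixel values in [0, 1] (an assumption for this sketch).
Frame = Sequence[Sequence[float]]


def mean_l1(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Mean absolute pixel difference between two equally shaped frame sequences."""
    if len(pred) != len(target):
        raise ValueError("videos must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frames must have the same height")
        for p_row, t_row in zip(p_frame, t_frame):
            if len(p_row) != len(t_row):
                raise ValueError("frames must have the same width")
            for p, t in zip(p_row, t_row):
                total += abs(p - t)
                count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count


# Toy one-frame, 2x2 "videos" (hypothetical values, not taken from the issue).
driving = [[[0.0, 0.5], [1.0, 0.25]]]
train_recon = [[[0.0, 0.5], [0.5, 0.25]]]   # one pixel off by 0.5
avd_recon = [[[0.25, 0.5], [0.75, 0.25]]]   # two pixels off by 0.25 each

l1_train = mean_l1(train_recon, driving)
l1_avd = mean_l1(avd_recon, driving)
print(l1_train, l1_avd)
```

Both calls return the same mean even though the error is distributed differently across pixels, so comparing the outputs visually or per region (for example hands and faces) may be more informative than a single averaged number.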

3. Train mode visualization results

0gks6ceq4eQ.004737.004870.mp4.mp4

avd mode visualization results

0gks6ceq4eQ.004737.004870.mp4.mp4

train log visualization
train_log

Could you share your training log? I would like to compare it with mine. Thank you. Please let me know if anything is still unclear.

@AliaksandrSiarohin
Collaborator

  1. Reconstruction does not make sense for avd, since it is specifically designed for the cross-identity case, where the shapes of the objects could be different.
  2. There is no explicit handling of parts that are not visible most of the time. I guess you will have to devise some way of handling that.
  3. I can't see what bothers you in the optical flow map.
  4. Unfortunately, I don't have the logs anymore.


Zenobia7 commented Aug 5, 2022

Thank you for your prompt reply.

  1. Since there is no problem with the optical flow map, does that mean the problem lies in the unclear reconstruction details? Are the details unclear because the generator is not strong enough, or because the information in the optical flow map is not fully used?

  2. Do you think it is OK for me to use half-speaker videos with complex backgrounds and inconsistent heights in my self-built dataset? At the moment the loss decreases rapidly at first and then stops decreasing.
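
One way to make the "decreases rapidly, then stops" observation in item 2 concrete is to compare the average loss over two consecutive windows of the training log. The following is only a sketch: `has_plateaued`, `window`, and `rel_tol` are hypothetical names and thresholds chosen for illustration, and the loss values are made up rather than read from the log shown in this thread.

```python
from statistics import mean
from typing import Sequence


def has_plateaued(losses: Sequence[float], window: int = 5, rel_tol: float = 0.01) -> bool:
    """Return True if the mean loss over the last `window` entries improved by
    less than `rel_tol` (relative) compared with the preceding `window` entries."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(losses) < 2 * window:
        return False  # not enough history to decide
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    if previous == 0:
        return True  # nothing left to improve
    return (previous - recent) / abs(previous) < rel_tol


# Hypothetical per-epoch losses: a fast initial drop followed by a flat tail.
losses = [1.0, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]

early = has_plateaued(losses[:6], window=3)  # still dropping quickly
late = has_plateaued(losses, window=3)       # flat tail
print(early, late)
```

A flat curve on its own does not say whether training has converged or is stuck, so a check like this is mainly useful for comparing runs on different datasets under the same settings.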

WeChat image_20220805103913

VIDEzO9Daec770Ndf6uLP9uc220323.057018.057137.mp4.mp4

https://user-images.githubusercontent.com/28126038/182990785-43862275-00db-4a46-a569-6dc1489180b4.mp4

VIDEST94vUrlZI7XD2po9et1220128.040894.041054.mp4.mp4
20180614112218_419_zwosJ_1080p.011847.011863.mp4.mp4
VIDEzO9Daec770Ndf6uLP9uc220323.024783.024804.mp4.mp4


laodar commented Aug 21, 2024

@Zenobia7 Hi, do you have a paper or benchmark about your new dataset? Is the new dataset public now? How did you get it? Thanks a lot.
