ViTencoder input #15
Comments
Yes, it can improve performance slightly. Normal maps of the input image are easy to obtain with a pre-trained normal estimation model, so we also feed the front/back normal images to the model. However, if you want to input only a single image, the whole model may have to be retrained.
If you only want to use our model for inference, you can pass in the image directly and the script will automatically resize it to (512, 512). If you want to use it for training, however, you will need to change the parameters and retrain the model. I'm not sure what you mean when you say you used the ViTPose pre-trained model.
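For anyone preparing inputs outside the provided script, here is a minimal sketch of resizing an image array to (512, 512). It uses a plain NumPy nearest-neighbour resize purely as an illustration; the function name `resize_nearest` is hypothetical, and the interpolation the repository's script actually uses is not stated in this thread.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size=(512, 512)) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) array to size=(height, width).

    Hypothetical helper for illustration only; the real script may use a
    different interpolation method.
    """
    in_h, in_w = image.shape[:2]
    out_h, out_w = size
    # For every output row/column, pick the source row/column it maps to.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols[None, :]]


# Dummy 720x1280 RGB image standing in for a real photo.
image = np.zeros((720, 1280, 3), dtype=np.uint8)
resized = resize_nearest(image)
```

Note that a direct resize like this does not preserve the aspect ratio, so cropping or padding to a square first may be preferable for non-square photos.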
I found that the front/back normal maps are also fed to the encoder, together with the image, to generate the three-plane features. Why is that? Does it improve the results?
Reading the code, I found that after the three-plane feature map is obtained, it is concatenated with the normal feature.
I feed only the image through ViTPose's pre-trained ViTencoder model to get the image features, then pass them through the three decoders to get the three-plane features and concatenate these with the normal features. Is that all right?
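As a rough illustration of the concatenation step discussed in this thread, the sketch below joins one plane's feature map with a normal feature map along the channel axis. The function name `fuse_plane_with_normal` and all channel counts and resolutions are assumptions chosen for the example, not values taken from the repository.

```python
import numpy as np


def fuse_plane_with_normal(plane_feat: np.ndarray, normal_feat: np.ndarray) -> np.ndarray:
    """Concatenate a (C1, H, W) plane feature with a (C2, H, W) normal feature.

    Hypothetical helper: it only shows channel-wise concatenation and is not
    the repository's actual implementation.
    """
    if plane_feat.shape[1:] != normal_feat.shape[1:]:
        raise ValueError("spatial sizes must match before concatenation")
    return np.concatenate([plane_feat, normal_feat], axis=0)


# Assumed sizes: 32 plane channels, 6 normal channels, 128x128 resolution.
plane = np.random.rand(32, 128, 128).astype(np.float32)
normal = np.random.rand(6, 128, 128).astype(np.float32)
fused = fuse_plane_with_normal(plane, normal)  # shape (38, 128, 128)
```

Any layer that consumes the fused feature expects C1 + C2 input channels. Changing what goes into either branch therefore changes the learned input distribution or the channel count of downstream layers, which may be why retraining is suggested when the inputs differ from the original setup.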