
the result looks not that good #5

Closed · Orange-Ctrl opened this issue Dec 7, 2023 · 21 comments

@Orange-Ctrl commented Dec 7, 2023

Hi,
thank you for your great work!
I ran the code on an RTX 3090 and the training process works well, but the result I got looks very strange. Yesterday you told me to fix the tiny-cuda-nn warning tinycudann was built for lower compute capability ({cc}) than the system's ({system_compute_capability}). Performance may be suboptimal., but I haven't been able to fix it yet. Then again, maybe the result shouldn't be this bad just because of that warning?
Could you give me some advice on fixing this? Thank you in advance!
[image: rendered result]
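For reference, this is how I check the card's compute capability (a minimal sketch, assuming PyTorch is installed; an RTX 3090 should report 8.6). I believe tiny-cuda-nn can then be rebuilt for the matching architecture via its TCNN_CUDA_ARCHITECTURES build variable (e.g. 86):

    # Print the GPU's compute capability so it can be compared against
    # the capability tinycudann was compiled for.
    import torch

    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability: {major}.{minor}")  # expect 8.6 on an RTX 3090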

@sbharadwajj (Owner) commented Dec 8, 2023

Hi,

This looks wrong. I think something else is not working at all. It is not because of tinycudann, because we use that only for the 2nd stage of training. However, it is possible that the first stage produces a proper mesh but, due to a wrong installation of tinycudann, the mesh diverges completely in the 2nd stage.

  1. Can you visualize mesh_latest from the first training stage?
  2. Can you show me the grid images saved during the first training stage?
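If it helps, here is a minimal way to inspect the exported mesh (a sketch, assuming the stage-1 mesh is saved as an .obj and trimesh is available; the path is hypothetical):

    # Quick visual sanity check of the stage-1 output mesh.
    import trimesh

    mesh = trimesh.load("out/yufeng/mesh_latest.obj")  # hypothetical path
    print(mesh.vertices.shape, mesh.faces.shape)       # basic sanity numbers
    mesh.show()                                        # opens an interactive viewer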

@Orange-Ctrl (Author)

> 1. Can you visualize mesh_latest from the first training stage?
> 2. Can you show me the grid images saved during the first training stage?

Hi,
mesh_latest and the grid_100 image from the first training stage look like this:
[image: mesh_latest]
[image: grid_100]
Also, I got this: ninja: no work to do.
[image]

@sbharadwajj (Owner) commented Dec 9, 2023

These two are from the first stage of training, correct?

Which dataset are you using? And can you show me grid_0, to check whether the data is correct?

Maybe there is a problem with the dataset? As a sanity check, can you run on the dataset provided by IMavatar?

@Orange-Ctrl (Author)

> Which dataset are you using? And can you show me grid_0? As a sanity check, can you run on the dataset provided by IMavatar?

Yes, they're all from stage 1.
I use the 'yufeng' dataset downloaded from IMavatar. The stage-1 grid_1:
[image: grid_1]

@sbharadwajj (Owner)

I see the problem: the first column is supposed to visualise the ground truth, but it's blank. So it's training with a blank image.

Can you verify whether you have changed something? It's not loading the data at all.

@adrianJW421

I came across a similar problem when simply running python train.py --config configs/001.txt from the README. I didn't change any other code except adjusting dataset_util.py in flare/dataset: I replaced as_gray=True with mode="L", which I believe is not the cause of the error. So I'm looking forward to further developments in this discussion.

@sbharadwajj (Owner)

@adrianJW421
Can you change it back to how it was before and share a visualisation of grid_0?

Can you elaborate on what you mean by a similar problem? I think the problem for @Orange-Ctrl is that the data is not loading correctly at all.

@Orange-Ctrl (Author)

> Can you verify whether you have changed something? It's not loading the data at all.

Hello!
I'm pretty sure I downloaded the correct dataset and didn't change any code. What could be the problem with loading the data?

@Orange-Ctrl (Author)

I think it's because the mask is always zero. When I delete img = img * mask in dataset_real.py, the ground truth shows up.
[image]
This is the stage-1 grid_1.
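For anyone debugging the same way, a hypothetical one-line check dropped next to the img = img * mask line in dataset_real.py (variable names assume that surrounding code):

    # If this prints 0.0 for every frame, the loaded masks are all zeros
    # and the multiplication wipes out the ground-truth image.
    print(mask.sum().item(), mask.max().item())
    img = img * mask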

@zydmu123

Hello @Orange-Ctrl, could you share your env settings? My code also runs on an RTX 3090, but it doesn't work; I just get "Re-initializing main() because the training of light MLP diverged and all the values are zero" all the time...

@Orange-Ctrl (Author)

> Could you share your env settings? My code also runs on an RTX 3090, but it doesn't work; I just get "Re-initializing main() because the training of light MLP diverged and all the values are zero" all the time...

I met this problem before, see #4: requirements.txt lacks the module robust_loss_pytorch. pip install git+https://github.com/jonbarron/robust_loss_pytorch and it works.
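A quick way to confirm the install worked:

    # A ModuleNotFoundError here means the env still lacks the package.
    import robust_loss_pytorch
    print(robust_loss_pytorch.__file__)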

@zydmu123

I tried installing this lib, but it didn't change anything... My PyTorch version is 1.13.1, installed with conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia; cudatoolkit is 11.3.

@Orange-Ctrl (Author) commented Jan 17, 2024

> I tried installing this lib, but it didn't change anything...

Same setup as me. Maybe you can try changing the code in train.py: don't use the while(True) loop, so you can see the actual error information.
[image]
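Roughly the idea (a sketch; the actual wrapper in train.py may look different):

    # train.py re-runs main() in a loop whenever the light MLP diverges,
    # which hides the original traceback. Calling main() once lets the
    # real error surface.
    if __name__ == "__main__":
        main()
        # instead of something like:
        # while True:
        #     main()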

@zydmu123

OK, I'll try, thanks so much! @Orange-Ctrl

@Orange-Ctrl (Author) commented Jan 17, 2024

Hi @sbharadwajj!
I finally got the correct mesh. I changed the code in dataset_util.py to

def _load_mask(fn):
    alpha = imageio.imread(fn)
    mask = torch.Tensor(np.array(alpha) > 127.5)[:, :, 1:2].bool().int().float()
    return mask

instead of the original

def _load_mask(fn):
    alpha = imageio.imread(fn, mode='L')
    alpha = skimage.img_as_float32(alpha)
    mask = torch.tensor(alpha / 255., dtype=torch.float32).unsqueeze(-1)
    mask[mask < 0.5] = 0.0
    return mask

@sbharadwajj (Owner)

@Orange-Ctrl can you tell me the quantitative results on the yufeng dataset? I will verify whether I get the same.

I will look into why you had to change the mask code soon.

@Orange-Ctrl (Author)

> can you tell me the quantitative results on the yufeng dataset?

OK, this is stored in final_eval.txt:

w/o cloth result:

MAE | LPIPS | SSIM | PSNR
0.24321708372194473 | 0.4164765131310241 | 0.5220661378886602 | 8.06310037168738

w/o cloth result:

MAE | LPIPS | SSIM | PSNR
0.026779092522720767 | 0.1019469020097223 | 0.8518311991103708 | 23.98372743893976

w/o cloth result:

MAE | LPIPS | SSIM | PSNR
0.02751253441690582 | 0.09591557103068861 | 0.8505647247458158 | 23.915589923597363

@sbharadwajj (Owner)

The numbers look correct for the yufeng dataset. I assume the first row of results is from when the training diverged, correct?

I will get back to you about the mask.

@Orange-Ctrl (Author)

> The numbers look correct for the yufeng dataset. I will get back to you about the mask.

ok, thank you so much~

@adrianJW421

> Hi @sbharadwajj! I finally got the correct mesh. I changed the code in dataset_util.py to [...]

I also get correct results by following @Orange-Ctrl's answer and making sure that all necessary packages, like robust_loss_pytorch, are correctly installed. Besides, I made another change in dataset_util.py:

def _load_semantic(fn):
    # deleted the outdated 'as_gray=True' param for my env settings
    img = imageio.imread(fn)

@sbharadwajj (Owner)

hi @adrianJW421, @Orange-Ctrl
My apologies for getting back so late.

While changing the mask code makes the code run, it is not exactly correct: the mask values become binary instead of continuous.

Could you please share a single mask image with me? When I tested just now by downloading IMavatar's data, this code works for me:

def _load_mask(fn):
    alpha = imageio.imread(fn, mode='L') 
    alpha = skimage.img_as_float32(alpha)
    mask = torch.tensor(alpha / 255., dtype=torch.float32).unsqueeze(-1)
    mask[mask < 0.5] = 0.0
    return mask
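One guess at the discrepancy (an assumption about library versions, not something I have verified in your environments): the old as_gray=True argument made imageio return a float array in the range [0, 255], so img_as_float32 left the range unchanged and the /255. was correct. Newer imageio with mode='L' returns uint8, which img_as_float32 already rescales to [0, 1]; dividing by 255 again pushes every value below the 0.5 threshold and zeroes the whole mask. A quick check:

    # Check which behaviour your imageio/skimage combination gives;
    # "mask.png" is a hypothetical stand-in for any dataset mask file.
    import imageio
    import skimage

    alpha = imageio.imread("mask.png", mode="L")
    print(alpha.dtype)   # uint8 on newer imageio
    alpha = skimage.img_as_float32(alpha)
    print(alpha.max())   # ~1.0 here means the extra /255. zeroes the mask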
