
Testing code #3

Open · ozzyou opened this issue Mar 30, 2021 · 14 comments

Comments

@ozzyou

ozzyou commented Mar 30, 2021

Hi, thank you very much for the really nice implementation! I have trained the model for 100 epochs and the evaluation results look nice. I was wondering if there's also testing code available. I implemented my own, but I get results such as the image below.

Thank you very much in advance for your reply.

[image: 123]

@ozzyou
Author

ozzyou commented Mar 30, 2021

See my implementation below

import torch
from PIL import Image
from torchvision import transforms
from torchvision import utils as vutils

from slot_attention.data import CLEVRDataModule
from slot_attention.method import SlotAttentionMethod
from slot_attention.model import SlotAttentionModel
from slot_attention.params import SlotAttentionParams
from slot_attention.utils import rescale, to_rgb_from_tensor

# Rebuild the model exactly as it was configured for training.
params = SlotAttentionParams()
model = SlotAttentionModel(
    resolution=params.resolution,
    num_slots=params.num_slots,
    num_iterations=params.num_iterations,
    empty_cache=params.empty_cache,
)
clevr_transforms = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Lambda(rescale),  # rescale between -1 and 1
        transforms.Resize(params.resolution),
    ]
)

clevr_datamodule = CLEVRDataModule(
    data_root=params.data_root,
    max_n_objects=params.num_slots - 1,
    train_batch_size=params.batch_size,
    val_batch_size=params.val_batch_size,
    clevr_transforms=clevr_transforms,
    num_train_images=params.num_train_images,
    num_val_images=params.num_val_images,
    num_workers=params.num_workers,
)

# Wrap the model and load the trained weights from the checkpoint.
root = "/home/ozzy/Projects/slot_attention/"
model = SlotAttentionMethod(model=model, datamodule=clevr_datamodule, params=params)
model.load_state_dict(torch.load(root + "wandb/offline-run-1/files/slot-attention-clevr6/3cy530ay/checkpoints/epoch=99-step=27298.ckpt"), strict=False)
model.eval()

# Load and preprocess a single test image.
img = Image.open(root + "data/CLEVR_v1.0/images/test/CLEVR_test_014999.png")
img = img.convert("RGB")
img = clevr_transforms(img).unsqueeze(0)

# Forward pass, then build a grid: original image, combined reconstruction,
# and each slot composited over a white background.
recon_combined, recons, masks, slots = model(img)
out = to_rgb_from_tensor(
    torch.cat(
        [
            img.unsqueeze(1),  # original images
            recon_combined.unsqueeze(1),  # reconstructions
            recons * masks + (1 - masks),  # each slot
        ],
        dim=1,
    )
)
print("RECON SHAPE", out.shape)

batch_size, num_slots, C, H, W = recons.shape
images = vutils.make_grid(
    out.view(batch_size * out.shape[1], C, H, W).cpu(), normalize=False, nrow=out.shape[1],
)

transforms.ToPILImage()(images).save("123.jpg")
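
An alternative loading path, assuming SlotAttentionMethod is a standard LightningModule (not verified against this repo), is to let Lightning restore the checkpoint itself; the keyword arguments below mirror the constructor call above:

# Hedged sketch: Lightning's load_from_checkpoint unwraps the checkpoint's
# "state_dict" key internally. Assumes SlotAttentionMethod is a
# LightningModule and that these kwargs match its __init__.
method = SlotAttentionMethod.load_from_checkpoint(
    root + "wandb/offline-run-1/files/slot-attention-clevr6/3cy530ay/checkpoints/epoch=99-step=27298.ckpt",
    model=SlotAttentionModel(
        resolution=params.resolution,
        num_slots=params.num_slots,
        num_iterations=params.num_iterations,
        empty_cache=params.empty_cache,
    ),
    datamodule=clevr_datamodule,
    params=params,
)
method.eval()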

@brydenfogelman
Contributor

Hmm, not sure I quite understand what you're asking for here ... do you just want some code to test the model?

Also, the first image you linked: is that the result from the model, or from the test code you pasted?

@ozzyou
Author

ozzyou commented Mar 31, 2021

Hi, thanks for your reply. Yes, that would be great. The image is the result of the test code.

@liuyvchi

> Hi, thanks for your reply. Yes, that would be great. The image is the result of the test code.

Hi, has your issue been resolved? I'm running into the same issue.
[image]

@greeneggsandyaml

greeneggsandyaml commented May 30, 2021

I also find that training diverges -- I get similar-looking results (i.e. terrible results) after 100 epochs.

In other words, we are saying that we cannot get the model to train properly with this code. @brydenfogelman have you been able to successfully train a model with this code?

@greeneggsandyaml

Hello, I just wanted to follow up on this issue.

@brydenfogelman
Contributor

> I also find that training diverges -- I get similar-looking results (i.e. terrible results) after 100 epochs.
>
> In other words, we are saying that we cannot get the model to train properly with this code. @brydenfogelman have you been able to successfully train a model with this code?

Hi! I was able to successfully train the model ... the resulting image in the README was from this model. I can try rerunning the model and seeing if I can replicate the issue you all are having.

I may have also introduced a bug in 2fdd396 by switching the LR scheduler to match the paper. I'll test this over the weekend.

In the meantime, @greeneggsandyaml, could you try reverting the model back to the exponential LR scheduler and seeing if that works?
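
For anyone trying this, here is a rough sketch of the two schedules being compared. The hyperparameter values are the ones reported in the paper and are illustrative here, not necessarily what this repo uses; pick one scheduler per optimizer:

import torch

# Illustrative sketch only; `net` stands in for the model being trained.
optimizer = torch.optim.Adam(net.parameters(), lr=4e-4)

# Option A: plain exponential decay (the pre-2fdd396 style; gamma is a guess).
exp_scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

# Option B: paper-style linear warmup followed by exponential decay,
# stepped once per training step rather than per epoch.
def warmup_then_decay(step, warmup_steps=10_000, decay_steps=100_000, decay_rate=0.5):
    warmup = min(1.0, step / warmup_steps)
    return warmup * decay_rate ** (step / decay_steps)

paper_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_then_decay)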

@greeneggsandyaml

greeneggsandyaml commented Jun 3, 2021 via email

@greeneggsandyaml

As promised, here is an update. I ran the code from commit 603787ddebde0e19ff9419e6a4e4311ce362956d and everything worked well!

For those who are interested, here is my Weights and Biases log: https://wandb.ai/lukemelas2/public-experiments/runs/td2j9zcn?workspace=user-

Overall, this is great to see. I'll be doing more investigation into this as well.

@greeneggsandyaml

Hello, I'm back with another update. Also, @brydenfogelman, did you manage to run the code again?

I'm finding that sometimes I get results that look good:
[image: looks-good]

and sometimes I get results that look bad:
[image: looks-bad]

Have you seen these sorts of "splotchy" results before? Is it just due to random initialization? It feels to me like there is too much variation for it to be caused solely by random initialization.

@brydenfogelman
Contributor

@greeneggsandyaml How long did you train for?

I think even the original authors found that results can vary with the network and slot initialization. I think this figure from the paper demonstrates this finding.
[image: figure from the paper]

(Looking at this caption again, I also realized that I didn't increase the number of slot iterations at test time; increasing this would probably make the results look better.)

Here's an image of one of my earlier experiments where it did randomly learn to separate the background image.
[image: earlier experiment]

They also trained their model for significantly longer than I trained it here (5 days wall clock time).

My best guess is that increasing the number of slot iterations at test time will improve the visualizations. What are your thoughts @greeneggsandyaml?
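
To make that concrete, here is a minimal sketch of raising the iteration count at inference time. The attribute path below is an assumption about this codebase, so check where num_iterations actually lives:

# Hedged sketch: the paper trains with 3 slot-attention iterations and
# notes that running more iterations at test time can sharpen results.
# `model.model.num_iterations` is an assumed attribute path.
model.eval()
model.model.num_iterations = 5  # e.g. trained with params.num_iterations = 3
with torch.no_grad():
    recon_combined, recons, masks, slots = model(img)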

@ZiwenZhuang

> Hello, I'm back with another update. Also, @brydenfogelman, did you manage to run the code again?
>
> I'm finding that sometimes I get results that look good:
> [image: looks-good]
>
> and sometimes I get results that look bad:
> [image: looks-bad]
>
> Have you seen these sorts of "splotchy" results before? Is it just due to random initialization? It feels to me like there is too much variation for it to be caused solely by random initialization.

Hi, how did you solve the problem?

I used the test code from above but got grayscale output.
[image: 123]

I checked the images in the wandb log files, and they look acceptable.
[image: images_13699_0]

Is there anything wrong with the testing code above? How should I change it?

Thank you,

@ZiwenZhuang

Hi, this is a follow-up: I know where the test code above might go wrong.

Depending on how you save your checkpoint file, the loaded state_dict might not match the model's state_dict, and using strict=False can then silently fail to load any weights into the model.

You should check the state_dict keys while loading the model.

:)
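
For anyone hitting this, a minimal sketch of that check, assuming a standard Lightning .ckpt (which nests the weights under a "state_dict" key):

import torch

ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
# Lightning wraps the weights; unwrap them if the key is present.
state_dict = ckpt.get("state_dict", ckpt)

result = model.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
# If unexpected_keys lists every checkpoint key, nothing was actually
# loaded, which is exactly the silent failure strict=False allows.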

@brydenfogelman
Contributor

@ZiwenZhuang As mentioned above, the issue is that the LR scheduler change ended up breaking subsequent runs. I'll try to push the fixes to the repo later today.
