Question about DenseCLIP for Any Visual Backbone #47
Congratulations on your great work! @raoyongming
If I use Swin Transformer-T as the image encoder, the output image feature is [B, 768, 16, 12]. Is the attention pooling layer used to map the image features into the embedding space ([B, 512, 16, 12]), which are then compared against the text features by similarity? Can I replace it with a linear layer?
Yes, we use a randomly initialized attention pooling layer to map the image features into the embedding space. It might be okay to use a simpler linear layer, but we haven't tried it in our experiments.
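A minimal sketch of what such an attention pooling layer could look like, in the spirit of CLIP's AttentionPool2d but returning dense per-pixel embeddings. The module name, the spatial size (16, 12), and the dimensions 768 / 512 are illustrative assumptions for the Swin-T case discussed above, not the repository's exact configuration:

```python
import torch
import torch.nn as nn


class AttentionPool2d(nn.Module):
    """Sketch: map a [B, C, H, W] feature map into the text embedding space.

    A randomly initialized attention pooling layer; the per-pixel outputs form
    a [B, embed_dim, H, W] map that can be compared with text embeddings.
    """

    def __init__(self, spatial_dim=(16, 12), in_dim=768, embed_dim=512, num_heads=8):
        super().__init__()
        num_tokens = spatial_dim[0] * spatial_dim[1]
        # learnable positional embedding for the global token + spatial tokens
        self.pos_embed = nn.Parameter(torch.randn(num_tokens + 1, 1, in_dim) * 0.02)
        self.attn = nn.MultiheadAttention(in_dim, num_heads)
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.flatten(2).permute(2, 0, 1)                # [H*W, B, C]
        x = torch.cat([x.mean(0, keepdim=True), x], 0)   # prepend mean-pooled global token
        x = x + self.pos_embed
        x, _ = self.attn(x, x, x, need_weights=False)    # self-attention pooling
        x = self.proj(x)                                  # project to the embedding space
        global_feat = x[0]                                # [B, embed_dim]
        dense_feat = x[1:].permute(1, 2, 0).reshape(b, -1, h, w)  # [B, embed_dim, H, W]
        return global_feat, dense_feat


# usage with the shapes mentioned in the question
pool = AttentionPool2d()
feat = torch.randn(2, 768, 16, 12)
g, d = pool(feat)  # g: [2, 512], d: [2, 512, 16, 12]
```

The simpler linear alternative mentioned above would correspond to something like a 1x1 convolution, e.g. `nn.Conv2d(768, 512, kernel_size=1)`, applied to the same feature map; whether it matches the attention pooling layer in accuracy is untested per the reply above.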
@raoyongming Hi, when you ran the any-visual-backbone experiments, did you also try an ImageNet pre-trained ViT? I tried using an ImageNet pre-trained ViT, but the results did not improve. What do you think could be the reason?
Hi, we have only run experiments on the ResNet and Swin backbones reported in the paper.