Why the code has the TODO: release this comment? #5

Fengxiang23 · 2024-07-15T15:29:43Z

Thanks for your awesome work.
ARM/Finetuning/models_mamba.py, line 288.
There is a TODO in the code: release this comment. Is the code not yet finished?
Why can't the fine-tuning test code be reproduced? Have you open-sourced all of the fine-tuning parts?

OliverRensu · 2024-07-16T23:18:50Z

Hi, we inherited the TODO from Vim (https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py#L312), but it does not affect training or inference. Did you install the correct packages, such as causal-conv1d==1.1.2.post1 and mamba-ssm 1.1.1 from ~/Mamba/Vim/mamba-1p1p1?
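One quick way to check the installed versions is sketched below. It is a minimal example using only the standard library, and it assumes the packages are registered under the distribution names causal-conv1d and mamba-ssm; the names EXPECTED and check_versions are made up for this illustration and are not part of the repo.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions quoted in this thread. The dict and function names are hypothetical.
EXPECTED = {
    "causal-conv1d": "1.1.2.post1",
    "mamba-ssm": "1.1.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found = None
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: installed={found} expected={EXPECTED[name]} [{status}]")
```

A mismatch here does not prove that it causes the reproduction problem, but it is a cheap thing to rule out first.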

OliverRensu · 2024-07-17T02:39:41Z

We can also provide the finetuning log (fine-tuning with the code in this repo without any modification) if it helps you double-check/verify your fine-tuning process.

Fengxiang23 · 2024-07-23T13:42:01Z

Thank you for your patience. After comparing with the original code, I found that this TODO does indeed come from the original.
But while checking the code further, I found that ARM uses a Decoder, and the Query in the Decoder is a learnable token constructed by self.ar_token = nn.Parameter(torch.zeros(1, 1, self.dec_embed_dim)).
I observed that ARM uses the value computed by cross-attention with this Query to calculate the loss.
My question is: is this calculation related to autoregressive training? How should we understand this implementation in terms of autoregressive training?
Please forgive me for still having such doubts after reading your paper. I would really appreciate your explanation.
Thanks again.

Fengxiang23 · 2024-07-24T08:23:29Z

In fact, count+=1 is used repeatedly in your Decoder part.
This is equivalent to using the previous query in each Decoder block. I don't understand whether this calculation brings any additional benefit.
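
For readers following the question, here is a minimal sketch of the general pattern being discussed: a zero-initialized learnable query token that cross-attends to encoder features, with each decoder block receiving the query produced by the previous block. This is not ARM's actual code. Only self.ar_token and dec_embed_dim come from the line quoted in this thread; the class names, depth, head count, and block layout are assumptions, and the sketch leaves out positional embeddings, masking, and the count-based logic mentioned above.

```python
import torch
import torch.nn as nn


class QueryDecoderBlock(nn.Module):
    """Hypothetical block: queries cross-attend to encoder features."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, query: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        kv = self.norm_kv(memory)
        # Cross-attention: queries come from the decoder, keys/values from the encoder.
        attended, _ = self.attn(self.norm_q(query), kv, kv)
        query = query + attended
        return query + self.mlp(self.norm_mlp(query))


class QueryDecoder(nn.Module):
    """Hypothetical decoder that starts from one shared learnable token."""

    def __init__(self, dec_embed_dim: int = 64, depth: int = 2, num_heads: int = 4):
        super().__init__()
        # Same construction as the line quoted in the thread: zeros, but trainable.
        self.ar_token = nn.Parameter(torch.zeros(1, 1, dec_embed_dim))
        self.blocks = nn.ModuleList(
            [QueryDecoderBlock(dec_embed_dim, num_heads) for _ in range(depth)]
        )

    def forward(self, memory: torch.Tensor, num_queries: int) -> torch.Tensor:
        batch = memory.shape[0]
        # Broadcast the single token to (batch, num_queries, dim).
        query = self.ar_token.expand(batch, num_queries, -1)
        for block in self.blocks:
            # Each block consumes the query produced by the previous block.
            query = block(query, memory)
        return query


if __name__ == "__main__":
    decoder = QueryDecoder(dec_embed_dim=64, depth=2, num_heads=4)
    encoder_features = torch.randn(2, 10, 64)  # (batch, tokens, dim)
    predictions = decoder(encoder_features, num_queries=3)
    print(predictions.shape)
```

In this simplified form every query position starts from the same token and sees the same encoder features, so the outputs are identical across positions; a real autoregressive setup would need position information or masking to make each query predict a different target.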
