Why does the code contain "TODO: release this comment"? #5

Open
Fengxiang23 opened this issue Jul 15, 2024 · 4 comments

Comments

@Fengxiang23

Thanks for your awesome work.
In ARM/Finetuning/models_mamba.py, line 288, there is a TODO in the code: "release this comment". Is the code not finished yet?
Why can't the fine-tuning test results be reproduced? Have you open-sourced all of the fine-tuning code?

@OliverRensu
Owner

Hi, we inherited the TODO from Vim (https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py#L312), but it does not affect training or inference. Did you install the correct packages, i.e. causal-conv1d==1.1.2.post1 and mamba-ssm 1.1.1 from ~/Mamba/Vim/mamba-1p1p1?
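To compare an environment against the versions listed above, a small script like the following may help. It is a minimal sketch, assuming the two packages are registered under the distribution names causal-conv1d and mamba-ssm (a build from ~/Mamba/Vim/mamba-1p1p1 may report a different name or a local version suffix); check_versions and EXPECTED are hypothetical helpers, not part of the repo.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions named in the maintainer's reply; distribution names are an assumption.
EXPECTED = {"causal-conv1d": "1.1.2.post1", "mamba-ssm": "1.1.1"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, expected_version, matches)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # not installed under this distribution name
        report[pkg] = (have, want, have == want)
    return report


if __name__ == "__main__":
    for pkg, (have, want, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{pkg}: installed={have} expected={want} [{status}]")
```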

@OliverRensu
Owner

We can also provide the fine-tuning log (from fine-tuning with the code in this repo, without any modification) if it helps you double-check your fine-tuning process.

@Fengxiang23
Author

Thank you for your patience. After comparing with the original code, I found that this TODO is indeed inherited from it.
However, while checking the code further, I found that ARM uses a decoder, and the query in the decoder is a zero-initialized learnable parameter constructed by self.ar_token = nn.Parameter(torch.zeros(1, 1, self.dec_embed_dim)).
It appears that ARM uses the output of cross-attention with this query to compute the loss.
My question is: how is this computation related to autoregressive training? How should this implementation be understood in terms of autoregressive training?
Please forgive me for still having these doubts after reading your paper. I would really appreciate an explanation.
Thanks again.
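The pattern described in this comment (one shared learnable query token cross-attending to encoder features, with the attended output feeding the loss) can be illustrated with a hypothetical PyTorch sketch. It is not ARM's decoder: the causal mask, the linear head, and the next-position MSE objective are assumptions, chosen only to show one generic way a single shared query can still produce a different, prefix-dependent output at each position.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QueryDecoderSketch(nn.Module):
    """Hypothetical sketch, not ARM's code: one learnable query token,
    shared by every position, cross-attends to encoder features."""

    def __init__(self, dec_embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Same construction as quoted in the issue: zero-initialized, learnable.
        self.ar_token = nn.Parameter(torch.zeros(1, 1, dec_embed_dim))
        self.cross_attn = nn.MultiheadAttention(dec_embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(dec_embed_dim, dec_embed_dim)  # hypothetical prediction head

    def forward(self, memory: torch.Tensor) -> torch.Tensor:
        # memory: (B, N, D) encoder output.
        B, N, _ = memory.shape
        # Every position starts from the same query vector.
        queries = self.ar_token.expand(B, N, -1)
        # Assumed causal mask (True = blocked): query i sees only memory[:, :i + 1].
        causal = torch.triu(
            torch.ones(N, N, dtype=torch.bool, device=memory.device), diagonal=1
        )
        out, _ = self.cross_attn(queries, memory, memory, attn_mask=causal)
        return self.head(out)


def next_token_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Assumed objective: the output at position i regresses the target at i + 1.
    return F.mse_loss(pred[:, :-1], target[:, 1:])


if __name__ == "__main__":
    torch.manual_seed(0)
    model = QueryDecoderSketch()
    feats = torch.randn(2, 8, 64)
    pred = model(feats)
    print(pred.shape, next_token_loss(pred, feats).item())
```

Because every position starts from the same vector, the position-specific signal in this sketch comes only from the mask (a real model would typically also add positional embeddings, which are omitted here). Whether ARM masks its cross-attention this way would need to be checked in the repo.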

@Fengxiang23
Author

In fact, count += 1 is used repeatedly in your decoder.
This is equivalent to feeding the previous block's query into each decoder block. Does this computation bring any additional benefit?
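The two behaviours contrasted in this comment (each decoder block refining the previous block's query, versus every block restarting from the original query) can be sketched as follows. What count actually indexes in the repo is not shown in this thread, so CrossAttnBlock, run_decoder, and the chain_queries flag are hypothetical, and the pre-norm residual layout is an assumption.

```python
import torch
import torch.nn as nn


class CrossAttnBlock(nn.Module):
    """Hypothetical pre-norm block: the query attends to memory, then an MLP; both residual."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, q: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(self.norm_q(q), memory, memory)
        q = q + attn_out  # residual update of the query
        return q + self.mlp(self.norm_mlp(q))


def run_decoder(blocks, query, memory, chain_queries: bool = True):
    """chain_queries=True: block k receives block k-1's output (iterative refinement).
    chain_queries=False: every block restarts from the initial query, so only
    the last block's output survives."""
    q = query
    for blk in blocks:
        q = blk(q if chain_queries else query, memory)
    return q
```

With chain_queries=False the earlier blocks' outputs are simply discarded, so chaining is what lets extra depth progressively refine the query, as in standard stacked Transformer decoders.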
