
feat: extract LoRA for arbitrary models #312

Closed
jon-chuang opened this issue Apr 14, 2023 · 5 comments

Comments

@jon-chuang

jon-chuang commented Apr 14, 2023

This should be straightforward using low-rank approximation, i.e. low-rank matrix factorization.

Given a model and its fine-tuning, and a target rank $k$, extract the "best" low-rank approximation to each difference in the model weights, and export as LoRA.

The parameter $k$ can be a constant, or it can be unique to each matrix, e.g. $k_i$ for $W_i \in \Theta$.

To summarize: given $\Delta W = W' - W$, find $\hat{A}, \hat{B}$, each with $k$ rows, such that the norm $\|\hat{A}^T \hat{B} - \Delta W\|_2$ is minimized.
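A minimal PyTorch sketch of the per-matrix step via truncated SVD (the function name `extract_lora_pair` and the convention of returning $A, B$ with $B A \approx \Delta W$, rather than $\hat{A}^T \hat{B}$, are illustrative assumptions, not an existing API):

```python
import torch

def extract_lora_pair(W_base: torch.Tensor, W_tuned: torch.Tensor, k: int):
    """Factor the weight delta into a rank-k LoRA pair (A, B) with B @ A ≈ ΔW."""
    delta = W_tuned - W_base                     # ΔW = W' - W
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    U_k, S_k, Vh_k = U[:, :k], S[:k], Vh[:k, :]  # keep the top-k singular triples
    # Split the singular values evenly between the two factors.
    B = U_k * S_k.sqrt()                         # shape (out_features, k)
    A = S_k.sqrt().unsqueeze(1) * Vh_k           # shape (k, in_features)
    return A, B
```

Splitting $\sqrt{\sigma_i}$ across both factors is just one convention; putting all of the singular values into a single factor gives the same product.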

@mekaneeky

mekaneeky commented Apr 14, 2023

This is definitely interesting. I wonder whether gradient-based approaches, evolutionary algorithms, or plain old linear algebra (spectral norms, factorization, etc.) would be ideal for solving this.

A digression: I wonder whether there is going to be a clear gradient path from the weights to the LoRA weights? I am not fluent enough in calculus, but I assume that factorization might not be differentiable. What could be an alternative operation in that case, where instead of propagating gradients backwards, some other signal is used to reach the LoRA weights from the non-LoRA weights?

@jon-chuang
Author

I believe the problem should be differentiable; however, as you say, we do not need to rely on gradient-based methods.

We should use whichever low-rank approximation method is available and most effective.
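For what it's worth, plain linear algebra already gives a closed-form optimum here: by the Eckart–Young–Mirsky theorem, truncating the SVD of $\Delta W$ to its top $k$ singular triples minimizes the error in both the spectral and the Frobenius norm,

$$\hat{A}^T \hat{B} = \sum_{i=1}^{k} \sigma_i u_i v_i^T,$$

where $\sigma_i$, $u_i$, $v_i$ are the singular values and singular vectors of $\Delta W$.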

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

@johnwick123f

Hmm yeah I would like this feature too.

@thomasgauthier

I implemented something in this direction using singular value decomposition (SVD). I call it LoRD, for Low-Rank Decomposition.
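I haven't seen the LoRD code, so purely as a sketch of the general idea (all names here, including the `lora_A`/`lora_B` key suffixes, are illustrative assumptions rather than LoRD's actual API), an SVD-based extraction over a pair of checkpoints might look like:

```python
import torch

def extract_lora_state_dict(base_sd: dict, tuned_sd: dict, k: int = 8) -> dict:
    """Run truncated SVD on every matching 2-D weight delta and collect
    the resulting low-rank pairs in a LoRA-style state dict."""
    lora_sd = {}
    for name, W_base in base_sd.items():
        W_tuned = tuned_sd.get(name)
        if W_tuned is None or W_base.ndim != 2:
            continue  # skip biases, norms, embeddings folded as 1-D, missing keys
        delta = (W_tuned - W_base).float()
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        lora_sd[f"{name}.lora_B"] = U[:, :k] * S[:k].sqrt()
        lora_sd[f"{name}.lora_A"] = S[:k].sqrt().unsqueeze(1) * Vh[:k, :]
    return lora_sd
```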
