feat: extract LoRA for arbitrary models #312
This is definitely interesting. I wonder whether gradient-based approaches, evolutionary algorithms, or plain old linear-algebra norms (spectral, etc.) and factorization would be ideal for solving this. A digression: I wonder whether there is a clear gradient path from the model weights to the LoRA weights. I am not fluent enough in calculus, but I assume that factorization might not be differentiable. What could be an alternative in that case, where instead of propagating gradients backwards, another signal is used to reach the LoRA weights from the non-LoRA weights?
I believe the problem should be differentiable; however, as you say, we do not need to rely on gradient-based methods. We should use whatever method for low-rank approximation is available and most effective.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Hmm, yeah, I would like this feature too.
I implemented something in this direction using singular value decomposition (SVD). I call it LoRD, for Low-Rank Decomposition.
It's straightforward: low-rank approximation via low-rank matrix factorization.
Given a model, its fine-tuned counterpart, and a target rank $k$, extract the "best" rank-$k$ approximation to each difference in the model weights, and export it as a LoRA.
The parameter $k$ can be a constant or can be unique to each matrix, e.g. $k_i$ for $W_i \in \Theta$.
To summarize, given $\Delta W = W' - W$, find $\hat{A}, \hat{B}$, each with $k$ rows, such that the norm $\|\hat{A}^T\hat{B} - \Delta W\|_2$ is minimized.
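For concreteness, here is a minimal sketch of that per-matrix truncated-SVD extraction in PyTorch. The function name `extract_lora` and the plain state-dict interface are assumptions for illustration, not the actual LoRD code; the returned factors `a` and `b` correspond to $\hat{A}^T$ and $\hat{B}$ above, so that `a @ b` approximates $\Delta W$.

```python
# Sketch only: assumes both checkpoints are plain PyTorch state dicts with
# matching keys; names and interface here are illustrative, not from any
# existing script.
import torch


def extract_lora(base_sd: dict, tuned_sd: dict, rank: int = 8):
    """Return {key: (a, b)} such that a @ b approximates W_tuned - W_base."""
    lora = {}
    for key, w_base in base_sd.items():
        w_tuned = tuned_sd.get(key)
        # Only 2D weight matrices (e.g. linear projections) are decomposed here.
        if w_tuned is None or w_base.dim() != 2:
            continue
        delta = (w_tuned - w_base).float()
        # Truncated SVD: keeping the top-`rank` singular triplets gives the best
        # rank-k approximation of delta in the spectral/Frobenius norm
        # (Eckart-Young theorem).
        u, s, vh = torch.linalg.svd(delta, full_matrices=False)
        u_k, s_k, vh_k = u[:, :rank], s[:rank], vh[:rank, :]
        # Split the singular values evenly between the two factors so that
        # a @ b reconstructs u_k @ diag(s_k) @ vh_k.
        a = u_k * s_k.sqrt()                 # shape (out_features, rank)
        b = s_k.sqrt().unsqueeze(1) * vh_k   # shape (rank, in_features)
        lora[key] = (a, b)
    return lora
```

A per-matrix rank $k_i$ could be supported by passing a mapping from key to rank instead of a single integer.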