
trans_x0_threshold=1.0 ? #38

Open
jiaweiguan opened this issue Apr 11, 2024 · 13 comments
Comments

@jiaweiguan

Hi!
When running the training code, I noticed that trans_score_loss never contributes to the loss, because trans_x0_threshold is set to 1.0 and the mask batch['t'] > 1.0 is never true. What is the purpose of this setting?

```python
trans_loss = (
    trans_score_loss * (batch['t'] > self._exp_conf.trans_x0_threshold)  # threshold = 1.0
    + trans_x0_loss * (batch['t'] <= self._exp_conf.trans_x0_threshold)
)
```
@jasonkyuyim
Owner

Hi, trans_x0_loss is the loss we use, corresponding to the first equation in Section 4.2 of the paper. Note that an L2 loss on the true "denoised" positions is equivalent to the score loss (up to a weighting) because the forward process is Gaussian. I think trans_x0_threshold is an artifact of some early experiments we were doing and can be removed now.
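To make the score/x0 equivalence concrete, here is a small numerical check under an assumed scalar Gaussian forward process (the constants `a` and `s` are illustrative, not the repo's actual noise schedule):

```python
import numpy as np

# Sketch: for a Gaussian forward process x_t = a*x0 + s*eps, the
# conditional score is -(x_t - a*x0)/s**2, so predicting x0 and
# predicting the score give losses that match up to the t-dependent
# weight (a/s**2)**2. The scalars a, s below are illustrative only.
rng = np.random.default_rng(0)
a, s = 0.8, 0.5
x0_true = rng.normal(size=(10, 3))
x0_pred = x0_true + rng.normal(size=(10, 3))      # a "denoised" prediction
x_t = a * x0_true + s * rng.normal(size=(10, 3))  # noised sample

score_true = -(x_t - a * x0_true) / s**2
score_pred = -(x_t - a * x0_pred) / s**2

score_loss = np.sum((score_pred - score_true) ** 2)
x0_loss = np.sum((x0_pred - x0_true) ** 2)
assert np.isclose(score_loss, (a / s**2) ** 2 * x0_loss)
```

So optimizing the x0 loss and the score loss differ only in how timesteps are weighted, which is why the code can use either branch.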

@jiaweiguan
Author

jiaweiguan commented Apr 15, 2024

During training, I observed that the behavior of trans_x0_loss depends on the length of the protein, as shown in the graph. Do you know the reason behind this? Could you explain it?
[image: trans_x0_loss curves grouped by protein length]

@jasonkyuyim
Owner

Well, this should make sense: RMSD is sensitive to the length of the protein, so bigger proteins will tend to have larger errors.
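One way to see this in practice is to bucket the per-example coordinate loss by protein length. The helper below is a hypothetical sketch (not code from this repo), assuming zero-padded coordinate batches:

```python
import numpy as np

def loss_by_length(pred, true, lengths):
    """Mean per-residue squared coordinate error, grouped by protein length.

    pred, true: (B, L_max, 3) zero-padded coordinate arrays.
    lengths: iterable of valid lengths per example.
    Returns {length: mean loss over examples of that length}.
    """
    out = {}
    for p, t, n in zip(pred, true, lengths):
        # Sum squared error over xyz, then average over the n valid residues.
        mse = np.mean(np.sum((p[:n] - t[:n]) ** 2, axis=-1))
        out.setdefault(n, []).append(mse)
    return {n: float(np.mean(v)) for n, v in out.items()}
```

Plotting these per-length averages over training would reproduce the kind of length-dependent curves discussed above.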

@jiaweiguan
Author

Thanks!
Is it necessary to eliminate this effect? I'm currently unsure at what stage to stop training the model, since the reference provided by the validation data is limited. Is there any approximate quantitative relationship between dataset size and the number of training steps?

@jasonkyuyim
Owner

I haven't thoroughly studied any scaling behavior. My latest code release, FrameFlow, has better metrics that track designability. Other than that, you'll have to run evaluations from time to time.

@jiaweiguan
Author

Thank you! I have also looked into FrameFlow and its diversity, novelty, and designability results. However, I have noticed that there seems to be a preference for helical structures, which may be dataset-dependent. Selecting a generative model is quite challenging.

@jasonkyuyim
Owner

Yes, it's very dataset-dependent. That said, helical structures are the most prevalent structures in all the current diffusion/flow models (including Chroma and RFdiffusion).

@jiaweiguan
Author

Thank you for your response. My hypothesis is that helical structures are easier to learn, while beta strands are more challenging. Is there any research that confirms this?

@jiaweiguan
Author

This is the model I trained. Sometimes the strand percent is close to 0, and sometimes it's close to 0.2, which gives me the feeling that it's not stable enough.
[image: strand percent during training]
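For reference, strand/helix percentages like these can be computed from per-residue secondary-structure codes (e.g. DSSP output, where 'H'/'G'/'I' are helix classes and 'E'/'B' are strand classes). The helper below is a hypothetical sketch, not code from this repo:

```python
def ss_fractions(ss_string):
    """Helix and strand fractions from a DSSP-style per-residue string.

    'H'/'G'/'I' count as helix; 'E'/'B' count as strand; everything
    else (turns, bends, coil) is ignored.
    """
    n = len(ss_string)
    helix = sum(c in "HGI" for c in ss_string) / n
    strand = sum(c in "EB" for c in ss_string) / n
    return helix, strand
```

Tracking these fractions over sampled structures during training is one way to quantify the helix/strand instability described above.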

@jasonkyuyim
Owner

> I have another hypothesis that the helical structure is easier to learn, while the beta strand is more challenging. Is there any research that confirms this?

We only have empirical evidence found through other protein diffusion models like Chroma.

@jiaweiguan jiaweiguan reopened this May 6, 2024
@jiaweiguan
Author

Emmmm...
I ran sampling tests using the "best_weights.pth" parameters and observed that when the sampling length (L) is very large, e.g. L=1024, the samples consistently exhibit helical structure. I'm unsure whether this is related to the dataset or to limited generalization capability with respect to L.

@gsakellion

@jiaweiguan, I am currently trying to get the model running (not managing yet), mainly to see whether, from the "paper_weights.pth" weights, which were trained on sequences of up to 512 residues, I can get structures of greater length. The accuracy of any structure is inconsequential at this stage. From my understanding, it should be possible to get longer chains. But is that the case? It seems like you managed, but were the "best_weights.pth" trained on the same sequence length (i.e. 512)?

@jasonkyuyim
Owner

Since the model was only trained up to length 512, one would not expect the model to perform well on unseen lengths such as 1024. You would have to change how the model is trained, i.e. with relative encodings or crops, to get good samples at longer lengths.
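One common form of relative encoding (used, for example, in AlphaFold2-style models) clips the pairwise residue-index offset so the model never sees index values beyond a fixed window; a model trained this way on short chains sees no unfamiliar position features at longer sampling lengths. A minimal NumPy sketch, where `max_rel` is an illustrative hyperparameter rather than anything from this repo:

```python
import numpy as np

def relative_position_encoding(n_res, max_rel=32):
    """One-hot encoding of clipped pairwise residue-index offsets.

    Offsets beyond +/-max_rel are clipped, so the encoding bins seen at
    sampling length 1024 are the same ones seen during training at 512.
    """
    idx = np.arange(n_res)
    rel = idx[None, :] - idx[:, None]                # (n, n) signed offsets
    rel = np.clip(rel, -max_rel, max_rel) + max_rel  # shift into [0, 2*max_rel]
    return np.eye(2 * max_rel + 1)[rel]              # (n, n, 2*max_rel + 1)
```

Because all long-range pairs fall into the same clipped bin, no pair feature at L=1024 is outside the training distribution, which is one reason such encodings generalize better across lengths than absolute positions.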
