
trans_x0_threshold=1.0 ? #38

Open
jiaweiguan opened this issue Apr 11, 2024 · 13 comments
Comments

@jiaweiguan

Hi!
When running the training code, I noticed that trans_score_loss never contributes to the loss, because trans_x0_threshold is set to 1.0 and the mask batch['t'] > 1.0 is never true. What is the purpose of this setting?

```python
trans_loss = (
    trans_score_loss * (batch['t'] > self._exp_conf.trans_x0_threshold)  # threshold = 1.0
    + trans_x0_loss * (batch['t'] <= self._exp_conf.trans_x0_threshold)
)
```
@jasonkyuyim
Owner

Hi, trans_x0_loss is the loss we use, corresponding to the first equation in Section 4.2 of the paper. Note that an L2 loss on the true "denoised" positions is equivalent to the score loss (up to a weighting) because the forward process is Gaussian. I think trans_x0_threshold is an artifact of some early experiments we were doing and can be removed now.
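To make the score/x0 equivalence concrete, here is a small numerical check under an assumed scalar Gaussian forward process (the constants `a` and `s` are illustrative, not the repo's actual noise schedule):

```python
import numpy as np

# Sketch: for a Gaussian forward process x_t = a*x0 + s*eps, the
# conditional score is -(x_t - a*x0)/s**2, so predicting x0 and
# predicting the score give losses that match up to the t-dependent
# weight (a/s**2)**2. The scalars a, s below are illustrative only.
rng = np.random.default_rng(0)
a, s = 0.8, 0.5
x0_true = rng.normal(size=(10, 3))
x0_pred = x0_true + rng.normal(size=(10, 3))      # a "denoised" prediction
x_t = a * x0_true + s * rng.normal(size=(10, 3))  # noised sample

score_true = -(x_t - a * x0_true) / s**2
score_pred = -(x_t - a * x0_pred) / s**2

score_loss = np.sum((score_pred - score_true) ** 2)
x0_loss = np.sum((x0_pred - x0_true) ** 2)
assert np.isclose(score_loss, (a / s**2) ** 2 * x0_loss)
```

So optimizing the x0 loss and the score loss differ only in how timesteps are weighted, which is why the code can use either branch.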

@jiaweiguan
Author

jiaweiguan commented Apr 15, 2024

During training, I observed that the behavior of trans_x0_loss depends on the length of the protein, as shown in the graph. Do you know the reason behind this? Could you explain it?
[image: trans_x0_loss curves grouped by protein length]

@jasonkyuyim
Owner

Well, this should make sense: RMSD is sensitive to the length of the protein, so bigger proteins will tend to have larger errors.
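One way to see this in practice is to bucket the per-example coordinate loss by protein length. The helper below is a hypothetical sketch (not code from this repo), assuming zero-padded coordinate batches:

```python
import numpy as np

def loss_by_length(pred, true, lengths):
    """Mean per-residue squared coordinate error, grouped by protein length.

    pred, true: (B, L_max, 3) zero-padded coordinate arrays.
    lengths: iterable of valid lengths per example.
    Returns {length: mean loss over examples of that length}.
    """
    out = {}
    for p, t, n in zip(pred, true, lengths):
        # Sum squared error over xyz, then average over the n valid residues.
        mse = np.mean(np.sum((p[:n] - t[:n]) ** 2, axis=-1))
        out.setdefault(n, []).append(mse)
    return {n: float(np.mean(v)) for n, v in out.items()}
```

Plotting these per-length averages over training would reproduce the kind of length-dependent curves discussed above.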

@jiaweiguan
Author

Thanks!
Is it necessary to eliminate this effect? I'm currently unsure at what stage to stop training the model, since the reference provided by the validation data is limited. Is there any approximate quantitative relationship between dataset size and the number of training steps?

@jasonkyuyim
Owner

I haven't thoroughly studied any scaling behavior. My latest code release, FrameFlow, has better metrics that track designability. Other than that, you'll have to run evaluations from time to time.

@jiaweiguan
Author

Thank you! I have also looked into FrameFlow and its diversity, novelty, and designability results. However, I have noticed that there seems to be a preference for helical structures, which may be dataset-dependent. Selecting a generative model is quite challenging.

@jasonkyuyim
Owner

Yes, it's very dataset-dependent. That said, helical structures are the most prevalent structures in all the current diffusion/flow models (including Chroma and RFdiffusion).

@jiaweiguan
Author

Thank you for your response. My hypothesis is that helical structures are easier to learn, while beta strands are more challenging. Is there any research that confirms this?

@jiaweiguan
Author

This is the model I trained. Sometimes the strand percent is close to 0, and sometimes it's close to 0.2, which gives me the feeling that it's not stable enough.
[image: strand percent during training]
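For reference, strand/helix percentages like these can be computed from per-residue secondary-structure codes (e.g. DSSP output, where 'H'/'G'/'I' are helix classes and 'E'/'B' are strand classes). The helper below is a hypothetical sketch, not code from this repo:

```python
def ss_fractions(ss_string):
    """Helix and strand fractions from a DSSP-style per-residue string.

    'H'/'G'/'I' count as helix; 'E'/'B' count as strand; everything
    else (turns, bends, coil) is ignored.
    """
    n = len(ss_string)
    helix = sum(c in "HGI" for c in ss_string) / n
    strand = sum(c in "EB" for c in ss_string) / n
    return helix, strand
```

Tracking these fractions over sampled structures during training is one way to quantify the helix/strand instability described above.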

@jasonkyuyim
Owner

> I have another hypothesis that the helical structure is easier to learn, while the beta strand is more challenging. Is there any research that confirms this?

We only have empirical evidence found through other protein diffusion models like Chroma.

@jiaweiguan jiaweiguan reopened this May 6, 2024
@jiaweiguan
Author

Emmmm...
I ran sampling tests using the "best_weights.pth" parameters and observed that when the sampling length (L) is very large, e.g. L=1024, the samples consistently exhibit helical structure. I'm unsure whether this is related to the dataset or to limited generalization capability with respect to L.

@gsakellion

@jiaweiguan, I am currently trying to get the model running (not managing yet), mainly to see whether, from the "paper_weights.pth" weights, which were trained on sequences of up to 512 residues, I can get structures of greater length. The accuracy of any structure is inconsequential at this stage. From my understanding, it should be possible to get longer chains. But is that the case? It seems like you managed, but were the "best_weights.pth" trained on the same sequence length (i.e. 512)?

@jasonkyuyim
Owner

Since the model was only trained up to length 512, one would not expect the model to perform well on unseen lengths such as 1024. You would have to change how the model is trained, i.e. with relative encodings or crops, to get good samples at longer lengths.
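One common form of relative encoding (used, for example, in AlphaFold2-style models) clips the pairwise residue-index offset so the model never sees index values beyond a fixed window; a model trained this way on short chains sees no unfamiliar position features at longer sampling lengths. A minimal NumPy sketch, where `max_rel` is an illustrative hyperparameter rather than anything from this repo:

```python
import numpy as np

def relative_position_encoding(n_res, max_rel=32):
    """One-hot encoding of clipped pairwise residue-index offsets.

    Offsets beyond +/-max_rel are clipped, so the encoding bins seen at
    sampling length 1024 are the same ones seen during training at 512.
    """
    idx = np.arange(n_res)
    rel = idx[None, :] - idx[:, None]                # (n, n) signed offsets
    rel = np.clip(rel, -max_rel, max_rel) + max_rel  # shift into [0, 2*max_rel]
    return np.eye(2 * max_rel + 1)[rel]              # (n, n, 2*max_rel + 1)
```

Because all long-range pairs fall into the same clipped bin, no pair feature at L=1024 is outside the training distribution, which is one reason such encodings generalize better across lengths than absolute positions.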
