Discrepancy in Model Performance Reproduction and Pretrained Model Parameters #10
Hello BarqueroGerman,
I'm working on replicating your model's performance, but I noticed a gap between my results and the pretrained model's performance. I've confirmed that my hyperparameters match the ones in your README. Could you share the pretrained model's hyperparameters to help me troubleshoot? The performance of my trained model is shown in the attached screenshots.
Thanks
Hi!
Thank you for reporting the issue. We also noticed a few differences during evaluation after refactoring and cleaning the code for release.
We will work on this and fix the issue soon. Thanks for your patience!
Hi! Thank you for the tremendous effort you've put into this work. I'm interested in the current progress on the evaluation differences you're working on. Could you share some details? Best wishes!
Hi! Sorry for the delay, and thank you for your patience! I just fixed an error in the evaluation command (--bpe_denoising_step). Please check whether this resolves your problem. In any case, I will double-check the training loop, as your deviation looks larger than the one you should be observing.
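For reference, here is a minimal, illustrative Python sketch (not the repository's actual evaluation entry point; the script structure is an assumption) of how the resolved --bpe_denoising_step value can be made explicit and logged, so a stale default or a typo in the README command is caught immediately:

```python
# Illustrative sketch only: the argument name matches the README command, but the
# script structure and behavior are assumptions, not the repository's real code.
import argparse

parser = argparse.ArgumentParser(description="Evaluation entry point (sketch)")
parser.add_argument("--bpe_denoising_step", type=int, default=None,
                    help="Denoising step at which the blended positional encodings switch")
args = parser.parse_args()

if args.bpe_denoising_step is None:
    raise SystemExit("Pass --bpe_denoising_step explicitly to match the value used for the pretrained models.")

# Log the resolved value so any mismatch with the README command is visible in the output.
print(f"[eval] bpe_denoising_step = {args.bpe_denoising_step}")
```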
Hi! Thank you for your efforts in investigating the issue with the evaluation parameters. However, even after the correction, the evaluation results are still significantly lower than those of the pretrained models (using the same evaluation code). Notably, the evaluation metrics of the pretrained models are very close to those reported in the paper. This leads me to believe that the issue might be related to the training parameters or the training code provided in the repository. Have you tried training the models with the current version of the code in the repository? I will continue to investigate and gather more information to help with troubleshooting.
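One quick sanity check along these lines is to diff the hyperparameters stored alongside the released pretrained checkpoints against the ones used for the local run. A minimal sketch, assuming each model directory ships an args.json with the training arguments (the file name, paths, and keys are assumptions; adjust them to whatever the release actually contains):

```python
# Sketch: compare hyperparameters saved with a pretrained checkpoint against a local run.
# Assumes each model directory contains an args.json with the training arguments;
# the paths below are hypothetical placeholders.
import json
from pathlib import Path

def load_args(model_dir: str) -> dict:
    return json.loads((Path(model_dir) / "args.json").read_text())

pretrained = load_args("pretrained/babel_model")  # hypothetical path
local = load_args("save/my_babel_run")            # hypothetical path

# Print every argument whose value differs (or that exists on only one side).
for key in sorted(set(pretrained) | set(local)):
    a = pretrained.get(key, "<missing>")
    b = local.get(key, "<missing>")
    if a != b:
        print(f"{key}: pretrained={a!r}  local={b!r}")
```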
Hi shouyezhe! I am actively looking into this. I'll let you know my findings asap. Thanks for your patience!
Hi again, I re-trained both models several times with different seeds and I noticed two things:
Also, I feel that the current metric-learning-based metrics used to evaluate generative motion synthesis are not as robust as we would like. I am curious to see what researchers in this field come up with as an alternative. We did our best by proposing the PJ/AUJ metrics to make sure transitions are smooth enough, relying only on the original motion space.
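For readers unfamiliar with the PJ/AUJ idea mentioned above, both metrics inspect the jerk (third derivative of joint positions) around transition frames, directly in the original motion space. Below is a rough numerical sketch of that idea using finite differences; the exact definitions, windows, and normalization are those in the paper, not this snippet:

```python
# Rough sketch of jerk-based transition smoothness (the idea behind PJ/AUJ),
# not the paper's exact definition. Jerk is approximated with finite differences
# on joint positions and inspected in a window around the transition frame.
import numpy as np

def jerk_profile(motion: np.ndarray, fps: float) -> np.ndarray:
    """motion: (frames, joints, 3) joint positions. Returns per-frame max jerk magnitude."""
    dt = 1.0 / fps
    jerk = np.diff(motion, n=3, axis=0) / dt**3        # (frames-3, joints, 3)
    return np.linalg.norm(jerk, axis=-1).max(axis=-1)  # max over joints, per frame

def transition_smoothness(motion: np.ndarray, transition_frame: int,
                          fps: float = 30.0, window: int = 15):
    j = jerk_profile(motion, fps)
    lo, hi = max(transition_frame - window, 0), min(transition_frame + window, len(j))
    segment = j[lo:hi]
    peak_jerk = segment.max()               # PJ-like quantity
    area_under_jerk = segment.sum() / fps   # AUJ-like quantity (discrete integral)
    return peak_jerk, area_under_jerk

# Toy usage with random data, just to show the expected shapes:
motion = np.cumsum(0.01 * np.random.randn(120, 22, 3), axis=0)
print(transition_smoothness(motion, transition_frame=60))
```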