Cannot reproduce the classification result of TimesNet #494
Comments
Many thanks for your detailed reproduction and for pointing out the problem with the learning-rate scheduling strategy. (1) As I stated in the previous issue, some UEA datasets suffer from a serious limited-data problem, so their performance can be unstable. For example, under my experimental environment (without the learning-rate scheduling strategy), the Handwriting accuracy is 0.33647058823529413. Here is the training log for this task. (2) To clarify, I will publish the training checkpoints from my experiments within two weeks.
I have the same problem; I am not able to reproduce the results. How can we get past this when training our own models? I have a model that surpasses the TimesNet results I have been able to reproduce, but not the ones reported in the paper. How can I be sure that my model is not trained in a suboptimal way, leading to underestimated metrics? In general, why aren't the metrics computed over multiple runs, with the mean and standard deviation reported as the final values?
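For reference, a minimal sketch of the multi-seed protocol this comment asks about: train once per seed and report mean ± std instead of a single-run number. `train_and_eval` is a hypothetical stand-in for a full TSlib classification run; its dummy body exists only to make the sketch runnable.

```python
import random
import statistics

def train_and_eval(seed: int) -> float:
    # Hypothetical placeholder: in practice this would launch a full
    # TimesNet training run with the given seed and return test accuracy.
    random.seed(seed)
    return 0.33 + random.uniform(-0.02, 0.02)  # placeholder accuracy

seeds = [0, 1, 2, 3, 4]
accs = [train_and_eval(s) for s in seeds]
print(f"accuracy = {statistics.mean(accs):.4f} "
      f"± {statistics.stdev(accs):.4f} over {len(seeds)} seeds")
```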
Many thanks for your question and the valuable discussion. I have uploaded the checkpoint files and training log here: https://cloud.tsinghua.edu.cn/d/caefcdb63eee4adfad86/. Here is a summary of our experiments (classification.log):
(1) The inconsistency between Table 17 and our experiments. As stated in the previous issue #321 (comment), our original experimental code was based on this repo: https://github.com/thuml/Flowformer. To make the open-sourced code easy to read, I spent two weeks reorganizing it and unifying five tasks in a shared code base, namely TSlib. During the reorganization I may have lost some details, such as the learning-rate strategy, which is fixed in commit 1c7f843 (although I do remember ensuring that all the results could be reproduced before I published this repo). In my current experiments, the averaged accuracy can be reproduced (slightly better than the original paper). The only task that fails is EthanolConcentration (35.7 vs. 31.94). I plan to rerun my original code base and compare the training details; if I get new results, I will update them here, which may take some time.

(2) About the performance variance. I did run multiple seeds and reported the std in our paper, which is around 0.1% for the average performance. The small subsets are affected by random seeds differently, which makes the final average fairly self-stable. To remove the high-variance tasks, I would suggest omitting EthanolConcentration, Handwriting, and UWaveGestureLibrary and trying some EEG datasets, which we experimented with in this paper: https://arxiv.org/abs/2402.02475. Sorry for the inconvenience. If you have any questions, please email me or open an issue in the repo.
Thank you for the thorough response!
As mentioned above, I added the two lines from commit 1c7f843.
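The two lines themselves are not reproduced in this thread; below is a hedged sketch of what the commit's change amounts to, assuming it wires TSlib's epoch-wise learning-rate decay (the 'type1' step schedule from `utils/tools.py`) into the classification training loop. The helper's simplified signature and the toy model are illustrative, not the commit's exact code.

```python
import torch

def adjust_learning_rate(optimizer, epoch, base_lr):
    # 'type1' schedule: halve the learning rate after every epoch
    lr = base_lr * (0.5 ** (epoch - 1))
    for param_group in optimizer.param_groups:
        param_group["lr"] = lr
    print(f"Updating learning rate to {lr}")

model = torch.nn.Linear(8, 2)  # toy stand-in for TimesNet
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(10):
    # ... one training epoch over the UEA classification data ...
    adjust_learning_rate(optim, epoch + 1, base_lr=1e-3)  # called after each epoch
```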
However, I am still not able to reproduce the results. Moreover, the average result dropped after making the amendment from that commit.
Here are my results using TimesNet:
Thank you for your help.