fix: Chronos inference in foundation ts arena #382
Thank you for evaluating Chronos again. It's great to see it performing accurately on this benchmark as well.
We found some problems with the way inference is being done for Chronos:
- `NaN` padding was being applied to short time series, which is not required and would slow down the model significantly.
- `bfloat16` was being used, which results in loss of information and may lead to poor accuracy.

This PR fixes these issues. The following table shows a comparison of Chronos (Large)'s performance before (taken from the original table in this repo) and after these fixes, and also reports the performance of other variants of Chronos. These experiments were performed on a `g5.4xlarge` instance, as in the original study.

We observe:
Here's what the average MASE ranking plots look like before and after the fix:
After the fix, Chronos-Large achieves the best overall rank (center plot). Chronos-Base obtains the same overall ranking as TimesFM and TimeGPT (right plot).
For the fidelity of the study, we recommend that the authors update their results and discussions accordingly, ideally after an independent verification with the latest code change (see usage below). Thank you again for your effort!
Usage
Run `python eval-chronos.py` to re-evaluate (only) Chronos.
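
For context on the `bfloat16` point above, here is a minimal sketch of how much precision the format keeps. It does not reproduce the evaluation code in this repo. The helper `to_bfloat16` is a hypothetical name introduced only for this illustration, and it emulates the rounding in pure Python for finite values rather than calling any Chronos or PyTorch API.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (hypothetical helper).

    bfloat16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits
    of a float32, so only about 8 significant binary digits survive.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Round to nearest, ties to even, on the 16 bits that will be dropped.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Illustrative values only, not data from the benchmark.
series = [1000.0, 1001.0, 1002.0, 1003.0, 1004.0]
rounded = [to_bfloat16(v) for v in series]
print(rounded)  # [1000.0, 1000.0, 1000.0, 1004.0, 1004.0]
```

Near 1000, adjacent bfloat16 values are 4 apart, so small movements in a series at that scale are lost entirely once the values are stored in this format.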