Sorting of variables for transfer learning #98

Open
einrone opened this issue Oct 24, 2024 · 6 comments


einrone commented Oct 24, 2024

There is currently an implementation that sorts variables alphabetically by default. However, for transfer learning we ideally want to keep the variable ordering of the pre-trained model: when including a stretched grid, the variable list has to match the original order used during pre-training. In other words, we have to re-sort the variable list to match the previous one so that it is consistent with the pre-trained model.
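To illustrate the idea, a minimal sketch in plain Python (not the actual fork implementation; the function and variable names are made up):

def reorder_to_checkpoint(new_variables, checkpoint_variables):
    # Return indices that re-sort `new_variables` into the order used by the
    # pre-trained checkpoint; variables absent from the checkpoint go last,
    # sorted alphabetically.
    position = {name: i for i, name in enumerate(checkpoint_variables)}
    return sorted(
        range(len(new_variables)),
        key=lambda i: (position.get(new_variables[i], len(position)), new_variables[i]),
    )

checkpoint_vars = ["2t", "10u", "10v", "msl"]  # order seen during pre-training
new_vars = ["10u", "10v", "2t", "msl", "skt"]  # alphabetically sorted dataset
idx = reorder_to_checkpoint(new_vars, checkpoint_vars)
print([new_vars[i] for i in idx])  # ['2t', '10u', '10v', 'msl', 'skt']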

I have already implemented this on our MetNo fork (more specifically in src/anemoi/datasets/data/misc.py, in the function _open_dataset); it is stable and has been tested. I was wondering if I could create a branch from the latest version and open a pull request to include this feature in anemoi-datasets, as more member states wish to use stretched grids and transfer learning.

@floriankrb (Member)

Do you mean this reordering functionality: https://anemoi-datasets.readthedocs.io/en/latest/using/selecting.html#reorder ?

In any case, you are very welcome to create a pull request, as this would clarify the topic.
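For reference, the documented reorder option can also take an explicit variable order, roughly like this (a sketch only; the dataset path and variable names are placeholders):

ds = open_dataset(
    "dataset.zarr",                       # placeholder path
    reorder=["2t", "10u", "10v", "msl"],  # explicit order, e.g. the one used in pre-training
)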


einrone commented Oct 29, 2024

It is something similar. However, it relies on a boolean key called sort_vars, which is provided in the anemoi-training config. If sort_vars is True during pre-training (no transfer learning), the variable set is sorted alphabetically. When transfer learning is then performed with that pre-trained checkpoint and sort_vars=True (in both pre-training and transfer learning), dataset1, dataset2, etc. will have the same variable ordering as the pre-trained checkpoint (i.e. the original dataset).

@floriankrb (Member)

Instead of an additional keyword, would this

ds = open_dataset(
    dataset,
    reorder='sort',
)

work? (assuming it was implemented)

@JesperDramsch (Member)

I have introduced a PR that checks for these errors in different areas of the training cycle. ecmwf/anemoi-training#120

In situations where we can easily suggest a fix, the logger will write out a suggested "reorder" command with the values a model would expect. Happy to take feedback on this one, but it seems to be parallel to this issue.
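Roughly the kind of behaviour meant here, as an illustrative sketch only (hypothetical helper, not the actual code in ecmwf/anemoi-training#120):

import logging

logger = logging.getLogger(__name__)

def suggest_reorder(model_variables, dataset_variables):
    # If the dataset has the same variables as the model but in a different
    # order, log a suggested `reorder` value instead of failing opaquely.
    if set(model_variables) == set(dataset_variables) and list(model_variables) != list(dataset_variables):
        logger.warning(
            "Dataset variable order does not match the model. "
            "Suggested fix: open_dataset(..., reorder=%s)",
            list(model_variables),
        )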


einrone commented Nov 4, 2024

Yes, something like that. Should I implement this feature, or would you like to?


einrone commented Nov 21, 2024

I made a draft PR. It is still a draft because I have created unit tests that cover a global (ERA5) and a LAM dataset, but these datasets are located on my local computer. I was wondering if I should upload or include a small LAM and a small global dataset somewhere so the unit tests can run in the CI/CD pipeline. The draft PR does not include the unit tests for now, but I can add them if desired.

Link:
#144
