
Question About Dataset Deduplication and Evaluation Differences #24

Open
wudi-ldd opened this issue Nov 16, 2024 · 5 comments

Comments

@wudi-ldd

Dear Author,

I noticed that your paper mentions deduplication procedures. I downloaded the RSITMD and RSICD datasets directly, without modifying the test sets, loaded the pre-trained weights you provided via the OpenCLIP library, and evaluated them. However, my results were consistently two to three points lower than those reported in the paper. For instance, with the RS5M_ViT-B-32.pt weights on the RSITMD test set, performance was roughly two points below the paper's numbers.

I also encountered similar issues when using the pre-trained weights from the RemoteCLIP paper.

I would appreciate any insights or guidance you could provide.

Thank you for your response.
(screenshots of the evaluation results attached)
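For context, gaps of a point or two often come from details of the retrieval protocol itself (tie handling, or how the 5-captions-per-image layout of RSITMD/RSICD is mapped to ground truth) rather than from the weights. A minimal sketch of the usual text-to-image Recall@K computation — my own illustration with a toy similarity matrix, not code from either repository:

```python
import numpy as np

def recall_at_k(sim, gt, ks=(1, 5, 10)):
    """Text-to-image recall. sim is an [n_text, n_image] similarity matrix,
    gt[i] is the index of the correct image for text query i."""
    ranks = np.argsort(-sim, axis=1)  # best-matching image first
    # rank position of the ground-truth image for each text query
    pos = np.array([int(np.where(ranks[i] == gt[i])[0][0]) for i in range(len(gt))])
    return {k: float((pos < k).mean()) for k in ks}

# Toy example: 4 texts, 4 images, the diagonal is the correct pairing.
sim = np.eye(4) + 0.01 * np.random.RandomState(0).rand(4, 4)
print(recall_at_k(sim, gt=np.arange(4)))  # → {1: 1.0, 5: 1.0, 10: 1.0}
```

The "mean recall" reported in these papers is typically the average of R@1/R@5/R@10 over both retrieval directions, so a small difference in how `gt` is constructed for the 5 captions per image shows up in every number.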

@zilunzhang
Collaborator


Hi,

There is no need for deduplication if you do not use the training sets of RSICD/RSITMD/RET-2 to fine-tune the GeoRSCLIP model (aka GeoRSCLIP-FT).

If this happened with RemoteCLIP as well, I would suspect a systematic problem caused by software versions. Which Python/CUDA/PyTorch versions did you use? Some time ago, I traced inconsistent results in a paper to a change in Python 3.6 relative to 3.5: the randomness in dictionary iteration order was removed.
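To illustrate the kind of version-dependent nondeterminism being described (my own minimal example, not code from either repository): since CPython 3.6, and guaranteed by the language from 3.7, dicts iterate in insertion order, whereas in 3.5 and earlier the order depended on the hash seed and could differ between runs. Any evaluation pipeline that iterates over a dict of samples could therefore process them in a different order on older interpreters.

```python
# Build a dict in a deliberately non-sorted order.
samples = {}
for name in ["img_003", "img_001", "img_002"]:
    samples[name] = name.upper()

# On Python >= 3.7 iteration order is guaranteed to match insertion order;
# on <= 3.5 this list could come out differently on every run.
order = list(samples)
print(order)  # → ['img_003', 'img_001', 'img_002']
```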

@wudi-ldd
Author

Dear Author,

My current setup is CUDA 12.4, Python 3.10.0, and PyTorch 2.4.1. I am fine-tuning your RS5M_ViT-B-32.pt weights on the RSICD and RSITMD datasets. I wrote the fine-tuning code with the OpenCLIP library, and when fine-tuning RS5M_ViT-B-32.pt on the unfiltered RSITMD dataset, the results were quite close to those reported in your paper.

However, when using the pretrained weights RS5M_ViT-B-32_RSITMD.pt, the average accuracy was only around 46%, as shown in the second screenshot. I have not been able to identify the reason for this discrepancy.

I would like to ask: if I were to fine-tune RS5M_ViT-B-32.pt on the RSITMD dataset, do I need to perform deduplication on the RSITMD test set first? If deduplication is required, could you kindly provide the deduplication list? I noticed that RemoteCLIP provided a deduplication list for the RSITMD training set, but not for the pretraining data or the RSITMD test set, which I found a bit confusing.

If there is a data-leakage issue with the pretraining data, wouldn't it be more appropriate to remove the duplicates from the test set instead? And if the duplicates are removed from the pretraining dataset, then, as I understand it, there should be no need to process the RSITMD dataset separately, correct?

Thank you for your time, and I look forward to your response.
(screenshots attached)

@zilunzhang
Collaborator

zilunzhang commented Nov 16, 2024

> the results were actually quite close to those reported in your paper

Hi,

It is great to hear the results were quite close to those reported in our paper, thank you.

As for the deduplication question: if you fine-tune the model on the RSITMD training set and test on the RSITMD test set, there is no need to deduplicate. The same holds for RSICD whenever only one dataset is involved.

Deduplication is necessary only when you fine-tune on the RSITMD training set and test on the RSICD test set, or vice versa. RSICD and RSITMD have a data-leakage problem when used together; that is why both RemoteCLIP and our work deduplicate the data when using RET-2 and RET-3.

Deduplication list: ChenDelong1999/RemoteCLIP#13 (comment)
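Once a duplicate list (or any way of identifying shared images) is available, applying it is mechanical. A generic sketch, assuming duplicates are exact byte-for-byte copies matched by MD5 hash — my own illustration, not the procedure used to build the RET-2/RET-3 lists:

```python
import hashlib

def dedup_test_set(train_images, test_images):
    """Drop test samples whose image bytes exactly match a training image.
    train_images / test_images: dicts mapping filename -> raw image bytes.
    (Generic sketch; published dedup lists may match by filename or by
    near-duplicate detection instead of exact hashes.)"""
    train_hashes = {hashlib.md5(data).hexdigest() for data in train_images.values()}
    return {
        name: data
        for name, data in test_images.items()
        if hashlib.md5(data).hexdigest() not in train_hashes
    }

# Toy data: one "test" image is a byte-for-byte copy of a "training" image.
train = {"rsitmd_train_01.jpg": b"AAAA", "rsitmd_train_02.jpg": b"BBBB"}
test = {"rsicd_test_01.jpg": b"AAAA", "rsicd_test_02.jpg": b"CCCC"}
print(sorted(dedup_test_set(train, test)))  # → ['rsicd_test_02.jpg']
```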

@wudi-ldd
Author

Got it, thank you for your response. The discrepancies in results might be due to differences in my system configuration, but it is clear that your results are indeed reproducible. This is an excellent piece of work.

@zilunzhang
Collaborator


Thank you!
