Question About Dataset Deduplication and Evaluation Differences #24
Dear Author,

I noticed that your paper mentions a deduplication procedure. I downloaded the RSITMD and RSICD datasets directly, without modifying the test sets, then loaded the pre-trained weights you provided via the OpenCLIP library and ran the evaluation. My results, however, were consistently two to three points lower than those reported in the paper. For instance, with the RS5M_ViT-B-32.pt weights on the RSITMD test set, performance was roughly two points below the figure given in your paper.

I encountered similar discrepancies with the pre-trained weights from the RemoteCLIP paper.

I would appreciate any insights or guidance you could provide. Thank you.
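For context, the setup described above corresponds roughly to the following sketch. The checkpoint path, the test batch, and the one-caption-per-image assumption are illustrative placeholders, not the paper's exact evaluation protocol:

```python
import torch
import open_clip

# Load the ViT-B/32 backbone with the released RS5M checkpoint
# (the path is a placeholder for wherever the weights were saved).
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="RS5M_ViT-B-32.pt"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

@torch.no_grad()
def encode(images, captions):
    # images: a batch already transformed by `preprocess`;
    # captions: a list of raw caption strings.
    img = model.encode_image(images)
    txt = model.encode_text(tokenizer(captions))
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return img, txt

def recall_at_k(sim, k):
    # sim[i, j] is the similarity of query i to candidate j; the
    # ground-truth match is assumed to sit on the diagonal.
    ranks = sim.argsort(dim=-1, descending=True)
    gt = torch.arange(sim.size(0)).unsqueeze(1)
    return (ranks[:, :k] == gt).any(dim=1).float().mean().item()

# With real data: sim = img @ txt.T, then recall_at_k(sim, 1), etc.
```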
Comments

Hi, there is no need for deduplication if you do not use the training sets of RSICD/RSITMD/RET-2 to fine-tune the GeoRSCLIP model (aka GeoRSCLIP-FT). Since this happens with RemoteCLIP as well, I would suspect a systematic problem caused by the software versions. Which Python/CUDA/PyTorch versions did you use? A long time ago, I traced inconsistent results in a paper to a change from Python 3.5 to Python 3.6, where dictionaries became insertion-ordered and a source of randomness was removed.
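A quick way to report those versions (standard Python/PyTorch introspection; the open_clip version attribute is assumed to be available in the installed release):

```python
import sys
import torch
import open_clip

# Environment details that commonly explain small metric differences.
print("python   :", sys.version.split()[0])
print("torch    :", torch.__version__)
print("cuda     :", torch.version.cuda)            # CUDA version torch was built with
print("cudnn    :", torch.backends.cudnn.version())
print("open_clip:", open_clip.__version__)
```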
Hi, it is great to hear the results were quite close to those reported in our paper, thank you. As for the deduplication question: if you fine-tune on the RSITMD training set and test on the RSITMD test set, there is no need to deduplicate, and the same holds for RSICD when only one dataset is involved. Deduplication is necessary only when you fine-tune on the RSITMD training set and test on the RSICD test set, or vice versa, because RSICD and RSITMD have a data-leakage problem when used together. That is why RemoteCLIP and our work deduplicate the data when using RET-2 and RET-3. Deduplication list: ChenDelong1999/RemoteCLIP#13 (comment)
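To illustrate how such a list would be applied before cross-dataset evaluation, a minimal sketch; the file names and the annotation schema here are placeholders, not files shipped with the repository:

```python
import json

# Filenames flagged as duplicates between RSICD and RSITMD
# ("duplicates.txt" stands in for the published deduplication list).
with open("duplicates.txt") as f:
    dup_names = {line.strip() for line in f if line.strip()}

# Drop flagged images from the test annotations before evaluating
# ("rsicd_test.json" and its schema are assumed for illustration).
with open("rsicd_test.json") as f:
    annotations = json.load(f)

kept = [a for a in annotations if a["filename"] not in dup_names]
print(f"kept {len(kept)} of {len(annotations)} test items")
```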
Got it, thank you for your response. The discrepancies in my results were probably due to differences in my system configuration, but it is clear that your results are reproducible. This is an excellent piece of work.
Thank you!