
Question About Dataset Deduplication and Evaluation Differences #24

Open
wudi-ldd opened this issue Nov 16, 2024 · 5 comments

Comments

@wudi-ldd

Dear Author,

I noticed that your paper mentions deduplication procedures. I downloaded the RSITMD and RSICD datasets directly, without modifying the test sets, loaded the pre-trained weights you provided via the OpenCLIP library, and evaluated them. However, my results were consistently two to three points lower than those reported in the paper. For instance, with the RS5M_ViT-B-32.pt weights on the RSITMD test set, performance was roughly two points below the paper's numbers.

I also encountered similar issues when using the pre-trained weights from the RemoteCLIP paper.

I would appreciate any insights or guidance you could provide.

Thank you for your response.
(screenshots of the evaluation results attached)
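For context, gaps of a point or two often come from details of the retrieval protocol itself (tie handling, or how the 5-captions-per-image layout of RSITMD/RSICD is mapped to ground truth) rather than from the weights. A minimal sketch of the usual text-to-image Recall@K computation — my own illustration with a toy similarity matrix, not code from either repository:

```python
import numpy as np

def recall_at_k(sim, gt, ks=(1, 5, 10)):
    """Text-to-image recall. sim is an [n_text, n_image] similarity matrix,
    gt[i] is the index of the correct image for text query i."""
    ranks = np.argsort(-sim, axis=1)  # best-matching image first
    # rank position of the ground-truth image for each text query
    pos = np.array([int(np.where(ranks[i] == gt[i])[0][0]) for i in range(len(gt))])
    return {k: float((pos < k).mean()) for k in ks}

# Toy example: 4 texts, 4 images, the diagonal is the correct pairing.
sim = np.eye(4) + 0.01 * np.random.RandomState(0).rand(4, 4)
print(recall_at_k(sim, gt=np.arange(4)))  # → {1: 1.0, 5: 1.0, 10: 1.0}
```

The "mean recall" reported in these papers is typically the average of R@1/R@5/R@10 over both retrieval directions, so a small difference in how `gt` is constructed for the 5 captions per image shows up in every number.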

@zilunzhang
Collaborator


Hi,

There is no need for deduplication if you do not use the training sets of RSICD/RSITMD/RET-2 to fine-tune the GeoRSCLIP model (aka GeoRSCLIP-FT).

If this happened with RemoteCLIP as well, I would suspect a systematic problem caused by software versions. Which Python/CUDA/PyTorch versions did you use? Some time ago, I traced inconsistent results in a paper to a change in Python 3.6 relative to 3.5: the randomness in dictionary iteration order was removed.
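To illustrate the kind of version-dependent nondeterminism being described (my own minimal example, not code from either repository): since CPython 3.6, and guaranteed by the language from 3.7, dicts iterate in insertion order, whereas in 3.5 and earlier the order depended on the hash seed and could differ between runs. Any evaluation pipeline that iterates over a dict of samples could therefore process them in a different order on older interpreters.

```python
# Build a dict in a deliberately non-sorted order.
samples = {}
for name in ["img_003", "img_001", "img_002"]:
    samples[name] = name.upper()

# On Python >= 3.7 iteration order is guaranteed to match insertion order;
# on <= 3.5 this list could come out differently on every run.
order = list(samples)
print(order)  # → ['img_003', 'img_001', 'img_002']
```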

@wudi-ldd
Author

Dear Author,

My current setup is CUDA 12.4, Python 3.10.0, and PyTorch 2.4.1. I am fine-tuning your RS5M_ViT-B-32.pt weights on the RSICD and RSITMD datasets. I wrote the fine-tuning code with the OpenCLIP library, and when fine-tuning RS5M_ViT-B-32.pt on the unfiltered RSITMD dataset, the results were quite close to those reported in your paper.

However, when using the pretrained weights RS5M_ViT-B-32_RSITMD.pt, the average accuracy was only around 46%, as shown in the second screenshot. I have not been able to identify the reason for this discrepancy.

I would like to ask: if I were to fine-tune RS5M_ViT-B-32.pt on the RSITMD dataset, do I need to perform deduplication on the RSITMD test set first? If deduplication is required, could you kindly provide the deduplication list? I noticed that RemoteCLIP provided a deduplication list for the RSITMD training set, but not for the pretraining data or the RSITMD test set, which I found a bit confusing.

If there is a data-leakage issue with the pretraining data, wouldn't it be more appropriate to remove the duplicates from the test set instead? And if the duplicates are removed from the pretraining dataset, then, as I understand it, there should be no need to process the RSITMD dataset separately, correct?

Thank you for your time, and I look forward to your response.
(screenshots attached)

@zilunzhang
Collaborator

zilunzhang commented Nov 16, 2024

> the results were actually quite close to those reported in your paper

Hi,

It is great to hear the results were quite close to those reported in our paper, thank you.

As for the deduplication question: if you fine-tune the model on the RSITMD training set and test on the RSITMD test set, there is no need to deduplicate. The same holds for RSICD whenever only one dataset is involved.

Deduplication is necessary only when you fine-tune on the RSITMD training set and test on the RSICD test set, or vice versa. RSICD and RSITMD have a data-leakage problem when used together; that is why both RemoteCLIP and our work deduplicate the data when using RET-2 and RET-3.

Deduplication list: ChenDelong1999/RemoteCLIP#13 (comment)
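Once a duplicate list (or any way of identifying shared images) is available, applying it is mechanical. A generic sketch, assuming duplicates are exact byte-for-byte copies matched by MD5 hash — my own illustration, not the procedure used to build the RET-2/RET-3 lists:

```python
import hashlib

def dedup_test_set(train_images, test_images):
    """Drop test samples whose image bytes exactly match a training image.
    train_images / test_images: dicts mapping filename -> raw image bytes.
    (Generic sketch; published dedup lists may match by filename or by
    near-duplicate detection instead of exact hashes.)"""
    train_hashes = {hashlib.md5(data).hexdigest() for data in train_images.values()}
    return {
        name: data
        for name, data in test_images.items()
        if hashlib.md5(data).hexdigest() not in train_hashes
    }

# Toy data: one "test" image is a byte-for-byte copy of a "training" image.
train = {"rsitmd_train_01.jpg": b"AAAA", "rsitmd_train_02.jpg": b"BBBB"}
test = {"rsicd_test_01.jpg": b"AAAA", "rsicd_test_02.jpg": b"CCCC"}
print(sorted(dedup_test_set(train, test)))  # → ['rsicd_test_02.jpg']
```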

@wudi-ldd
Author

Got it, thank you for your response. The discrepancies in results might be due to differences in my system configuration, but it is clear that your results are indeed reproducible. This is an excellent piece of work.

@zilunzhang
Collaborator


Thank you!
