please do not use BiomedCLIP for ARCH dataset #12

Open
jinxixiang opened this issue Nov 11, 2023 · 2 comments

Comments

@jinxixiang

Dear Author,

The ARCH dataset is divided into two subsets: the books_set and the pubmed_set.

I have noticed that the pubmed_set appears to overlap with the training data of BiomedCLIP, which is sourced from PubMed Central.

In your paper, you combined these two datasets for cross-modality retrieval. However, I decided to separate them and compare their performance individually.

With BiomedCLIP, the retrieval performance on the pubmed_set was as follows:
{15.7; 79.8; 94.4; 16.7; 78.9; 93.7}

Meanwhile, the retrieval performance on the books_set was:
{7.3; 49.2; 74.2; 8.2; 49.7; 73.2}

In contrast, QUILT-GPT/77 gave different results:

The retrieval performance on the pubmed_set was:
{1.8; 23.6; 46.0; 1.6; 23.4; 45.7}

The retrieval performance on the books_set was:
{1.8; 27.7; 52.8; 1.5; 23.4; 46.4}

From these results, it is clear that QUILT-GPT/77 does not show the large performance gap between the two subsets that BiomedCLIP does.
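To make the comparison concrete, the gap can be computed per metric as pubmed_set minus books_set. The snippet below is only an illustrative sketch: the figures are copied from this comment, the first pair of result sets is assumed to belong to BiomedCLIP, and the `RESULTS` dictionary and `subset_gap` helper are hypothetical names, not part of the repository.

```python
# Figures copied from the issue, in the order given in the follow-up comment:
# text2image R1, R50, R200, then image2text R1, R50, R200.
# Assumption: the first pair of result sets in the issue is BiomedCLIP's.
RESULTS = {
    "BiomedCLIP": {
        "pubmed_set": [15.7, 79.8, 94.4, 16.7, 78.9, 93.7],
        "books_set": [7.3, 49.2, 74.2, 8.2, 49.7, 73.2],
    },
    "QUILT-GPT/77": {
        "pubmed_set": [1.8, 23.6, 46.0, 1.6, 23.4, 45.7],
        "books_set": [1.8, 27.7, 52.8, 1.5, 23.4, 46.4],
    },
}


def subset_gap(model):
    """Return pubmed_set minus books_set for each of the six metrics."""
    scores = RESULTS[model]
    return [
        round(pubmed - books, 1)
        for pubmed, books in zip(scores["pubmed_set"], scores["books_set"])
    ]


for model in RESULTS:
    print(model, subset_gap(model))
```

With these figures, BiomedCLIP scores between 8.4 and 30.6 points higher on the pubmed_set across the six metrics, while QUILT-GPT/77 stays within roughly 7 points and is never meaningfully better on the pubmed_set.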

@jinxixiang
Author

From left to right, the figures are: text-to-image R@1, R@50, R@200, then image-to-text R@1, R@50, R@200.
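For readers unfamiliar with these metrics, here is a minimal sketch of how recall at K is commonly computed for cross-modal retrieval. It is not the evaluation code used for the figures above. It assumes a precomputed similarity matrix in which caption i is paired with image i, and the names `recall_at_k` and `transpose` and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k):
    """Percentage of queries whose paired item ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i is index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


def transpose(matrix):
    """Swap queries and candidates (captions <-> images)."""
    return [list(column) for column in zip(*matrix)]


# Toy text-to-image similarity matrix: rows are captions, columns are images.
text_to_image = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]

t2i_r1 = recall_at_k(text_to_image, 1)
i2t_r1 = recall_at_k(transpose(text_to_image), 1)
print(f"text2image R@1: {t2i_r1:.1f}, image2text R@1: {i2t_r1:.1f}")
```

Running the same function on the transposed matrix gives the image-to-text direction, which is why each result set in this thread has six numbers: three values of K for each retrieval direction.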

@wisdomikezogwo
Owner

This is a very valid point, and it indicates some form of leakage, which is expected with BiomedCLIP. Thank you for the evaluations; I will make sure to add a note to the README in a future update!
