[OOM] Fine tuning CLIP #1573

Open
AndrMoura opened this issue Jun 1, 2022 · 7 comments
@AndrMoura

AndrMoura commented Jun 1, 2022

Hello, I'm trying to fine-tune a CLIP model on my own data (image-description pairs) on a GPU, but I run out of RAM mid-training. RAM usage climbs steadily during training until the process hits OOM.

This is a sample from my code:

import os

from PIL import Image
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")

# NOTE: every image is opened up front here, so a PIL Image object
# for each row of the dataset stays alive for the entire run
train_examples = [InputExample(texts=[Image.open(os.path.join(img_path, row['img_name'])), row['description']])
                  for _, row in train_captions.iterrows()]

train_dataloader = DataLoader(train_examples,
                              shuffle=True,
                              batch_size=16)

train_loss = losses.MultipleNegativesRankingLoss(model=model)
# setup evaluator
...

model.fit([(train_dataloader, train_loss)],
          show_progress_bar=True,
          epochs=10,
          output_path=output_path)

I believe the problem lies in the train_examples list comprehension. When I change train_examples to load text only:

train_examples = [InputExample(texts=[row['description'], row['description']]) for _, row in train_captions.iterrows()]

the model trains without any memory issues! I must be doing something wrong with the image loading. What is the proper way to load images into the train_examples variable?

Thank you.
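
A likely explanation: Image.open is lazy, but once the preprocessor touches an image during training, PIL caches the decoded pixel data on the Image object, and every such object stays alive inside train_examples for the whole run, so RAM grows until it is exhausted. A minimal sketch of a lazy alternative, assuming train_captions has img_name and description columns and reusing the imports from the snippet above:

from torch.utils.data import Dataset

class LazyCaptionDataset(Dataset):
    """Opens each image only when its batch is built, so decoded
    pixel data never accumulates across the whole dataset."""
    def __init__(self, df, img_path):
        self.rows = list(df[['img_name', 'description']].itertuples(index=False))
        self.img_path = img_path

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        img_name, description = self.rows[idx]
        image = Image.open(os.path.join(self.img_path, img_name))
        return InputExample(texts=[image, description])

train_dataloader = DataLoader(LazyCaptionDataset(train_captions, img_path),
                              shuffle=True,
                              batch_size=16)

model.fit installs its own collate function on whatever DataLoader it is given, as long as the items are InputExample objects, so the rest of the training code stays unchanged.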

@jpzhangvincent

I'm also looking for examples of fine-tuning the CLIP model with sentence-transformers. Thanks!

@rhkenne

rhkenne commented Aug 3, 2022

Hey @nreimers, congrats on your move/promotion to cohere.ai. I would like to open a PR to address this issue. Any pointers on how to approach it?

@yash-120304

yash-120304 commented Jan 19, 2023

@AndrMoura did you solve your problem?
Can you tell me why you did not put a label in your InputExample?

@AndrMoura
Author

@AndrMoura did you solve your problem? Can you tell me why you did not put a label in your InputExample?

I didn't. I used the HF library to train my own CLIP.

As for the label, check MultipleNegativesRankingLoss: it doesn't take labels, because it uses the other examples in the batch as negatives.
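
For reference, a minimal sketch of the pair format MultipleNegativesRankingLoss expects; `pairs` here is a hypothetical iterable of (image, caption) tuples:

# No label is passed: for each anchor, the loss treats the
# positives of the other examples in the batch as negatives.
train_examples = [InputExample(texts=[image, caption])
                  for image, caption in pairs]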

@yash-120304

yash-120304 commented Jan 19, 2023

import os

from PIL import Image
from tqdm import tqdm
from sentence_transformers import InputExample, util

images = mapping.keys()
captions = mapping.values()
train_samples = []
for img_name, caps in tqdm(zip(images, captions)):
    img = Image.open(os.path.join(img_dir, img_name + '.jpg'))
    image_emb = clip.encode([img], convert_to_tensor=True, show_progress_bar=False)
    for cap in caps:
        cap_emb = clip.encode([cap], convert_to_tensor=True, show_progress_bar=False)
        # semantic_search returns a list of hit lists, one per query;
        # pull out the single hit's cosine score as a float
        score = util.semantic_search(image_emb, cap_emb)[0][0]['score']
        train_samples.append(InputExample(texts=[img, cap], label=score))

This is how I am computing my score. Is it correct?
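
For reference, util.semantic_search returns a list of hit lists (one per query) rather than a bare number, while util.cos_sim returns the cosine similarity directly; for a single image-caption pair the two scores below are the same value:

hits = util.semantic_search(image_emb, cap_emb)  # [[{'corpus_id': 0, 'score': ...}]]
score = hits[0][0]['score']

score = util.cos_sim(image_emb, cap_emb).item()  # 1x1 tensor -> float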

@yash-120304

HF library to train my own CLIP

Oh, how can I do that?

Because for now I am simply using the sentence-transformers library to load my CLIP model and I am getting good results, but I can't evaluate it, and this is where I am stuck.
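
One simple way to evaluate a CLIP model loaded through sentence-transformers is plain retrieval accuracy: encode images and captions, then check whether each image retrieves its own caption. A minimal sketch, assuming `clip` is the loaded model and `images`/`captions` are aligned lists where images[i] matches captions[i]:

from sentence_transformers import util

img_embs = clip.encode(images, convert_to_tensor=True, show_progress_bar=False)
cap_embs = clip.encode(captions, convert_to_tensor=True, show_progress_bar=False)

# for each image, retrieve the closest caption and count exact matches
hits = util.semantic_search(img_embs, cap_embs, top_k=1)
recall_at_1 = sum(h[0]['corpus_id'] == i for i, h in enumerate(hits)) / len(hits)
print(f"image->text recall@1: {recall_at_1:.3f}")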

@httplups

httplups commented Dec 3, 2024

Hi, I am having the same memory error.
