[MM] speed up OPs using hf models (clip, ...) #199

Closed
drcege opened this issue Jan 26, 2024 · 1 comment · Fixed by #203 or #222
Assignees
Labels
dj:multimodal issues/PRs about multimodal data processing enhancement New feature or request

Comments

@drcege
Collaborator

drcege commented Jan 26, 2024

Currently, with np=28, CLIP vit-base-p32 takes over 1 hour to compute similarities for a 558k dataset, and vit-large-p14-336 takes tens of hours.

[Three images attached]

Perhaps the following can help:

  1. loading the model on GPU (implemented)
  2. using batched computation (not easy to implement, because batching is tightly coupled to the internal logic of the operators)
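
To illustrate the second idea, here is a minimal sketch of batched scoring, not the project's actual operator code. It assumes each sample is an (image, text) pair and that a `score_batch` callable scores a whole batch in one call; in practice that callable would wrap a single forward pass of the CLIP model on the GPU. The names `iter_batches`, `compute_similarities`, `score_batch`, and `toy_score_batch` are hypothetical.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical sample type: (image identifier, caption text).
Pair = Tuple[str, str]


def iter_batches(items: Sequence[Pair], batch_size: int) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def compute_similarities(
    pairs: Sequence[Pair],
    score_batch: Callable[[Sequence[Pair]], List[float]],
    batch_size: int = 64,
) -> List[float]:
    """Score all pairs, calling the (assumed) model wrapper once per batch.

    One call per batch instead of one call per sample is what amortizes
    the per-call overhead of the model.
    """
    scores: List[float] = []
    for batch in iter_batches(pairs, batch_size):
        batch_scores = score_batch(batch)
        if len(batch_scores) != len(batch):
            raise ValueError("score_batch must return one score per pair")
        scores.extend(batch_scores)
    return scores


def toy_score_batch(batch: Sequence[Pair]) -> List[float]:
    """Stand-in scorer for demonstration only; a real one would run the model."""
    return [float(len(text)) for _, text in batch]


if __name__ == "__main__":
    demo_pairs = [("img_%d" % i, "caption " * i) for i in range(5)]
    print(compute_similarities(demo_pairs, toy_score_batch, batch_size=2))
```

The difficulty noted above still applies: an operator that processes one sample at a time has to be restructured so that it can hand a whole batch to `score_batch`.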
@drcege drcege self-assigned this Jan 26, 2024
@drcege drcege added this to the Basic Multimodal Support milestone Jan 26, 2024
@drcege drcege added enhancement New feature or request dj:multimodal issues/PRs about multimodal data processing labels Jan 26, 2024
@drcege drcege linked a pull request Jan 30, 2024 that will close this issue

This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add a new comment, or this issue will be closed in 3 days.

@drcege drcege linked a pull request Feb 29, 2024 that will close this issue
2 participants